Index

RTMDet
Architecture
Key Ideas
- Large Kernel Depth Wise Convolution
- Dynamic Label Assignments
References
- Websites

RTMDet

Object Detection
- CNN-based methods
- yhayato1320.hatenablog.com

RTMDet: Real-Time Models for object Detection

Architecture

A one-stage detector
- Composed of a Backbone, a Neck, and a Head
Backbone
- Earlier YOLO models (v4 / X) use CSP DarkNet
- RTMDet changes this to CSP blocks with large-kernel depth-wise convolution layers
Head
- Supports Rotated Objects / Instance Segmentation
YOLO v4
- yhayato1320.hatenablog.com
YOLO X
- yhayato1320.hatenablog.com
CSP Darknet
- yhayato1320.hatenablog.com

Key Ideas

better representation with large-kernel depth-wise convolutions
better optimization with soft labels in the dynamic label assignments

Large Kernel Depth Wise Convolution

The focus is on enlarging the kernel of the depth-wise convolution.

About the Block

The block is an improved version of the DarkNet block.

DarkNet
- yhayato1320.hatenablog.com

A depth-wise convolution (plus a point-wise convolution) is added to block 3.a, resulting in block 3.b.
A usual MobileNet-style conversion would use a 3x3 depth-wise convolution, but here it is 5x5. (Large Kernel Depth Wise Convolution)
Depth Wise Convolution
- yhayato1320.hatenablog.com
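
To get a feel for why a larger depth-wise kernel is affordable, a rough weight count helps. The sketch below is illustrative only: the channel width of 256 is an assumed value that does not come from this post, the helper names are hypothetical, and biases and normalization layers are ignored.

```python
def standard_conv_params(in_ch, out_ch, k):
    """Weights of a dense k x k convolution (bias ignored)."""
    return in_ch * out_ch * k * k


def depthwise_conv_params(channels, k):
    """Weights of a k x k depth-wise convolution: one k x k filter per channel."""
    return channels * k * k


def pointwise_conv_params(in_ch, out_ch):
    """Weights of a 1 x 1 (point-wise) convolution that mixes channels."""
    return in_ch * out_ch


CHANNELS = 256  # assumed width, for illustration only

# Dense 3x3 convolution, as in a plain DarkNet-style block.
dense_3x3 = standard_conv_params(CHANNELS, CHANNELS, 3)

# MobileNet-style: 3x3 depth-wise followed by point-wise.
dw3_pw = depthwise_conv_params(CHANNELS, 3) + pointwise_conv_params(CHANNELS, CHANNELS)

# Large-kernel variant: 5x5 depth-wise followed by point-wise.
dw5_pw = depthwise_conv_params(CHANNELS, 5) + pointwise_conv_params(CHANNELS, CHANNELS)

print(f"dense 3x3            : {dense_3x3:,}")
print(f"3x3 depth-wise + 1x1 : {dw3_pw:,}")
print(f"5x5 depth-wise + 1x1 : {dw5_pw:,}")
```

Under these assumptions, moving the depth-wise kernel from 3x3 to 5x5 adds only 256 * (25 - 9) = 4096 weights, which is small next to the point-wise layer, while the receptive field of the block grows.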

Some methods instead use a re-parametrized convolution for this part.

YOLO v6
- yhayato1320.hatenablog.com

Balance of backbone and neck

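By expanding the information carried from the backbone to the neck, the model strikes an overall balance between the two.
The neck is shorter than in GiraffeDet, EfficientDet, and similar models.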
EfficientDet
- yhayato1320.hatenablog.com
GiraffeDet
- yhayato1320.hatenablog.com

Head

Shared detection head

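Real-time detectors typically use separate detection heads for each feature scale.
RTMDet instead shares the head parameters across scales, while each scale keeps its own BN layers.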
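
The effect of sharing the head while keeping per-scale BN can be sketched with a simple weight count. All numbers here (3 scales, 256 channels, 2 stacked 3x3 convolutions) are assumptions chosen for illustration and are not taken from this post; the helper names are hypothetical.

```python
def conv_params(in_ch, out_ch, k):
    """Weights of a dense k x k convolution (bias ignored)."""
    return in_ch * out_ch * k * k


def bn_params(channels):
    """Learnable BN parameters: one scale and one shift per channel."""
    return 2 * channels


NUM_LEVELS = 3  # assumed number of feature scales
CHANNELS = 256  # assumed head width
NUM_CONVS = 2   # assumed number of stacked 3x3 convs in the head

conv_total = NUM_CONVS * conv_params(CHANNELS, CHANNELS, 3)
bn_total = NUM_CONVS * bn_params(CHANNELS)

# Option 1: a fully separate head (convs + BN) for every scale.
separate_heads = NUM_LEVELS * (conv_total + bn_total)

# Option 2: conv weights shared across scales, BN kept per scale.
shared_convs_separate_bn = conv_total + NUM_LEVELS * bn_total

print(f"separate heads           : {separate_heads:,}")
print(f"shared convs, per-scale BN: {shared_convs_separate_bn:,}")
```

In this toy setting the per-scale BN layers cost very little compared with the convolutions, so sharing the convolutions removes most of the head's weights while each scale can still normalize its own feature statistics.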

Instance segmentation

The structure is similar to CondInst.

CondInst
- yhayato1320.hatenablog.com
Dynamic Convolution
- yhayato1320.hatenablog.com

The head consists of a kernel prediction head and a mask feature head.

The mask feature head has 4 convolution layers. It takes the multi-level features extracted from the neck as input and predicts 8-channel mask features.

The kernel prediction head predicts a 169-dimensional vector for each predicted instance.

This vector is split into 3 parts, which are used to generate 3 dynamic convolution kernels. The 2 coordinate features are concatenated with the mask features, and the dynamic convolutions take this concatenation as input to generate the instance segmentation mask.
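
The number 169 can be reproduced with a little arithmetic. The sketch below assumes a CondInst-style layout of three 1x1 dynamic convolutions with channels 10 -> 8 -> 8 -> 1, where 10 is the 8 mask-feature channels plus the 2 coordinate channels. The per-layer channel sizes and the order in which weights and biases are packed into the vector are assumptions for illustration, and the function names are hypothetical.

```python
# Assumed (in_channels, out_channels) of the three 1x1 dynamic convolutions.
# Input = 8 mask-feature channels + 2 coordinate channels.
LAYER_CHANNELS = [(10, 8), (8, 8), (8, 1)]


def kernel_vector_length(layers):
    """Total number of weights and biases over all 1x1 layers."""
    return sum(in_ch * out_ch + out_ch for in_ch, out_ch in layers)


def split_kernel_vector(vector, layers):
    """Split a flat per-instance vector into (weights, biases) for each layer.

    The packing order (weights then biases, layer by layer) is an assumption.
    """
    if len(vector) != kernel_vector_length(layers):
        raise ValueError("vector length does not match the layer layout")
    parts = []
    offset = 0
    for in_ch, out_ch in layers:
        n_weights = in_ch * out_ch
        weights = vector[offset:offset + n_weights]
        offset += n_weights
        biases = vector[offset:offset + out_ch]
        offset += out_ch
        parts.append((weights, biases))
    return parts


length = kernel_vector_length(LAYER_CHANNELS)
dummy_vector = [0.0] * length  # stand-in for one predicted kernel vector
kernels = split_kernel_vector(dummy_vector, LAYER_CHANNELS)

print("vector length:", length)
for i, (w, b) in enumerate(kernels, start=1):
    print(f"layer {i}: {len(w)} weights, {len(b)} biases")
```

With this layout the three layers need 88, 72, and 9 values, which sums to the 169 dimensions mentioned above.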

Related Work

CondInst
- yhayato1320.hatenablog.com
SOLOv2: Dynamic and Fast Instance Segmentation
- [2020]
- arxiv.org
K-Net: Towards unified image segmentation
- [2021]
- arxiv.org

Rotated object detection / RTMDet-R

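Add a regression branch, implemented as a 1x1 Conv layer, to estimate the angle
Transform the bbox regions so that rotated boxes are supported
Compute the loss against the GT, replacing the GIoU loss with the Rotated IoU loss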
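
Once an angle is predicted, each box is described by its center, size, and rotation. As a geometric illustration of that representation, the sketch below converts (cx, cy, w, h, angle) into four corner points. The function name, parameter order, and angle convention (radians, rotation about the box center with the standard 2D rotation matrix) are assumptions for illustration and are not RTMDet-R's actual box coder.

```python
import math


def rotated_box_to_corners(cx, cy, w, h, angle):
    """Return the 4 corners of a box rotated by `angle` radians about its center."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Corner offsets of the axis-aligned box, relative to its center.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset, then translate by the center.
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]


# Example: a 4 x 2 box centered at (10, 20), rotated by 30 degrees.
corners = rotated_box_to_corners(10.0, 20.0, 4.0, 2.0, math.pi / 6)
for x, y in corners:
    print(f"({x:.3f}, {y:.3f})")
```

A loss such as Rotated IoU would then compare the polygon of a predicted box with that of the ground-truth box; that computation is not shown here.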

Dynamic Label Assignments

References

RTMDet: An Empirical Study of Designing Real-Time Object Detectors
- [2022]
- Abstract
- 2 Related Work
  - Efficient neural architecture for object detection
  - Label assignment for object detection
  - Instance segmentation
  - Rotated object detection
- 3 Methodology
  - 3.1 Macro Architecture
  - 3.2 Model Architecture
    - Basic building block
    - Balance of model width and depth
    - Balance of backbone and neck
    - Shared detection head
  - 3.3 Training Strategy
  - 3.4 Extending to other tasks
    - Instance segmentation
    - Rotated object detection
- arxiv.org

Websites

2024-09-05 Machine Learning Study Group
- jobs.layerx.co.jp

オムライスの備忘録

Notes on mathematics, statistics, machine learning, and programming

[Deep Learning] RTMDet
