2023-01-24

[Deep Learning] Deformable Attention Transformer / DAT

Categories: Data Science, Data Science - Deep Learning

Transformer #Summary
- Applications to images
- yhayato1320.hatenablog.com

Index

Deformable Attention Transformer / DAT
References
- Web sites

Deformable Attention Transformer / DAT

References

Vision Transformer with Deformable Attention
- [2022]
- arxiv.org

Web sites

A detailed explanation of Deformable Attention Transformer, a state-of-the-art image recognition model that surpasses Swin Transformer
- deepsquare.jp
Researchers from China Propose DAT: a Deformable Vision Transformer to Compute Self-Attention in a Data-Aware Fashion
- www.marktechpost.com

2023-01-24

[Deep Learning] MOTR / Multiple-Object Tracking with Transformer #Implementation

Categories: Data Science, Data Science - Deep Learning

Index

MOTR / Multiple-Object Tracking with Transformer
Implementation
- Running the process

MOTR / Multiple-Object Tracking with Transformer

Object tracking using a Transformer.

MOTR
- yhayato1320.hatenablog.com

Implementation

github.com

Running the process

2023-01-23

[Video Processing] Transformer #Summary

Categories: Data Science, Data Science - Image Processing, Data Science - Time Series Analysis, Data Science - Deep Learning

Index

Applications to video
Algorithms
Tasks
- Video Restoration
  - ReBotNet / 2023
References

Applications to video

This page summarizes methods that apply Transformers to video.

Transformer #Summary
- yhayato1320.hatenablog.com
Video Processing #Summary
- yhayato1320.hatenablog.com

Algorithms

VisTR / 2020

End-to-End Video Instance Segmentation with Transformers
- [2020]
- arxiv.org

ViViT / 2021

ViViT: A Video Vision Transformer

Memory-efficient Bidirectional Transformer / MeBT / 2023

Video generation.

Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers
- [2023]
- arxiv.org
- sites.google.com

TimeSformer / 2021

Is Space-Time Attention All You Need for Video Understanding?
- [2021]
- arxiv.org
TimeSformer: a Transformer that captures video beyond 3D CNNs
- https://ai-scholar.tech/articles/image-recognition/Transformer

Video Taskformer / 2023

Learning and Verification of Task Structure in Instructional Videos
- [2023]
- arxiv.org
- medhini.github.io

Streaming Vision Transformer / S-ViT / 2023

Streaming Video Model
- [2023]
- arxiv.org

SVT / 2023

SVT: Supertoken Video Transformer for Efficient Video Understanding
- [2023]
- arxiv.org

Adaptive Matting / AdaM / 2023

Adaptive Matting for Dynamic Videos, termed AdaM
- [2023]
- arxiv.org

StepFormer / 2023

StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos
- [2023]
- arxiv.org

Tasks

Video Restoration

ReBotNet / 2023

ReBotNet: Fast Real-time Video Enhancement
- [2023]
- arxiv.org
- jeya-maria-jose.github.io

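References

Computer Vision Frontiers (コンピュータビジョン最前線) Spring 2022
- 1 Video recognition today
  - 1.2 Representative recognition models
    - 1.2.2 Recognition models
      - Transformer-based recognition models
      - CNN vs Transformer in video recognition
- Computer Vision Frontiers Spring 2022
  - Kyoritsu Shuppan
  - Amazon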

2023-01-23

[Deep Learning] Deformable DETR

Categories: Data Science, Data Science - Deep Learning

Index

Deformable DETR
Deformable Attention Module
- Multi-scale Deformable Attention Module
Other improvements
- Iterative Bounding Box Refinement
References
- Web sites

Deformable DETR

An improved method based on DETR.

DETR
- yhayato1320.hatenablog.com

Deformable Attention Module

The paper proposes the Deformable Attention Module.

The idea is inspired by Deformable CNN.

Deformable CNN
- yhayato1320.hatenablog.com

Regardless of the size of the input feature map, the attention module attends only to a small set of key sampling points around a reference point.

How does this relate to Deformable Attention Transformer?
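
The sampling idea can be sketched in a few lines of plain Python. This is a minimal single-head, single-scale illustration and not the Deformable DETR implementation: in the paper the sampling offsets and attention weights are predicted from the query feature by linear projections, whereas here they are passed in directly. The names `bilinear_sample`, `softmax`, and `deformable_attention` are hypothetical, and the feature map is assumed to be a single-channel list of rows.

```python
import math


def bilinear_sample(feature, x, y):
    """Bilinearly interpolate a 2D feature map (list of rows) at fractional (x, y).

    Neighbours that fall outside the map contribute zero (zero padding).
    """
    h, w = len(feature), len(feature[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * feature[yi][xi]
    return value


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention(feature, reference, offsets, scores):
    """Weighted sum of K values sampled around a reference point.

    reference: (x, y) in pixel coordinates (assumption: not normalized).
    offsets:   K pairs (dx, dy); assumed given here, learned in the real model.
    scores:    K unnormalized attention scores; softmax turns them into weights.
    The cost depends on K, not on the size of the feature map.
    """
    weights = softmax(scores)
    rx, ry = reference
    return sum(
        w * bilinear_sample(feature, rx + dx, ry + dy)
        for w, (dx, dy) in zip(weights, offsets)
    )


# Usage: a 3x3 map where feature[y][x] = 3 * y + x, sampled at K = 4 points.
feature = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
offsets = [(0.5, 0.0), (-0.5, 0.5), (0.0, -1.0), (1.0, 1.0)]
scores = [0.0, 0.0, 0.0, 0.0]  # equal scores give uniform weights of 0.25
print(deformable_attention(feature, (1.0, 1.0), offsets, scores))  # 4.625
```

Only the four sampled locations are read, so the same call costs the same on a much larger feature map; this is the property that distinguishes the approach from dense attention over every pixel.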

Deformable Attention Transformer / DAT

yhayato1320.hatenablog.com

Multi-scale Deformable Attention Module

Other improvements

RAFT: Recurrent All-Pairs Field Transforms for Optical Flow
- [2020]
- arxiv.org

References

Deformable DETR: Deformable Transformers for End-to-End Object Detection
- [2020 SenseTime Research]
- v4
- 2 RELATED WORK
  - Efficient Attention Mechanism
  - Multi-scale Feature Representation for Object Detection
- 3 REVISITING TRANSFORMERS AND DETR
  - Multi-Head Attention in Transformers
  - DETR
- 4 METHOD
  - 4.1 DEFORMABLE TRANSFORMERS FOR END-TO-END OBJECT DETECTION
    - Deformable Attention Module
    - Multi-scale Deformable Attention Module
    - Deformable Transformer Encoder
    - Deformable Transformer Decoder
  - 4.2 ADDITIONAL IMPROVEMENTS AND VARIANTS FOR DEFORMABLE DETR
    - Iterative Bounding Box Refinement
    - Two-Stage Deformable DETR
- arxiv.org

Web sites

A detailed explanation of Deformable Attention Transformer, a state-of-the-art image recognition model that surpasses Swin Transformer
- deepsquare.jp
[DL Reading Group] Vision Transformer with Deformable Attention (Deformable Attention Transformer: DAT)
- from Deep Learning JP
- www.slideshare.net

2023-01-20

[Multimodal] Generative Model #Summary

Categories: Data Science, Data Science - Multimodal

Index

Generative models
References

Generative models

This page summarizes generative models in vision-language research.

Vision Language
- yhayato1320.hatenablog.com
Generative models
- yhayato1320.hatenablog.com

References

Google Research, 2022 & Beyond: Language, Vision and Generative Models
- ai.googleblog.com

2023-01-20

[Image Processing] Face Detection

Categories: Data Science, Data Science - Image Processing

Image Processing #Summary
- Tasks
- yhayato1320.hatenablog.com

Index

Face Detection
Algorithms
- ArcFace / 2018
- Sub-center ArcFace / 2020
Face Recognition
- AttenFace / 2022
- FaceMAE / 2022
Datasets
- F2LA
References

Face Detection

Detect faces in an image.

Algorithms

ArcFace / 2018

ArcFace
- yhayato1320.hatenablog.com

Sub-center ArcFace / 2020

Sub-center ArcFace
- yhayato1320.hatenablog.com

Face Recognition

AttenFace / 2022

AttenFace: A Real Time Attendance System using Face Recognition
- [2022]
- arxiv.org
AttenFace: a real-time attendance system using face recognition
- https://ai-scholar.tech/articles/face-recognition/attenface

FaceMAE / 2022

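FaceMAE is a new framework that aims to balance privacy protection and recognition performance in face recognition. It uses masked autoencoders to generate a synthetic dataset that is suitable for training face recognition models while protecting privacy.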

FaceMAE: Privacy-Preserving Face Recognition via Masked Autoencoders
- [2022]
- arxiv.org
Researchers Propose a Novel Framework ‘FaceMAE’, Where the Face Privacy and Recognition Performance are Considered Simultaneously
- www.marktechpost.com

Datasets

F2LA

Are Face Detection Models Biased?
- [2022]
- arxiv.org
Is there bias in how face detection models localize faces?
- https://ai-scholar.tech/articles/face-recognition/facedetection-models-biased

References

https://paperswithcode.com/task/face-detection
- Papers with Code task page
https://paperswithcode.com/task/face-recognition
- Papers with Code task page

2023-01-18

[Deep Learning] 3D CNN #Summary

Categories: Data Science, Data Science - Deep Learning

Index

3D CNN
Basic methods
Applied methods
- C3D / 2014
- I3D / 2017
References

3D CNN

CNN #Summary
- yhayato1320.hatenablog.com

Basic methods

3D Convolutional Neural Networks for Human Action Recognition
- [2013]
- https://www.dbs.ifi.lmu.de/~yu_k/icml2010_3dcnn.pdf

Applied methods

C3D / 2014

Learning Spatiotemporal Features with 3D Convolutional Networks
- [2014]
- arxiv.org

I3D / 2017

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
- [2017]
- arxiv.org

References

3D CNN summary
- github.com
