Index
Visual Question Answering / VQA
A task that takes an image (Visual) and a question (Question / Text) as input and outputs an answer (Answer / Text).
- Multimodal #Overview
- Vision-Language
- yhayato1320.hatenablog.com
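To make the input/output shape of the task concrete, here is a minimal sketch. It assumes VQA is treated as classification over a fixed answer vocabulary, which is a common formulation but not stated above. The toy vocabulary `ANSWERS`, the bag-of-words question encoder, the element-wise product fusion and the helper names `embed_question` and `answer` are all hypothetical stand-ins for a real image CNN, a real text encoder and learned weights.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy answer vocabulary: VQA is treated here as classification over a fixed answer set.
ANSWERS = ["yes", "no", "2", "red"]

def embed_question(question: str, word_vecs: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of known words (a bag-of-words stand-in for an RNN/Transformer encoder)."""
    tokens = [t for t in question.lower().strip("?").split() if t in word_vecs]
    if not tokens:
        return [0.0] * dim
    return [sum(word_vecs[t][i] for t in tokens) / len(tokens) for i in range(dim)]

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def answer(image_feat: Sequence[float], question: str,
           word_vecs: Dict[str, List[float]], weights: List[List[float]]) -> Dict[str, float]:
    """Fuse image and question features by element-wise product, then score each answer linearly."""
    q = embed_question(question, word_vecs, len(image_feat))
    fused = [v * t for v, t in zip(image_feat, q)]
    logits = [sum(w * f for w, f in zip(row, fused)) for row in weights]
    return dict(zip(ANSWERS, softmax(logits)))

# Usage with toy values (hypothetical, not trained).
word_vecs = {"is": [0.1, 0.1, 0.1], "there": [0.2, 0.0, 0.1], "a": [0.0, 0.0, 0.0], "cat": [1.0, 0.0, 0.5]}
image_feat = [0.9, 0.1, 0.4]  # stand-in for a CNN image feature
weights = [[2.0, 0.0, 1.0], [-2.0, 0.0, -1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]  # one row per answer
probs = answer(image_feat, "Is there a cat?", word_vecs, weights)
print(max(probs, key=probs.get))  # → yes
```

Real models replace each stand-in with a learned component, but the shape is the same: (image, question) in, a distribution over answers out.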
Algorithms
- Ask Your Neurons: A Neural-based Approach to Answering Questions about Images
- [2015]
- arxiv.org
Dual Attention Networks / DANs / 2016
- Dual Attention Networks / DANs
MMBT / 2019
Supervised Multimodal Bitransformers for Classifying Images and Text
- [2019]
- arxiv.org
A VQA model that needs no cross-modal pretraining: Multimodal Bitransformer
On backpropagation in MMBT (MultiModal BiTransformers) (multimodal deep learning)
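The core idea behind MMBT, as described in the paper, is that pooled image features are projected by a learned matrix into the text token embedding space, so a text-pretrained BERT-style encoder can read image "tokens" and word tokens in one sequence. The sketch below shows only that input construction. The exact token layout (`[CLS]` + image tokens + `[SEP]` + text tokens + `[SEP]`), the segment ids (0 for the image span, 1 for the text span) and the helper names `project` and `build_mmbt_input` are assumptions made for illustration; the encoder itself, plus its position and segment embeddings, is not shown.

```python
from typing import List, Tuple

Vector = List[float]

def project(x: Vector, w: List[List[float]]) -> Vector:
    """Linear map from image-feature space to token-embedding space (one row of w per output dim)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def build_mmbt_input(image_feats: List[Vector], text_embs: List[Vector],
                     w_proj: List[List[float]], cls: Vector, sep: Vector) -> Tuple[List[Vector], List[int]]:
    """Hypothetical helper: build one mixed sequence of image and text token embeddings."""
    img_tokens = [project(f, w_proj) for f in image_feats]  # N image "tokens"
    seq = [cls] + img_tokens + [sep] + text_embs + [sep]
    # Assumed segment layout: image span (with [CLS] and its [SEP]) = 0, text span = 1.
    segments = [0] * (len(img_tokens) + 2) + [1] * (len(text_embs) + 1)
    return seq, segments

# Usage with toy values: N = 2 pooled image vectors (dim 3), 3 text tokens (dim 2).
image_feats = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]]
w_proj = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # 2 x 3 projection matrix
text_embs = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
seq, seg = build_mmbt_input(image_feats, text_embs, w_proj, cls=[0.0, 0.0], sep=[1.0, 1.0])
print(len(seq), seg)  # → 8 [0, 0, 0, 0, 1, 1, 1, 1]
```

Because only the projection (and fine-tuning) ties the two unimodally pretrained encoders together, no image-text pretraining stage is needed, which is the point made in the article title above.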
Visual Reasoning
ALOE / attention over learned object embeddings / 2020
- Attention over learned object embeddings enables complex visual reasoning
- [2020]
- arxiv.org
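ALOE, as its name says, applies attention over learned object embeddings: per-object vectors and question word vectors are placed in one sequence and a Transformer attends over all of them. The sketch below shows a single step of that idea as one-head scaled dot-product self-attention. It assumes identity query/key/value projections and toy 2-dimensional embeddings for brevity, and the summary token `cls` read out for the answer is a hypothetical choice; the real model uses learned projections, multiple layers and heads, and object embeddings produced by an unsupervised object-segmentation model.

```python
import math
from typing import List

Vector = List[float]

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention; Q, K, V are the raw tokens (identity projections)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)  # how much this token attends to every object/word token
        out.append([sum(wj * v[i] for wj, v in zip(w, tokens)) for i in range(d)])
    return out

# Usage with hypothetical toy embeddings.
objects = [[1.0, 0.0], [0.0, 1.0]]   # one embedding per object
question = [[0.5, 0.5], [1.0, 0.0]]  # question word embeddings
cls = [0.0, 0.0]                     # summary token fed to an answer classifier
mixed = self_attention([cls] + objects + question)
answer_repr = mixed[0]
print(answer_repr)
```

With an all-zero `cls` vector every score is 0, so it attends uniformly and `answer_repr` is simply the mean of all five tokens; learned projections are what let a real model attend selectively to the objects a question is about.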
References
Medical Visual Question Answering: A Survey
- [2021]
- arxiv.org
A Survey on VQA: Datasets and Approaches
Visual Question Answering using Deep Learning: A Survey and Performance Analysis
- [2019]
- arxiv.org
Visual Question Answering: A Survey of Methods and Datasets
- [2016]
- arxiv.org
Websites
- NeurIPS 2021 Participation Report, Part 1
- ALOE
- blog.recruit.co.jp