Omurice's Notes (オムライスの備忘録)

Notes on mathematics, statistics, machine learning, and programming.

[Multimodal] Image Caption


Index

Image Caption
Algorithms
- BRNN
- CPTR / 2021
- Re-ViLM
AEC / Affective Explanation Captioning
- Affection / 2022
References
- Websites

Image Caption

Image captioning takes an image as input and outputs text that describes the image.
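The input/output contract can be illustrated with a minimal sketch. Everything here is a toy assumption for illustration: the `Image` type, the `encode` and `next_token` helpers, and the brightness rule are hypothetical stand-ins for a learned encoder and decoder, not any method from the papers listed below. The sketch only shows the common shape of a captioner: encode the image once, then emit words one at a time until an end token appears.

```python
from typing import List, Sequence

# Hypothetical image type: rows of pixel intensities in [0, 1].
Image = Sequence[Sequence[float]]

BOS, EOS = "<bos>", "<eos>"


def encode(image: Image) -> List[float]:
    """Toy encoder: summarise the image as a single mean-intensity feature."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels)]


def next_token(features: List[float], prefix: List[str]) -> str:
    """Toy decoder step: a rule-based stand-in for a learned language model."""
    if features[0] >= 0.5:
        words = ["a", "bright", "image"]
    else:
        words = ["a", "dark", "image"]
    step = len(prefix) - 1  # words emitted so far (prefix starts with BOS)
    return words[step] if step < len(words) else EOS


def caption(image: Image, max_len: int = 10) -> str:
    """Greedy decoding: image in, descriptive text out."""
    features = encode(image)
    tokens = [BOS]
    for _ in range(max_len):
        tok = next_token(features, tokens)
        if tok == EOS:
            break
        tokens.append(tok)
    return " ".join(tokens[1:])  # drop the BOS marker


print(caption([[0.9, 0.8], [0.7, 1.0]]))
```

In a real system the two helpers would be replaced by trained networks; the loop structure is the part this sketch is meant to convey.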

Multimodal #Summary
- Vision-Language
- yhayato1320.hatenablog.com

Algorithms

BRNN

Deep Visual-Semantic Alignments for Generating Image Descriptions
- [2014]
- arxiv.org

CPTR / 2021

CPTR: Full Transformer Network for Image Captioning
- [2021]
- arxiv.org

Re-ViLM

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
- [2023]
- arxiv.org

AEC / Affective Explanation Captioning

A task that, given a real-world image, generates the emotion the image evokes together with a text explaining that emotion.
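Unlike plain captioning, the output of this task is a pair: an emotion and an explanation. A minimal sketch of that output shape follows. The `AffectiveExplanation` type, the `affective_caption` function, the two emotion labels, and the brightness rule are all hypothetical placeholders chosen for illustration; they are not taken from the Affection paper or its dataset.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical image type: rows of pixel intensities in [0, 1].
Image = Sequence[Sequence[float]]


@dataclass(frozen=True)
class AffectiveExplanation:
    emotion: str      # emotion evoked by the image (hypothetical label)
    explanation: str  # text explaining why the image evokes it


def affective_caption(image: Image) -> AffectiveExplanation:
    """Toy stand-in for a learned model: picks an emotion from brightness."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    if mean >= 0.5:
        return AffectiveExplanation("contentment", "the scene is bright and open")
    return AffectiveExplanation("fear", "the scene is dark and hard to make out")


result = affective_caption([[0.1, 0.2], [0.0, 0.3]])
print(result.emotion, "-", result.explanation)
```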

Affection / 2022

Affection: Learning Affective Explanations for Real-World Visual Data
- [2022]
- arxiv.org

References

Websites

paperswithcode.com
- The task page on Papers with Code