LAION-5B: An open large-scale dataset for training next generation image-text models
- [2022]
- arxiv.org
LAION Releases LAION-5B, a Dataset of 5 Billion Image-Text Pairs
- www.infoq.com

LAION-115M / 2022

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- [2022]
- arxiv.org
LAION-115M
- crfm.stanford.edu

Outdoor Multimodal Dataset / OMMO Dataset / 2023

A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction
- [2023]
- arxiv.org

WHOOPS! / 2023

Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
- [2023]
- arxiv.org
- whoops-benchmark.github.io

CelebV-Text / 2023

CelebV-Text: A Large-Scale Facial Text-Video Dataset
- [2023]
- arxiv.org
- celebv-text.github.io

Multimodal C4 / 2023

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text
- [2023]
- arxiv.org
- github.com

DataComp / 2023

Releases 12.8 billion image-text pairs, results from more than 300 experiments, and a 1.4 billion-pair subset.

DataComp: In search of the next generation of multimodal datasets
- [2023]
- arxiv.org
- github.com
- www.datacomp.ai

MineDojo / 2022

Video / Text.

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
- [2022]
- arxiv.org
MineDojo
- crfm.stanford.edu

Speech Language

WavCaps / 2023

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
- [2023]
- arxiv.org

Benchmark

GEM / 2021

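While existing benchmarks focus mainly on natural-language tasks, GEM targets image-language (GEM-I) and video-language (GEM-V) tasks. Its distinguishing feature is that it is the largest vision-language dataset while also being labeled in multiple languages. As baselines for the benchmark, the authors provide two multilingual multimodal pre-trained models, M3P and m-UniVL, with the aim of advancing multilingual multimodal research. GEM-I contains about 1.2 million pairs in 20 languages, and GEM-V contains about 99,000 pairs in 30 languages, all collected from a real commercial search engine.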
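
To make the scale difference between the two subsets concrete, here is a minimal Python sketch that records the approximate figures quoted above and computes the mean number of pairs per language. The `GemSplit` class and its field names are hypothetical and not part of any GEM release, and the per-language mean assumes pairs are spread evenly across languages, which the published figures do not claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GemSplit:
    """Hypothetical summary record for one GEM subset (approximate figures)."""
    name: str
    modality: str
    languages: int
    approx_pairs: int

    def mean_pairs_per_language(self) -> float:
        # Assumes an even split across languages; real per-language counts may differ.
        return self.approx_pairs / self.languages


SPLITS = [
    GemSplit("GEM-I", "image-language", 20, 1_200_000),
    GemSplit("GEM-V", "video-language", 30, 99_000),
]

for s in SPLITS:
    print(f"{s.name}: ~{s.mean_pairs_per_language():,.0f} pairs per language (if evenly split)")
```

Under that even-split assumption, GEM-I averages about 60,000 pairs per language versus about 3,300 for GEM-V, so the video subset is far sparser per language.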

GEM: A General Evaluation Benchmark for Multimodal Tasks
- [2021]
- arxiv.org


オムライスの備忘録 (Omurice's Notes)

Notes on mathematics, statistics, machine learning, and programming

[Dataset] Multimodal Data #Summary

Index

Multimodal Data

Flickr30k / 2015

CLEVR / 2016

Conceptual Captions / 2018

WebImageText / 2021

LAION-5B / 2022