オムライスの備忘録 (Omurice's Memorandum)

Notes on mathematics, statistics, machine learning, and programming

[Multimodal] Text to Image #Summary #00

Index
Text to Image
Algorithms
Techniques and Tricks
Tasks
- Image Editing
Applications
- Midjourney
References
- Books
- Websites

Text to Image

A multimodal Vision-Language task that generates an image from text.

Multimodal #Summary
- Vision-Language
- yhayato1320.hatenablog.com

Algorithms

Deep Recurrent Attention Writer / DRAW / 2015

Generating Images from Captions with Attention
- [2015]
- arxiv.org

OSCAR / 2020

OSCAR
- [2020 Microsoft / University of Washington]
- yhayato1320.hatenablog.com

Dream Fields / 2021

Zero-Shot Text-Guided Object Generation with Dream Fields
- [2021]
- Text to 3D
- arxiv.org
- www.itmedia.co.jp

Style CLIP / 2021

Style CLIP
- [2021]
- Style(Text) + Image -> Image
- Style GAN + CLIP
- yhayato1320.hatenablog.com

DALL-E / 2021

DALL-E #Summary
- yhayato1320.hatenablog.com

GLIDE / 2021

GLIDE
- yhayato1320.hatenablog.com

CLIPDraw / 2021

CLIPDraw
- yhayato1320.hatenablog.com

Imagen / 2022

Imagen
- yhayato1320.hatenablog.com

Parti / 2022

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
- [2022]
- arxiv.org
- parti.research.google

Make-A-Scene / 2022

Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
- [2022]
- arxiv.org

Textual Inversion / 2022

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
- [2022]
- arxiv.org

eDiff-I / 2022

eDiff-I
- yhayato1320.hatenablog.com

ANNA / 2023

ANNA: Abstractive Text-to-Image Synthesis with Filtered News Captions
- arxiv.org

GLIGEN / 2023

GLIGEN: Open-Set Grounded Text-to-Image Generation
- [2023]
- arxiv.org
- github.com
- huggingface.co

Attend-and-Excite / 2023

Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
- [2023]
- arxiv.org
- attendandexcite.github.io
- github.com

Encoder for Tuning / E4T / 2023

Designing an Encoder for Fast Personalization of Text-to-Image Models
- [2023]
- arxiv.org
- tuning-encoder.github.io
Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models
- Revised title of the same paper
E4T-diffusion
- github.com

CoBIT / 2023

CoBIT: A Contrastive Bi-directional Image-Text Generation Model
- [2023]
- arxiv.org

GlyphDraw / 2023

Tackles rendering complex glyphs such as Chinese characters.

GlyphDraw: Learning to Draw Chinese Characters in Image Synthesis Models Coherently
- [2023]
- arxiv.org

Techniques and Tricks

GAN

GAN-INT-CLS / 2016

GAN-INT-CLS
- yhayato1320.hatenablog.com

Stack GAN / 2016

Stack GAN
- yhayato1320.hatenablog.com

Attn GAN / Attentional GAN / 2017

Attn GAN / Attentional GAN
- yhayato1320.hatenablog.com

Giga GAN / 2023

Scaling up GANs for Text-to-Image Synthesis
- [2023]
- arxiv.org

Diffusion Model

Diffusion Model
- yhayato1320.hatenablog.com

ImageReward / 2023

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
- [2023]
- arxiv.org
- github.com

Tasks

Image Editing

Image Editing
- yhayato1320.hatenablog.com

Applications

Midjourney

A proprietary AI program that creates images from text descriptions; also the name of the independent research lab that develops it.

midjourney.com
- Service page

References

Books

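Software Design, February 2023 issue
- [Short series] How Image-Generation AI Works / Technology for Making AI Understand Language
  - The Impact of Image-Generation AI
  - Image Generation Based on Text Input
  - Part 1: Text Encoder
- Software Design (ソフトウェアデザイン), February 2023 issue [magazine]
  - 技術評論社 (publisher)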

Websites

paperswithcode.com
- Papers with Code task page
A Summary of the Recent Explosive Progress in AI-Based Natural Language Processing [Part 2]
- 4 Image Generation from Text
  - 4.2 Image Generation with GANs
    - 4.2.1 GAN-INT-CLS (2016)
    - 4.2.2 StackGAN (2017)
    - 4.2.3 AttnGAN (2017)
  - 4.3 OpenAI: From DALL-E to DALL-E 2
    - 4.3.1 DALL-E (January 2021)
    - 4.3.2 CLIP (January 2021)
    - 4.3.3 GLIDE (December 2021)
    - 4.3.4 DALL-E 2 (April 2022)
  - 4.4 Google's Imagen and Parti
    - 4.4.1 Imagen (May 2022)
    - 4.4.2 Parti (June 2022)
- note.com
An Explainer on NeRF Papers Applying Text-to-Image That Have Recently Caught My Attention
- speakerdeck.com