Omurice's Memo

Notes on mathematics, statistics, machine learning, and programming

[Multimodal] Text-to-Speech / TTS

Speech Language #Overview
- yhayato1320.hatenablog.com

Index
Text-to-Speech / TTS
Techniques & Tricks
- pause insertion / 2023
Applications & Services
- Bark

Text-to-Speech / TTS

SPEAR-TTS / 2023

Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
- [2023]
- arxiv.org

Multilingual Shallow Fusion / 2023

Massively Multilingual Shallow Fusion with Large Language Models
- [2023]
- arxiv.org

Imaginary Voice / 2023

Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
- [2023]
- arxiv.org

FoundationTTS / 2023

FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
- [2023]
- arxiv.org

NaturalSpeech 2 / 2023

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
- [2023]
- arxiv.org
- speechresearch.github.io

Techniques & Tricks

pause insertion / 2023

Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech
- [2023]
- arxiv.org
- sarulab-speech.github.io

Applications & Services

Bark

Bark
- github.com
- github.com