Index
Algorithms in Speech Analysis
This page lists deep-learning-based algorithms used in speech analysis.
Speech Analysis #summary
Deep Learning #summary
DNN
WaveNet / 2016
- WaveNet: A Generative Model for Raw Audio
- [2016]
- arxiv.org
RNN
Deep Speech / 2014
- Deep Speech: Scaling up end-to-end speech recognition
- [2014]
- arxiv.org
CNN
Wav2letter / 2016
- Wav2Letter: an End-to-End ConvNet-based Speech Recognition System
- [2016 Facebook]
wav2vec / 2019
- wav2vec: Unsupervised Pre-training for Speech Recognition
- [2019]
- arxiv.org
wav2vec 2.0 / 2020
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
- [2020]
- arxiv.org
wav2vec-U / 2021
- Unsupervised Speech Recognition
- [2021]
- arxiv.org
Attention
ESPnet / 2018
An umbrella name covering both the algorithms and their implementation.
- ESPnet: End-to-End Speech Processing Toolkit
ReazonSpeech / 2023
A Japanese model built by training ESPnet on the project's own corpus.
The name covers both the corpus and the model.
ReazonSpeech: A Free and Massive Corpus for Japanese ASR
(2023-04-04) The latest ReazonSpeech model has been released
- Official demo site
Whisper / 2022
Robust Speech Recognition via Large-Scale Weak Supervision
Making Whisper models lighter
Deep learning highlights of 2022
- Robust Speech Recognition via Large-Scale Weak Supervision
- qiita.com
Measuring the speech recognition accuracy of OpenAI's Whisper (AmiVoice vs. Whisper)
Which Whisper model should you use?
Comparison of whisper, whisper.cpp, and faster-whisper
WhisperX / 2023
- WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
- [2023]
- arxiv.org
Squeezeformer / 2022
- Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
- [2022]
- arxiv.org
BERT-CTC / 2022
- BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model
- [2022]
- aclanthology.org
BECTRA / 2022
- BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder
- [2022]
- arxiv.org
ACE-VC / 2023
- ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations
- [2023]
- arxiv.org
JEIT / 2023
- JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition
- [2023]
- arxiv.org
Google USM / 2023
- Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
- [2023]
- arxiv.org
- ai.googleblog.com
AVFormer / 2023
Tricks and Techniques
Diffusion Model
- Diffusion Model
UML / 2023
GAN
Wave-U-Net Discriminator / 2023
- Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
- [2023]
- arxiv.org
Implementations and Tools
- S3PRL-VC
- Voice conversion toolkit
- github.com
References
- NeurIPS 2021 Participation Report, Part 1
- wav2vec-U
- blog.recruit.co.jp