Index

Post / Pre Normalization
Architectures that use it
References

Post / Pre Normalization

Research on training much deeper Transformers for machine translation proposes two techniques:

Applying Post / Pre Normalization
Improving how residual connections are combined

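Post / Pre Normalization differ in where Layer Normalization is placed inside each Encoder / Decoder Layer, relative to the sub layer and to Combine (the residual connection, which adds the sub layer's input x to its output).

(Post :) Sub Layer (Attention Layer or FFN Layer) -> Combine -> Layer Normalization
(Pre :) Layer Normalization -> Sub Layer (Attention Layer or FFN Layer) -> Combine

The original Transformer paper uses the Post layout, computing LayerNorm(x + SubLayer(x)) for each sub layer. Pre Normalization instead computes x + SubLayer(LayerNorm(x)), so the residual path x reaches the output without being normalized.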
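The two layouts can be sketched in Python as below. This is a minimal, dependency-free sketch under stated assumptions: vectors are plain lists, Layer Normalization omits the learnable gain and bias, and `toy_sub_layer` is a hypothetical stand-in for an Attention or FFN layer. None of these names come from the paper.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize x to zero mean and unit variance (learnable gain/bias omitted)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def combine(x: Vector, y: Vector) -> Vector:
    """Residual connection: element-wise x + y."""
    return [a + b for a, b in zip(x, y)]


def post_norm_block(x: Vector, sub_layer: Callable[[Vector], Vector]) -> Vector:
    # Post: Sub Layer -> Combine -> Layer Normalization
    return layer_norm(combine(x, sub_layer(x)))


def pre_norm_block(x: Vector, sub_layer: Callable[[Vector], Vector]) -> Vector:
    # Pre: Layer Normalization -> Sub Layer -> Combine
    # The input x is added back untouched, so the residual path skips normalization.
    return combine(x, sub_layer(layer_norm(x)))


def toy_sub_layer(x: Vector) -> Vector:
    # Hypothetical stand-in for an Attention or FFN layer (not a real sub layer).
    return [0.5 * v + 1.0 for v in x]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print("post:", post_norm_block(x, toy_sub_layer))
    print("pre: ", pre_norm_block(x, toy_sub_layer))
```

The Post output is always re-normalized (zero mean, unit variance), while the Pre output keeps the scale of x plus whatever the sub layer adds. The referenced paper argues that this direct, un-normalized residual path is what makes deep Pre-Norm stacks easier to train.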

Architectures that use it

Both of the following place Layer Normalization before each sub layer, i.e. the Pre layout.

Vision Transformer (yhayato1320.hatenablog.com)

GPT-2 (yhayato1320.hatenablog.com)

References

Learning Deep Transformer Models for Machine Translation
- [2019]
- 2 Post-Norm and Pre-Norm Transformer
  - 2.1 Model Layout
- arxiv.org

オムライスの備忘録 (Omurice's Notebook)

Notes on mathematics, statistics, machine learning, and programming

[Deep Learning] Post / Pre Normalization
