Index

What is Adam (Adaptive Moment Estimation)?
- Exponential moving average
- Unbiased estimator
Update algorithm
References
- Books
- Websites

What is Adam (Adaptive Moment Estimation)?

Adam is a method that combines Momentum with AdaGrad (RMSProp).

Momentum

$\begin{align} v\ &\leftarrow\ \alpha v\ -\ \mu \displaystyle \frac{\partial L}{\partial W} \tag{1.1} \\ W\ &\leftarrow\ W\ +\ v \tag{1.2} \end{align}$

$\mu$ is the learning rate, $0 \leq \mu \leq 1$
$\alpha$ is the damping parameter that controls how much of the previous velocity $v$ is kept, $0 \leq \alpha \leq 1$
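As a minimal sketch, equations (1.1) and (1.2) can be written with NumPy as below. The function name `momentum_step`, the default values `lr=0.01` and `alpha=0.9`, and the toy quadratic loss are assumptions made for illustration, not part of the original notes.

```python
import numpy as np

def momentum_step(W, v, grad, lr=0.01, alpha=0.9):
    """One Momentum update. lr plays the role of mu in the equations."""
    v = alpha * v - lr * grad   # (1.1): decay the old velocity, add the new gradient step
    W = W + v                   # (1.2): move the weights by the velocity
    return W, v

# Toy loss L(W) = 0.5 * ||W||^2, whose gradient is simply W (illustrative assumption).
W = np.array([1.0, -2.0])
v = np.zeros_like(W)
for _ in range(200):
    W, v = momentum_step(W, v, grad=W)
```

On this toy loss the weights spiral in toward the minimum at 0, because the velocity keeps part of its previous direction at every step.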

AdaGrad

$\begin{align} h\ &\leftarrow\ h\ +\ \displaystyle \frac{\partial L}{\partial W}\ \odot \displaystyle \frac{\partial L}{\partial W} \tag{2.1} \\ W\ &\leftarrow\ W\ -\ \mu \displaystyle \frac{1}{\sqrt{h}} \displaystyle \frac{\partial L}{\partial W} \tag{2.2} \end{align}$

$\mu$ is the learning rate, $0 \leq \mu \leq 1$
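Equations (2.1) and (2.2) could be sketched as follows. The function name `adagrad_step` and the defaults are hypothetical, and the small `eps` added to the denominator is a safeguard against division by zero that does not appear in the equations above.

```python
import numpy as np

def adagrad_step(W, h, grad, lr=0.1, eps=1e-7):
    """One AdaGrad update. lr plays the role of mu in the equations."""
    h = h + grad * grad                      # (2.1): accumulate squared gradients element-wise
    W = W - lr * grad / (np.sqrt(h) + eps)   # (2.2): eps is an added assumption, not in (2.2)
    return W, h

# With a constant gradient, the effective step shrinks as h keeps growing.
W = np.array([1.0])
h = np.zeros_like(W)
steps = []
for _ in range(3):
    W_new, h = adagrad_step(W, h, grad=np.array([2.0]))
    steps.append(float(W[0] - W_new[0]))   # size of the move taken at this step
    W = W_new
```

Because `h` only ever grows, each entry of `steps` is smaller than the one before it. This is the behavior RMSProp relaxes by letting old squared gradients decay.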

RMSProp

$\begin{align} h\ &\leftarrow\ \alpha h\ +\ (1\ -\ \alpha)\ \displaystyle \frac{\partial L}{\partial W}\ \odot \displaystyle \frac{\partial L}{\partial W} \\ W\ &\leftarrow\ W\ -\ \mu \displaystyle \frac{1}{\sqrt{h\ +\ \epsilon}} \displaystyle \frac{\partial L}{\partial W} \end{align}$

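$\mu$ is the learning rate, $0 \leq \mu \leq 1$
$\alpha$ is the decay rate that sets how much weight recent gradients receive: the newest squared gradient is weighted by $1\ -\ \alpha$
$\epsilon$ is a fixed parameter that keeps the denominator from becoming 0, $\epsilon\ =\ 10^{-6}$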
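The RMSProp update above might be sketched like this, keeping $\epsilon$ inside the square root as in the equation. The function name `rmsprop_step` and the defaults `lr=0.01` and `alpha=0.99` are assumptions for illustration; `eps=1e-6` follows the value given above.

```python
import numpy as np

def rmsprop_step(W, h, grad, lr=0.01, alpha=0.99, eps=1e-6):
    """One RMSProp update. lr plays the role of mu in the equations."""
    # Exponential moving average of squared gradients: old values decay by alpha.
    h = alpha * h + (1 - alpha) * grad * grad
    # Scale the step by the root of the running average (eps inside the root, as above).
    W = W - lr * grad / np.sqrt(h + eps)
    return W, h

# Toy loss L(W) = 0.5 * ||W||^2 with gradient W (illustrative assumption).
W = np.array([1.0, -2.0])
h = np.zeros_like(W)
for _ in range(10):
    W, h = rmsprop_step(W, h, grad=W)
```

Unlike AdaGrad, `h` here can shrink again when recent gradients are small, so the step size does not decay toward zero indefinitely.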

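Exponential moving average

As in RMSProp, define update rules whose accumulated values decay over time, here for both the gradient and the squared gradient.

$\begin{align} m\ &\leftarrow\ \beta_{1}\ m\ +\ (1\ -\ \beta_{1})\ \displaystyle \frac{\partial L}{\partial W} \\ v\ &\leftarrow\ \beta_{2}\ v\ +\ (1\ -\ \beta_{2})\ \displaystyle \frac{\partial L}{\partial W} \odot \displaystyle \frac{\partial L}{\partial W} \end{align}$

The initial values are $m\ =\ 0,\ v\ =\ 0$.
At first glance, $m$ and $v$ look like good estimators of the first and second moments of the gradient, but they are biased.
Because both start at 0, the moment estimates are pulled toward 0 early in training.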
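The bias toward 0 is easy to see numerically. The sketch below assumes a constant gradient `g = 1.0` (so the true first moment is exactly 1.0) and a decay rate `beta1 = 0.9`, and applies the moving average $m \leftarrow \beta_{1} m + (1 - \beta_{1}) g$ from the update algorithm in this article. Both values are illustrative assumptions.

```python
beta1 = 0.9   # assumed decay rate for the first moment
g = 1.0       # constant gradient, so the true first moment is 1.0
m = 0.0       # initial value, as in the text

history = []
for t in range(1, 6):
    m = beta1 * m + (1 - beta1) * g   # exponential moving average of the gradient
    history.append(m)

# After t steps m equals g * (1 - beta1**t): roughly 0.1, 0.19, 0.271, 0.344, 0.410.
for t, m_t in enumerate(history, start=1):
    print(t, round(m_t, 4), round(g * (1 - beta1 ** t), 4))
```

Even though every gradient is exactly 1.0, the estimate after five steps is still below 0.5. The shortfall factor $1 - \beta_{1}^{t}$ is the quantity the bias correction divides out.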

Unbiased estimator

To fix this, correct the bias so that the estimates are as close to unbiased as possible.

$\begin{align} \hat{m}\ &\leftarrow\ \frac{m}{1\ -\ \beta_{1}^{t}} \\ \hat{v}\ &\leftarrow\ \frac{v}{1\ -\ \beta_{2}^{t}} \end{align}$
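Why dividing by $1\ -\ \beta_{1}^{t}$ removes the bias can be sketched as follows. This assumes, as a simplification not stated in these notes, that the gradient $g_{i}$ (that is, $\frac{\partial L}{\partial W}$ at step $i$) has the same expectation $E[g]$ at every step, and that $t$ counts updates starting from 1. Unrolling the moving average from $m\ =\ 0$ gives:

$\begin{align} m_{t}\ &=\ (1\ -\ \beta_{1}) \displaystyle \sum_{i=1}^{t} \beta_{1}^{t-i}\ g_{i} \\ E[m_{t}]\ &=\ E[g]\ (1\ -\ \beta_{1}) \displaystyle \sum_{i=1}^{t} \beta_{1}^{t-i}\ =\ E[g]\ (1\ -\ \beta_{1}^{t}) \end{align}$

So $E[\hat{m}]\ =\ E[m_{t}]\ /\ (1\ -\ \beta_{1}^{t})\ =\ E[g]$ under that assumption. The same argument with $\beta_{2}$ and $g_{i} \odot g_{i}$ applies to $v$. As $t$ grows, $\beta^{t}$ goes to 0 and the correction fades out.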

The weights are then updated using the bias-corrected estimates $\hat{m}$ and $\hat{v}$.

$W\ \leftarrow\ W\ -\ \mu \displaystyle \frac{\hat{m}}{\sqrt{\hat{v}\ +\ \epsilon}}$

Update algorithm

$\begin{align} m\ &\leftarrow\ \beta_{1}\ m\ +\ (1\ -\ \beta_{1})\ \displaystyle \frac{\partial L}{\partial W} \tag{Momentum} \\ v\ &\leftarrow\ \beta_{2}\ v\ +\ (1\ -\ \beta_{2})\ \displaystyle \frac{\partial L}{\partial W} \odot \displaystyle \frac{\partial L}{\partial W} \tag{RMSProp} \\ \\ \hat{m}\ &\leftarrow\ \frac{m}{1\ -\ \beta_{1}^{t}} \\ \hat{v}\ &\leftarrow\ \frac{v}{1\ -\ \beta_{2}^{t}} \\ \\ W\ &\leftarrow\ W\ -\ \mu \displaystyle \frac{\hat{m}}{\sqrt{\hat{v}\ +\ \epsilon}} \end{align}$
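Putting the steps together, the update algorithm could be sketched with NumPy as below. The class name `Adam`, its `update` method, and the default hyperparameters (`lr=0.001`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are assumptions for illustration. The defaults are the commonly used ones and are not given in these notes. The sketch keeps $\epsilon$ inside the square root to match the equation above; the original Adam paper adds it outside the root instead.

```python
import numpy as np

class Adam:
    """Minimal Adam sketch following the update equations above (hypothetical helper)."""

    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr = lr          # mu in the equations
        self.beta1 = beta1
        self.beta2 = beta2
        self.eps = eps
        self.t = 0            # number of updates so far
        self.m = None         # first-moment estimate
        self.v = None         # second-moment estimate

    def update(self, W, grad):
        if self.m is None:
            self.m = np.zeros_like(W)   # m = 0
            self.v = np.zeros_like(W)   # v = 0
        self.t += 1
        # Moving averages of the gradient (Momentum part) and its square (RMSProp part).
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        # Bias correction.
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        # Weight update, with eps inside the square root as in the equation above.
        return W - self.lr * m_hat / np.sqrt(v_hat + self.eps)

# Toy loss L(W) = 0.5 * ||W||^2 with gradient W (illustrative assumption).
W = np.array([1.0, -2.0])
optimizer = Adam(lr=0.01)
for _ in range(100):
    W = optimizer.update(W, grad=W)
```

On the very first step the bias correction makes $\hat{m} / \sqrt{\hat{v}}$ close to the sign of the gradient, so every weight moves by roughly `lr` regardless of the gradient's scale.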