Adam Optimization Algorithm

(Adaptive Moment Estimation)

$$
\begin{gathered}
V_{dw}=0,\ S_{dw}=0,\ V_{db}=0,\ S_{db}=0\\[1ex]
V_{dw}=\beta_1V_{dw}+(1-\beta_1)dW\\
V_{db}=\beta_1V_{db}+(1-\beta_1)db\\[1ex]
S_{dw}=\beta_2S_{dw}+(1-\beta_2)dW^2\\
S_{db}=\beta_2S_{db}+(1-\beta_2)db^2\\[1ex]
V^{corrected}_{dw}=\frac{V_{dw}}{1-\beta_1^t}\\
V^{corrected}_{db}=\frac{V_{db}}{1-\beta_1^t}\\[1ex]
S^{corrected}_{dw}=\frac{S_{dw}}{1-\beta_2^t}\\
S^{corrected}_{db}=\frac{S_{db}}{1-\beta_2^t}\\[1ex]
w:=w-\alpha\frac{V^{corrected}_{dw}}{\sqrt{S^{corrected}_{dw}}+\epsilon}\\
b:=b-\alpha\frac{V^{corrected}_{db}}{\sqrt{S^{corrected}_{db}}+\epsilon}
\end{gathered}
$$
  • An optimization algorithm that combines momentum and RMSprop
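
The update equations above can be sketched in NumPy roughly as follows. This is a minimal illustration, not a reference implementation: the function names `init_adam_state` and `adam_step`, the dictionary-based state, and the default `alpha=0.001` are assumptions made for this example, and the quadratic loss in the usage loop is only a toy. The squares `dW**2` and `db**2` are taken element-wise, and `t` is assumed to start at 1 so that the bias-correction denominators are never zero.

```python
import numpy as np


def init_adam_state(w, b):
    """Start every moving average at zero: V_dw = S_dw = V_db = S_db = 0."""
    return {
        "Vdw": np.zeros_like(w), "Sdw": np.zeros_like(w),
        "Vdb": np.zeros_like(b), "Sdb": np.zeros_like(b),
    }


def adam_step(w, b, dW, db, state, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update (hypothetical helper). `t` counts steps from 1."""
    # Momentum part: exponentially weighted average of the gradients.
    state["Vdw"] = beta1 * state["Vdw"] + (1 - beta1) * dW
    state["Vdb"] = beta1 * state["Vdb"] + (1 - beta1) * db

    # RMSprop part: exponentially weighted average of the squared gradients.
    state["Sdw"] = beta2 * state["Sdw"] + (1 - beta2) * dW ** 2
    state["Sdb"] = beta2 * state["Sdb"] + (1 - beta2) * db ** 2

    # Bias correction compensates for the zero initialization.
    Vdw_c = state["Vdw"] / (1 - beta1 ** t)
    Vdb_c = state["Vdb"] / (1 - beta1 ** t)
    Sdw_c = state["Sdw"] / (1 - beta2 ** t)
    Sdb_c = state["Sdb"] / (1 - beta2 ** t)

    # Parameter update.
    w = w - alpha * Vdw_c / (np.sqrt(Sdw_c) + eps)
    b = b - alpha * Vdb_c / (np.sqrt(Sdb_c) + eps)
    return w, b


# Toy usage: minimize sum(w^2) + b^2, whose gradients are 2w and 2b.
w = np.array([1.0, -2.0])
b = np.array(0.5)
state = init_adam_state(w, b)
for t in range(1, 101):
    dW, db = 2 * w, 2 * b
    w, b = adam_step(w, b, dW, db, state, t, alpha=0.01)
print(w, b)
```

The state is kept outside `adam_step` so that the moving averages persist across iterations; in a real network there would typically be one such state per layer.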

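Hyperparameters of the Adam optimization algorithm

  • $\alpha$
    • Needs to be tuned
  • $\beta_1$
    • Typically 0.9 (used for the moving average of $dw$)
  • $\beta_2$
    • Typically 0.999 (used for the moving average of $dw^2$)
  • $\epsilon$
    • Not very important, but $10^{-8}$ is recommended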
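
As a quick sanity check of how these typical values interact with bias correction, consider the very first step ($t=1$) with $\beta_1=0.9$ and $\beta_2=0.999$ substituted into the update equations. The approximation in the last line assumes $|dW|$ is much larger than $\epsilon$:

$$
\begin{gathered}
V_{dw}=0.1\,dW,\quad S_{dw}=0.001\,dW^2\\[1ex]
V^{corrected}_{dw}=\frac{0.1\,dW}{1-0.9}=dW,\quad
S^{corrected}_{dw}=\frac{0.001\,dW^2}{1-0.999}=dW^2\\[1ex]
w:=w-\alpha\frac{dW}{|dW|+\epsilon}\approx w-\alpha\,\operatorname{sign}(dW)
\end{gathered}
$$

Under that assumption, each component of the first update has magnitude of about $\alpha$ regardless of the gradient's scale. This is one way to see why $\alpha$ is the value that still has to be tuned, while $\epsilon$ only matters when the gradient is close to zero.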