Regularization

  • ์ •๊ทœํ™”๋Š” ๊ณ ๋ถ„์‚ฐ(๊ณผ์ ํ•ฉ, overfitting) ์˜ ํ•ด๊ฒฐ ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜์ž„
  • ๋” ๋งŽ์€ ํ›ˆ๋ จ ์„ธํŠธ๋ฅผ ํ™•๋ณดํ•˜๋Š” ๊ฒƒ์ด ์ œ์ผ ์ข‹๊ฒ ์œผ๋‚˜ ํ•ญ์ƒ ์ด๊ฒƒ์ด ๊ฐ€๋Šฅํ•˜์ง€๋Š” ์•Š์Œ. ์ด๋Ÿด ๋•Œ ์‹œ๋„ํ•ด ๋ณผ ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.
  • ์ •๊ทœํ™”๋ฅผ ๋„์ž…ํ•˜๊ธฐ ์œ„ํ•ด ๋ฐฉ๋ฒ•์— ๋”ฐ๋ผ ๋น„์šฉ ํ•จ์ˆ˜ JJ ๋ฅผ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ์ •์˜ํ•œ๋‹ค.

L2 regularization

J(w, b) = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(\hat y^{(i)}, y^{(i)}) + \frac{\lambda}{2m}\|w\|_2^2
\|w\|_2^2 = \sum_{j=1}^{n_x} w_j^2 = w^T w
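A minimal NumPy sketch of the L2-regularized cost, assuming logistic regression with cross-entropy as the loss \mathcal{L} and examples stored as columns (X has shape (n_x, m)). The helper name `l2_regularized_cost` and the argument name `lambd` (`lambda` is a Python keyword) are hypothetical.

```python
import numpy as np

def l2_regularized_cost(w, b, X, Y, lambd):
    """Cross-entropy cost plus the L2 penalty (lambda / 2m) * ||w||_2^2.

    Assumed shapes: X is (n_x, m), Y is (1, m), w is (n_x, 1), b is a scalar.
    """
    m = X.shape[1]
    # Sigmoid predictions y_hat for all m examples, shape (1, m)
    y_hat = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))
    # Average cross-entropy loss over the training set
    cross_entropy = -np.sum(Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat)) / m
    # ||w||_2^2 = w^T w; b is not penalized
    l2_penalty = (lambd / (2 * m)) * (w.T @ w).item()
    return cross_entropy + l2_penalty

# Usage with a tiny made-up dataset (n_x = 2, m = 3)
X = np.array([[1.0, -2.0, 0.5], [0.3, 0.1, -1.0]])
Y = np.array([[1, 0, 1]])
w = np.array([[0.5], [-0.2]])
b = 0.1
print(l2_regularized_cost(w, b, X, Y, lambd=0.0))  # unregularized cost
print(l2_regularized_cost(w, b, X, Y, lambd=1.0))  # cost with the L2 term added
```

The difference between the two printed values is exactly λ/(2m)·wᵀw, which is the only part λ affects.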

L1 regularization

J(w, b) = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(\hat y^{(i)}, y^{(i)}) + \frac{\lambda}{2m}\|w\|_1
\|w\|_1 = \sum_{j=1}^{n_x} |w_j|
  • ๋ณดํ†ต L1 ๋ณด๋‹ค L2๊ฐ€ ๋งŽ์ด ์‚ฌ์šฉ๋จ
  • ์—ฌ๊ธฐ์„œ ฮป\lambda ๋Š” ์ •๊ทœํ™” ๊ณ„์ˆ˜ (Regaularization Parameter) ๋กœ์จ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ ์ค‘ ํ•˜๋‚˜์ž„

๋ ˆ์ด์–ด ll ์— ๋Œ€ํ•œ ๊ฐ€์ค‘์น˜ ์ •๊ทœํ™”

J(w^{[1]}, b^{[1]}, w^{[2]}, b^{[2]}, \dots, w^{[L]}, b^{[L]}) = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(\hat y^{(i)}, y^{(i)}) + \frac{\lambda}{2m}\sum_{l=1}^{L}\|w^{[l]}\|_F^2
\|w^{[l]}\|_F^2 = \sum_{i=1}^{n^{[l]}}\sum_{j=1}^{n^{[l-1]}}\left(w_{ij}^{[l]}\right)^2
  • ์‹์€ ์–ด๋ ค์›Œ ๋ณด์ด์ง€๋งŒ ๋‹จ์ˆœํžˆ w[l]w^{[l]} ์˜ ๋ชจ๋“  ์š”์†Œ๋“ค์˜ ์ œ๊ณฑ์˜ ํ•ฉ
  • ๊ฐ€์ค‘์น˜ ๋ฒกํ„ฐ์˜ ํฌ๊ธฐ๊ฐ€ ํฌ๋ฉด ๋น„์šฉ ํ•จ์ˆ˜์˜ ๊ฐ’์ด ์ปค์ง„๋‹ค โ†’ ์ฆ‰ ๊ฒฝ์‚ฌ ํ•˜๊ฐ•์€ ww, bb ์˜ ๊ฐ’์ด ๊ฐ์†Œํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ์ง„ํ–‰๋œ๋‹ค.
  • โˆฃโˆฃโ‹…โˆฃโˆฃF||\centerdot||_F ์„ Frobenius Norm ์ด๋ผ๊ณ  ํ•œ๋‹ค. (Norm: ๋ฒกํ„ฐ์˜ ํฌ๊ธฐ)

์ •๊ทœํ™”์˜ ๊ฒฝ์‚ฌ ํ•˜๊ฐ•, ๊ฐ€์ค‘์น˜ ์—…๋ฐ์ดํŠธ

dw^{[l]} = (\text{from backprop}) + \frac{\lambda}{m}w^{[l]}
  • ๊ธฐ์กด ์—ญ์ „ํŒŒ๋ฅผ ํ†ตํ•ด dw[l]dw^{[l]} ๋ฅผ ๊ตฌํ•œ ํ›„ ์ •๊ทœํ™” ํ•ญ์„ ์ถ”๊ฐ€ํ•œ๋‹ค.
w^{[l]} := w^{[l]} - \alpha\, dw^{[l]}
= w^{[l]} - \alpha\left\{(\text{from backprop}) + \frac{\lambda}{m}w^{[l]}\right\}
= w^{[l]} - \alpha(\text{from backprop}) - \alpha\frac{\lambda}{m}w^{[l]}
= \left(1 - \frac{\alpha\lambda}{m}\right)w^{[l]} - \alpha(\text{from backprop})
  • ์ดํ›„ ๊ฐ€์ค‘์น˜ ์—…๋ฐ์ดํŠธ๋ฅผ ์ˆ˜ํ–‰ํ•˜๋ฉด ํ•ด๋‹น ๋ ˆ์ด์–ด ll ์˜ ๊ฐ€์ค‘์น˜ ๋ฒกํ„ฐ norm ์ด ํด ์ˆ˜๋ก w[l]w^{[l]} ๋Š” ๋” ๋งŽ์ด ๊ฐ์†Œ๋œ๋‹ค.
  • ์ด๋Ÿฐ ๋ฐฉ์‹์ด๊ธฐ ๋•Œ๋ฌธ์— L2 ์ •๊ทœํ™”๋Š” โ€œ๊ฐ€์ค‘์น˜ ๊ฐ์†Œโ€ ๋ผ๊ณ ๋„ ๋ถ€๋ฅธ๋‹ค.