[Paper] Deep Residual Learning for Image Recognition
This is a review of the paper “Deep Residual Learning for Image Recognition”.
The original paper is available at the link.
Key
- “Residual Learning”: layers learn residual functions with reference to the layer inputs
- A plain layer tries to learn the desired underlying mapping H(x) directly; the residual formulation recovers it as F(x) + x = H(x)
- A ResNet layer instead learns the residual function F(x) = H(x) - x
- If the identity mapping is close to optimal, the residual branch can simply drive its weights toward zero, which is easier than fitting an identity mapping with a stack of nonlinear layers
- If the dimensions of the input and the residual function differ, match them with a projection or zero padding (see the shortcut sketch after this list)
- Shortcut connection: skips one or more layers with an identity mapping (which is why “+ x” is added)
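A minimal sketch of the two dimension-matching shortcut options, written as PyTorch code; the names `shortcut_zero_pad` and `ProjectionShortcut` are illustrative, not from the paper:

```python
import torch.nn as nn

def shortcut_zero_pad(x, out_channels, stride):
    # Option A (illustrative): parameter-free shortcut that subsamples
    # spatially and zero-pads the extra channels.
    x = x[:, :, ::stride, ::stride]
    extra = out_channels - x.size(1)
    return nn.functional.pad(x, (0, 0, 0, 0, 0, extra))  # pad channels with zeros

class ProjectionShortcut(nn.Module):
    # Option B (illustrative): a 1x1 convolution projects the input to the
    # dimensions of the residual branch output.
    def __init__(self, in_channels, out_channels, stride):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        return self.bn(self.conv(x))
```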
Architecture
- x -> weight layer -> ReLU -> weight layer -> F(x) + x -> ReLU, i.e. the block output is ReLU(F(x) + x) (see the sketch below)
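A minimal sketch of this block as a PyTorch module, assuming the input and the residual output have the same dimensions so the identity shortcut applies; the class name and channel handling are illustrative:

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Basic residual block: x -> F(x) -> F(x) + x -> ReLU."""
    def __init__(self, channels):
        super().__init__()
        # F(x): two 3x3 convolutions, each followed by batch normalization
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                                # shortcut keeps the input
        out = self.relu(self.bn1(self.conv1(x)))    # first weight layer of F
        out = self.bn2(self.conv2(out))             # second weight layer of F
        out = out + identity                        # F(x) + x = H(x)
        return self.relu(out)                       # ReLU after the addition
```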
Config
- Batch Normalization after each convolution, with a mini-batch size of 256
- Initial learning rate of 0.1, divided by 10 when the error plateaus
- Weight decay of 1e-4 and momentum of 0.9 (a training-setup sketch follows)
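A minimal training-setup sketch with these hyperparameters, assuming PyTorch's SGD; the placeholder model and the use of ReduceLROnPlateau to approximate the "divide by 10 on plateau" rule are assumptions:

```python
import torch

# Placeholder model; in practice this would be a ResNet (assumption for illustration).
model = torch.nn.Linear(10, 10)

# SGD with the reported hyperparameters: lr 0.1, momentum 0.9, weight decay 1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

# The paper divides the learning rate by 10 when the error plateaus;
# ReduceLROnPlateau(factor=0.1) is used here as a stand-in for that schedule.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)

# The mini-batch size of 256 would be set on the DataLoader, e.g.:
# loader = torch.utils.data.DataLoader(dataset, batch_size=256, shuffle=True)
```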
Insight
- The degradation problem is not caused by overfitting but by the difficulty of optimization: training error itself rises as plain networks get deeper
- Even at depths where overfitting would conventionally be expected, the accuracy gains from increased depth are considerable
- Because batch normalization is used throughout, the optimization difficulty is unlikely to be caused by vanishing gradients