
This is a review of the paper “Deep Residual Learning for Image Recognition”.

The original paper can be found at the link.

Key

  • “Residual Learning”: the layers learn residual functions with respect to the layer inputs rather than unreferenced mappings
  • A plain stack of layers fits the desired mapping H(x) directly
  • A ResNet block instead fits the residual F(x) = H(x) - x, so its output F(x) + x recovers H(x)
  • If the identity mapping is optimal, the solver can simply drive the weights of F(x) toward zero
  • If the dimensions of the input and the residual function differ, match them with a linear projection or zero padding on the shortcut (see the sketch below)
  • Shortcut connection: skipping one or more layers with an identity mapping (that is why “+ x” appears in the output)
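A minimal sketch of the two dimension-matching options for the shortcut, assuming PyTorch (the paper does not prescribe a framework): option A zero-pads the extra channels, option B uses a 1×1 convolution projection. The helper names `zero_pad_shortcut` and `projection_shortcut` and the example shapes are illustrative, not from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def zero_pad_shortcut(x, out_channels, stride=2):
    """Option A: parameter-free identity shortcut with zero-padded extra channels."""
    x = x[:, :, ::stride, ::stride]          # downsample spatially to match the residual branch
    pad = out_channels - x.size(1)
    return F.pad(x, (0, 0, 0, 0, 0, pad))    # pad only the channel dimension with zeros

def projection_shortcut(in_channels, out_channels, stride=2):
    """Option B: 1x1 convolution projection on the shortcut."""
    return nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False)

x = torch.randn(1, 64, 56, 56)
print(zero_pad_shortcut(x, 128).shape)        # torch.Size([1, 128, 28, 28])
print(projection_shortcut(64, 128)(x).shape)  # torch.Size([1, 128, 28, 28])
```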

Architecture

  • x -> weight layer -> ReLU -> weight layer -> F(x) + x -> ReLU
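A minimal sketch of this basic two-layer building block, again assuming PyTorch and an identity shortcut (input and output dimensions are the same). The class name `BasicBlock` and the 3×3 kernels follow the common ResNet-18/34 design; the code is an illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 conv layers form F(x); the output is ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))  # first weight layer + ReLU
        out = self.bn2(self.conv2(out))           # second weight layer (no ReLU yet)
        return self.relu(out + x)                 # add the identity shortcut, then ReLU

block = BasicBlock(64)
out = block(torch.randn(1, 64, 56, 56))           # shape preserved: (1, 64, 56, 56)
```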

Config

  • Batch Normalization after each convolution; mini-batch size of 256
  • Learning rate starts at 0.1 and is divided by 10 when the error plateaus
  • Weight decay of 1e-4 and momentum of 0.9
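A sketch of this training configuration, assuming PyTorch's `SGD` and using `ReduceLROnPlateau` as a stand-in for the paper's schedule of dividing the learning rate by 10 when the error plateaus. The placeholder `model` and the commented training loop are illustrative only.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU())  # placeholder; use a full ResNet here

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # initial learning rate
    momentum=0.9,       # momentum
    weight_decay=1e-4,  # weight decay
)
# Divide the learning rate by 10 when the monitored error plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1)

# for epoch in range(num_epochs):              # mini-batch size 256 is set on the DataLoader
#     val_error = train_and_evaluate(model)    # hypothetical helper
#     scheduler.step(val_error)
```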

Insight

  • Degradation is caused not by overfitting but by the difficulty of optimization: deeper plain networks also show higher training error
  • Even at depths where overfitting would previously have been expected, the accuracy gains from increased depth are considerable
  • Since batch normalization is used, the optimization difficulty is unlikely to be caused by vanishing gradients
