[Paper] Deep Residual Learning for Image Recognition
This is a review of the paper “Deep Residual Learning for Image Recognition”.
The original paper is available at the link.
Key
- “Residual Learning”: layers learn residual functions with reference to the layer inputs
- A plain layer tries to learn the desired underlying mapping H(x) directly; the residual formulation recovers it as F(x) + x = H(x)
- A ResNet layer instead learns the residual function F(x) = H(x) - x
- If the identity mapping is close to optimal, the residual branch can simply drive its weights toward zero, which is easier than fitting an identity mapping with a stack of nonlinear layers
- If the dimensions of the input and the residual function differ, match them with a projection or zero padding (see the shortcut sketch after this list)
- Shortcut connection: skips one or more layers with an identity mapping (which is why “+ x” is added)
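A minimal sketch of the two dimension-matching shortcut options, written as PyTorch code; the names `shortcut_zero_pad` and `ProjectionShortcut` are illustrative, not from the paper:

```python
import torch.nn as nn

def shortcut_zero_pad(x, out_channels, stride):
    # Option A (illustrative): parameter-free shortcut that subsamples
    # spatially and zero-pads the extra channels.
    x = x[:, :, ::stride, ::stride]
    extra = out_channels - x.size(1)
    return nn.functional.pad(x, (0, 0, 0, 0, 0, extra))  # pad channels with zeros

class ProjectionShortcut(nn.Module):
    # Option B (illustrative): a 1x1 convolution projects the input to the
    # dimensions of the residual branch output.
    def __init__(self, in_channels, out_channels, stride):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        return self.bn(self.conv(x))
```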
Architecture
- x -> weight layer -> ReLU -> weight layer -> F(x) + x -> ReLU, i.e. the block output is ReLU(F(x) + x) (see the sketch below)
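A minimal sketch of this block as a PyTorch module, assuming the input and the residual output have the same dimensions so the identity shortcut applies; the class name and channel handling are illustrative:

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Basic residual block: x -> F(x) -> F(x) + x -> ReLU."""
    def __init__(self, channels):
        super().__init__()
        # F(x): two 3x3 convolutions, each followed by batch normalization
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                                # shortcut keeps the input
        out = self.relu(self.bn1(self.conv1(x)))    # first weight layer of F
        out = self.bn2(self.conv2(out))             # second weight layer of F
        out = out + identity                        # F(x) + x = H(x)
        return self.relu(out)                       # ReLU after the addition
```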
Config
- Batch Normalization after each convolution, with a mini-batch size of 256
- Initial learning rate of 0.1, divided by 10 when the error plateaus
- Weight decay of 1e-4 and momentum of 0.9 (a training-setup sketch follows)
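A minimal training-setup sketch with these hyperparameters, assuming PyTorch's SGD; the placeholder model and the use of ReduceLROnPlateau to approximate the "divide by 10 on plateau" rule are assumptions:

```python
import torch

# Placeholder model; in practice this would be a ResNet (assumption for illustration).
model = torch.nn.Linear(10, 10)

# SGD with the reported hyperparameters: lr 0.1, momentum 0.9, weight decay 1e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

# The paper divides the learning rate by 10 when the error plateaus;
# ReduceLROnPlateau(factor=0.1) is used here as a stand-in for that schedule.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)

# The mini-batch size of 256 would be set on the DataLoader, e.g.:
# loader = torch.utils.data.DataLoader(dataset, batch_size=256, shuffle=True)
```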
Insight
- The degradation problem is not caused by overfitting but by the difficulty of optimization: training error itself rises as plain networks get deeper
- Even at depths where overfitting would conventionally be expected, the accuracy gains from increased depth are considerable
- Because batch normalization is used throughout, the optimization difficulty is unlikely to be caused by vanishing gradients