[Paper] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
This is a review of the paper "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications".
The original paper can be found at the link.
Depthwise separable convolution (DSC) operation
Notation: M (number of input channels), N (number of output channels), D_K (kernel size), D_F (input feature-map size), D_G (output feature-map size).
- A standard convolution maps (D_F, D_F, M) -> (D_G, D_G, N) in one step; DSC splits this into a depthwise step (D_F, D_F, M) -> (D_G, D_G, M) followed by a pointwise step (D_G, D_G, M) -> (D_G, D_G, N)
- This cuts the multiply-add cost from N · D_G² · D_K² · M (standard) down to M · D_G² · D_K² + N · D_G² · M = M · D_G² · (D_K² + N)
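The saving can be checked numerically; a minimal sketch of the cost formulas above (the layer sizes are illustrative, not taken from the paper):

```python
# Multiply-add cost of a standard convolution vs. a depthwise
# separable convolution (DSC), using the notation above.

def standard_cost(dk, m, n, dg):
    # one (dk x dk x m) filter per output channel, applied at dg x dg positions
    return n * dg * dg * dk * dk * m

def dsc_cost(dk, m, n, dg):
    depthwise = m * dg * dg * dk * dk   # one dk x dk filter per input channel
    pointwise = n * dg * dg * m         # 1x1 convolution mixing channels
    return depthwise + pointwise

# Illustrative layer: 3x3 kernels, 512 -> 512 channels, 14x14 output map
dk, m, n, dg = 3, 512, 512, 14
ratio = dsc_cost(dk, m, n, dg) / standard_cost(dk, m, n, dg)
print(ratio)  # equals 1/n + 1/dk**2, roughly 0.113 here
```

The ratio simplifies to 1/N + 1/D_K², so with 3×3 kernels the cost drops to a bit over one ninth of the standard convolution.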
Small Deep Neural Network
Channel reduction
Computational Complexity : eg. DSC
Number of parameters : replace 3×3 kernels with 1×1 kernels / remove the FC layer
Down sampling
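The parameter-count items in the list above can be made concrete; a small sketch (channel counts are illustrative, and the 7×7×1024 feature map assumes a MobileNet-style backbone with 1000 classes):

```python
# Parameter counts for two of the reduction tricks listed above:
# replacing 3x3 kernels with 1x1 kernels, and removing the FC layer.

def conv_params(k, m, n):
    # k x k kernel, m input channels, n output channels (bias omitted)
    return k * k * m * n

m, n = 256, 256
p3 = conv_params(3, m, n)   # 3x3 convolution
p1 = conv_params(1, m, n)   # 1x1 convolution
print(p3 // p1)             # 9x fewer parameters

# Removing the big FC layer: global average pooling shrinks a
# 7x7x1024 feature map to 1024 values before the 1000-way classifier.
fc_on_flat = 7 * 7 * 1024 * 1000   # FC on the flattened feature map
fc_on_gap = 1024 * 1000            # FC after global average pooling
print(fc_on_flat // fc_on_gap)     # 49x fewer parameters
```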
- The key idea MobileNet builds on is DSC
Key
: Depth-wise separable convolutions, plus hyper-parameters that trade off latency against accuracy
- Related Work : Compressing pretrained network
Product quantization / hashing / pruning / vector quantization / Huffman coding / distillation
- Related Work: Training small network
MobileNets with resource restrictions (latency & size)
- Latency -> Depth-wise separable convolution(DSC)
- Size -> Hyper-parameters α, ρ
Others: Flattened network / Factorized Network / Xception network / Squeezenet
Architecture
- DSC except for the first layer
- Batch-norm & ReLU except for FC layer
- Down-sampling handled with strided convolutions (no pooling layers)
Nearly all of the computation sits in dense 1×1 convolutions, which can be implemented directly with highly optimized general matrix multiply (GEMM) routines
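The forward pass of one DSC block described above can be sketched in plain NumPy (shapes, stride-1 "valid" padding, and the random inputs are illustrative; batch-norm and ReLU are omitted):

```python
import numpy as np

def depthwise_separable_conv(x, dw_filters, pw_filters):
    """Depthwise then pointwise convolution (stride 1, no padding).

    x:          (H, W, M) input feature map
    dw_filters: (K, K, M) one K x K filter per input channel
    pw_filters: (M, N)    1x1 convolution mixing M channels into N
    """
    h, w, m = x.shape
    k = dw_filters.shape[0]
    ho, wo = h - k + 1, w - k + 1

    # Depthwise step: each channel is filtered independently.
    dw_out = np.zeros((ho, wo, m))
    for i in range(ho):
        for j in range(wo):
            patch = x[i:i + k, j:j + k, :]              # (K, K, M)
            dw_out[i, j] = (patch * dw_filters).sum(axis=(0, 1))

    # Pointwise step: a 1x1 convolution is a matrix multiply over channels.
    return dw_out @ pw_filters                          # (Ho, Wo, N)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
out = depthwise_separable_conv(x,
                               rng.standard_normal((3, 3, 4)),
                               rng.standard_normal((4, 6)))
print(out.shape)  # (6, 6, 6)
```

The pointwise step being a plain matrix multiply is exactly why the 1×1 convolutions map so well onto GEMM.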
Width Multiplier α: Thinner
Uses fewer channels in each layer
Usually 1 / 0.75 / 0.5 / 0.25
Resolution Multiplier ρ: Reduced representation
Usually 224(Original Resolution) / 192 / 160 / 128
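How α and ρ scale the cost of one DSC layer can be sketched by extending the cost formula from the DSC section (layer sizes are illustrative):

```python
# Cost of one DSC layer under the width multiplier alpha and the
# resolution multiplier rho (same notation as the DSC section).

def dsc_cost(dk, m, n, dg, alpha=1.0, rho=1.0):
    m_a, n_a = alpha * m, alpha * n     # alpha thins every layer
    dg_r = rho * dg                     # rho shrinks the feature map
    return m_a * dg_r**2 * dk**2 + n_a * dg_r**2 * m_a

base = dsc_cost(3, 512, 512, 14)
thin = dsc_cost(3, 512, 512, 14, alpha=0.5)
small = dsc_cost(3, 512, 512, 14, rho=128 / 224)

print(thin / base)    # ~0.25: the dominant pointwise term scales with alpha^2
print(small / base)   # exactly rho^2 ~ 0.33: cost is quadratic in rho
```

Both multipliers reduce cost roughly quadratically, which is why small changes in α or ρ buy large computation savings.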
Config
- Asynchronous gradient descent
- Less regularization and data augmentation
- Little or no weight decay on the depthwise filters (they have very few parameters)
Insight
- DSC vs full convolution
- Accuracy is comparable at a fraction of the cost
- Deep & Thin model is better than shallow model
- Making the network thinner is more effective than making it shallower (removing layers)
- Multiplier (α ,ρ)
- Trade-off between accuracy and (computation, number of parameters)
- In particular, accuracy falls off log-linearly with computation, with a sharper drop at α = 0.25