
This is a review of the paper “Self-Supervised Visual Feature Learning With Deep Neural Networks: A Survey”.

The original paper is available at the link.

Insight

  • The network first learns general attributes
  • Shallower blocks focus on low-level, general features
  • Deeper blocks focus on high-level, task-specific features

  • Effective pretext tasks ensure that semantic features are learned in the process of accomplishing them.

Pseudo labels without human annotations

  • Training minimizes the error between the network's predictions and the pseudo labels
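The idea above can be sketched in a few lines: the pseudo labels come from the data transformation itself (here, hypothetically, a rotation index in the style of rotation-prediction pretext tasks), and training minimizes a standard cross-entropy between predictions and those labels. A minimal NumPy sketch, not the survey's own code:

```python
import numpy as np

def cross_entropy(pred_logits, pseudo_labels):
    """Cross-entropy between network predictions and automatically
    generated pseudo labels -- no human annotation involved."""
    # softmax over the class dimension (numerically stabilized)
    exp = np.exp(pred_logits - pred_logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    # negative log-likelihood of the pseudo label for each sample
    return -np.log(probs[np.arange(len(pseudo_labels)), pseudo_labels]).mean()

# Toy example: 4 samples, 4 pseudo classes (e.g. rotations 0/90/180/270 deg)
logits = np.array([[5.0, 0.0, 0.0, 0.0],
                   [0.0, 5.0, 0.0, 0.0],
                   [0.0, 0.0, 5.0, 0.0],
                   [0.0, 0.0, 0.0, 5.0]])
pseudo = np.array([0, 1, 2, 3])  # labels derived from the transformation itself
loss = cross_entropy(logits, pseudo)
```

A network that predicts the applied transformation well gets a small loss; gradients of this loss are what drive the feature learning.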

Schema

Pre-training with a pretext task (unlabelled, self-supervised)

  • Generation based
  • Context based
  • Free semantic label based
  • Cross modal based

Downstream Task (Labelled)

  • Image classification
  • Semantic segmentation
  • Object detection
  • Human action recognition

Feature Learning Description

  • Generation based
    • Image generation (GAN)
    • Video generation
  • Context based
    • Context similarity: similarity between image patches
    • Spatial context: spatial relations among image patches
    • Temporal context: frame sequence (video)
  • Free semantic label based
  • Cross modal based: whether two different channels of input data correspond to each other
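For the spatial-context case, the pseudo labels can be generated directly from patch positions: cut the image into a grid and ask the network to predict where each patch came from. A minimal NumPy sketch of that pair generation (the function name and grid size are illustrative, not from the paper):

```python
import numpy as np

def make_patch_pairs(image, grid=3):
    """Cut an image into a grid of patches and emit (patch, position) pairs.
    The position index serves as the pseudo label -- spatial context
    supervision with no human annotation."""
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    patches, labels = [], []
    for row in range(grid):
        for col in range(grid):
            patches.append(image[row*ph:(row+1)*ph, col*pw:(col+1)*pw])
            labels.append(row * grid + col)  # position index as pseudo label
    return np.stack(patches), np.array(labels)

image = np.arange(36.0).reshape(6, 6)      # toy 6x6 "image"
patches, labels = make_patch_pairs(image)  # 9 patches, pseudo labels 0..8
```

The temporal analogue replaces the spatial grid with frame order: shuffled frames in, original ordering out as the pseudo label.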

Downstream Task Description

  • Image classification (quality of features)
  • Semantic segmentation (generality of features): assign a semantic label to each pixel (pixel-wise annotation)
  • Object detection (generality of features)
    • Localizing the position of objects
    • Region proposals based on the feature map
    • Fully connected layers regress the bounding box of each object
  • Human action recognition (quality of features)
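The standard way these downstream tasks measure feature quality is a linear probe: freeze the pretrained features and train only a linear classifier on top. A minimal NumPy sketch with hypothetical, synthetic "frozen" features standing in for a real encoder's output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen features: two well-separated clusters stand in for
# representations produced by a pretrained self-supervised encoder.
feats = np.vstack([rng.normal(-1.0, 0.1, (50, 16)),
                   rng.normal(+1.0, 0.1, (50, 16))])
labels = np.array([0] * 50 + [1] * 50)

# Linear probe: fit a least-squares linear head on one-hot targets,
# keeping the features fixed -- only the linear layer is "trained".
feats_b = np.hstack([feats, np.ones((len(feats), 1))])  # add bias column
onehot = np.eye(2)[labels]
W, *_ = np.linalg.lstsq(feats_b, onehot, rcond=None)
pred = (feats_b @ W).argmax(axis=1)
accuracy = (pred == labels).mean()
```

If the self-supervised features already separate the classes linearly, the probe scores high; that is what "quality of features" refers to above.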

Feature Learning

  • Generation based
    • Pseudo labels are the images themselves
  • Context based
    • (Common) clustering
    • Predictive task: predict the cluster (group) ID
    • Contrastive task: distance between features
  • Free semantic label based
  • Cross modal based
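The contrastive task above can be made concrete with an InfoNCE-style loss: each anchor feature should be close to its positive (e.g. another augmentation of the same image) and far from the other samples in the batch. A minimal NumPy sketch, not the survey's formulation verbatim:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Contrastive (InfoNCE-style) loss: pull each anchor toward its positive
    and push it away from the other samples in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # the matching positive for anchor i sits on the diagonal
    return -np.log(np.diag(probs)).mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
loss_matched = info_nce(feats, feats)                      # correct pairing
loss_mismatched = info_nce(feats, np.roll(feats, 1, axis=0))  # wrong pairing
```

Correctly paired features give a much lower loss than mismatched ones, which is exactly the signal that shapes the feature distances.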

Qualitative evaluation

  • Kernel visualization: compare the first convolution layer's kernels with those of a supervised network
  • Feature map visualization: shows where the network attends
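Kernel visualization is usually done by tiling the first-layer convolution kernels into one grid image so they can be compared side by side with a supervised network's filters. A small NumPy sketch of that tiling (function name and toy filters are illustrative):

```python
import numpy as np

def kernel_grid(kernels, pad=1):
    """Tile first-layer convolution kernels into one 2-D grid image so the
    learned filters can be inspected and compared visually."""
    n, kh, kw = kernels.shape
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    grid = np.zeros((rows * (kh + pad) - pad, cols * (kw + pad) - pad))
    for i, k in enumerate(kernels):
        r, c = divmod(i, cols)
        # normalize each kernel to [0, 1] for display
        k = (k - k.min()) / (k.max() - k.min() + 1e-8)
        grid[r*(kh+pad):r*(kh+pad)+kh, c*(kw+pad):c*(kw+pad)+kw] = k
    return grid

filters = np.random.default_rng(1).normal(size=(16, 3, 3))  # toy 3x3 kernels
img = kernel_grid(filters)  # one image, ready for e.g. plt.imshow
```

If the self-supervised filters look like the Gabor-like edge and color detectors of a supervised first layer, that is taken as qualitative evidence of good features.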
