
This is a review of the paper “Adaptive Federated Learning in Resource Constrained Edge Computing Systems.”

The original paper is available at the link.

Background

MEC (Mobile Edge Computing) is ultimately approached from a communication-systems perspective; it is about how data is exchanged and how computation is handled.

The edge nodes mainly consume computation resources, while the aggregator mainly consumes communication resources.

In such a system, one must take the frequency, accuracy, and resource consumption into account, and all of these are time-varying. What this paper deals with is the frequency, i.e., the number of local updates between global aggregations.

Contribution

  1. A theoretical convergence bound of GD that accounts for
    • Federated learning
    • Non-IID data distributions
    • The number of local updates (frequency)
  2. An algorithm that learns (see the sketch after this list)
    • System dynamics
    • Data distributions
    • Model characteristics
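
As a mental model for the adaptive part, here is a minimal sketch of the control loop the aggregator might run. The policy in choose_tau, the cost constants, and the synthetic data are my own placeholders; the paper instead picks τ by maximizing a utility derived from its convergence bound.

```python
# Hypothetical control loop for adaptive federated learning; names and the
# tau-selection policy are placeholders, not the paper's exact procedure.
import numpy as np

rng = np.random.default_rng(0)

def local_gd(w, X, y, tau, lr=0.01):
    """Run tau local gradient-descent steps on one node's data (squared loss)."""
    for _ in range(tau):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

def choose_tau(budget_left, c, b, tau_max=20):
    """Placeholder policy: the largest affordable tau. The paper derives tau
    from its convergence bound instead."""
    feasible = [t for t in range(1, tau_max + 1) if c * t + b <= budget_left]
    return max(feasible) if feasible else None

# Synthetic non-IID setup: each node's inputs are shifted differently.
d = 5
w_true = rng.normal(size=d)
nodes = []
for k in range(4):
    X = rng.normal(loc=k, size=(100, d))
    nodes.append((X, X @ w_true + 0.1 * rng.normal(size=100)))

w = np.zeros(d)
budget_left, c, b = 300.0, 1.0, 5.0   # "measured" per-step / per-round costs
while True:
    tau = choose_tau(budget_left, c, b)
    if tau is None:
        break
    sizes = np.array([len(y) for _, y in nodes], dtype=float)
    local_models = [local_gd(w.copy(), X, y, tau) for X, y in nodes]
    w = sum(s * wk for s, wk in zip(sizes / sizes.sum(), local_models))
    budget_left -= c * tau + b        # charge the round against the budget

loss = np.mean([np.mean((X @ w - y) ** 2) for X, y in nodes])
print(f"global loss after training: {loss:.4f}")
```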

Related Work

  1. MEC
    • About Application offloading / Workload scheduling / Service Migration
    • Not about Communication / Computation / Training Accuracy
  2. Federated Learning
    • Synchronous rather than asynchronous methods
    • Fixed aggregation frequency
  3. Others
    • Secure global aggregation
    • Compressing the exchanged information
    • Participant selection
    • Non-IID data

Viewed this way, the gaps in existing research are as follows:

  • Convergence bound
  • Partial global aggregation
  • Fixed frequency, and only for IID data
  • A separate solver that considers the communication-optimality trade-off without the complexity of solving the local problem

What this paper aims to address:

  • The aggregation frequency under a given resource budget

Between consecutive global aggregations, the complexity differs.

  • Unknown non-IID data at each node, with different degrees of similarity
  • Real-time dynamics of the system
  • (Environment) Data is collected at each node such that it persists there
  • Presumably this is because the time-varying characteristics are being reflected

Objective Function

τ and κ (the frequency) are ultimately convergence-dependent, while c_m and b_m are time-varying in nature. It was in this setting that the experiments were run under non-IID conditions!
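
Reconstructing from the variables above (a sketch of mine, so details may differ from the paper's exact formulation): choose the number of local updates τ and the number of global rounds κ to minimize the final loss, while every resource type m stays within an assumed budget R_m, each round costing c_m per local update plus b_m per aggregation.

```latex
% Sketch of the resource-constrained objective, reconstructed from the
% review's variables (tau, kappa, c_m, b_m); R_m is an assumed budget.
\begin{aligned}
\min_{\tau,\,\kappa \in \{1,2,\dots\}} \quad & F\big(w^{\kappa\tau}\big) \\
\text{s.t.} \quad & \kappa\,(c_m \tau + b_m) \le R_m \quad \text{for all resource types } m
\end{aligned}
```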

Convergence Bound of w^{κτ}

  1. Bound the gap between w and the auxiliary centralized-GD sequence v_k within each interval (see the sketch after this list)
  2. Combine that gap with the convergence bound of v_k
    • v_k is synchronized with w at the beginning of each interval
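
If memory serves, the gap in step 1 is controlled by a function h of the number of local steps taken inside the interval; a hedged reconstruction (η: learning rate, β: smoothness constant, δ: gradient divergence measuring non-IID-ness), not a verbatim quote:

```latex
% Hedged reconstruction of the interval-wise gap bound from step 1:
% w drifts from the centralized auxiliary sequence v_k by at most h(x)
% after x local steps inside interval k.
\lVert w(t) - v_k(t) \rVert \le h\big(t - (k-1)\tau\big),
\qquad
h(x) = \frac{\delta}{\beta}\Big[(\eta\beta + 1)^x - 1\Big] - \eta\delta x
```

Since h grows quickly with the number of local steps, this is exactly where the optimal τ trades communication savings against convergence.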

Experimental Results

  1. Resource Definition
    • Time as the single resource
  2. Baselines
    • Canonical federated learning with a fixed (non-adaptive) value of τ
    • Synchronous distributed gradient descent
  3. Models & Datasets
    • Squared-SVM, regression, K-means, CNN
    • MNIST-O, MNIST-F
  4. Data Distribution case
    • Uniform / Same label (non-uniform) / Entire dataset at each node / Combined uniform and non-uniform (illustrated below)
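
To make the four cases concrete, here is a small illustration of how such partitions could be generated (my own sketch; the paper's actual splitting code is not shown in the review):

```python
# Hypothetical illustration of the four data-distribution cases over labeled
# samples; not the paper's actual splitting code.
import numpy as np

def partition(labels, n_nodes, case, rng=None):
    """Return a list of index arrays, one per node."""
    rng = rng or np.random.default_rng(0)
    idx = np.arange(len(labels))
    if case == "uniform":            # Case 1: random equal shards
        rng.shuffle(idx)
        return list(np.array_split(idx, n_nodes))
    if case == "same_label":         # Case 2: node k only sees label k mod #classes
        classes = np.unique(labels)
        return [idx[labels == classes[k % len(classes)]] for k in range(n_nodes)]
    if case == "entire":             # Case 3: every node sees the entire dataset
        return [idx.copy() for _ in range(n_nodes)]
    if case == "combined":           # Case 4: half uniform, half single-label
        half = n_nodes // 2
        return (partition(labels, half, "uniform", rng)
                + partition(labels, n_nodes - half, "same_label", rng))
    raise ValueError(case)

labels = np.repeat(np.arange(10), 100)   # toy stand-in for MNIST labels
print([len(p) for p in partition(labels, 4, "combined")])
```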

Result

  1. Loss & Accuracy
    • The optimal value of τ differs across cases and models
  2. Varying Number of nodes
    • Better than or similar to a fixed τ = 10
  3. Varying Global Aggregation Time
    • The actual global aggregation time equals the original global aggregation time multiplied by the adjustment factor
  4. Varying Total time budget
    • Except for Case 3, τ* decreases as the total time budget increases, approaching one
  5. Instantaneous Behavior
    • For a single run of 30 seconds on the system, τ* remains stable after an initial adaptation period.
  6. Comparison to Async Distributed Gradient Descent
    • Each edge node pulls the most up-to-date model parameters
    • The aggregator averages models weighted by dataset size (see the sketch below)
    • (Async) fully utilizes the available computational resources, which ends up hurting overall performance (overfitting)
    • For non-uniform data, async is worse than sync: slower convergence, sudden changes, and higher loss
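
The dataset-size weighting mentioned above is the usual federated-averaging rule; a generic sketch (illustrative, not code from the paper):

```python
# Dataset-size-weighted model aggregation (FedAvg-style); a generic
# illustration, not code from the paper.
import numpy as np

def aggregate(params, sizes):
    """Average node parameter vectors, weighted by each node's dataset size."""
    w = np.asarray(sizes, dtype=float)
    w /= w.sum()
    return sum(wi * p for wi, p in zip(w, params))

node_params = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
print(aggregate(node_params, sizes=[100, 300]))   # -> [2.5 3.5]
```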
