Convergence rates for a (non-comprehensive) set of federated optimization methods in non-IID settings. We summarize the key assumptions for non-IID data, local functions on each client, and other assumptions. We also present the variant of the algorithm comparing to Federated Averaging and the convergence rates that eliminate constant
| Non-IID Assumptions | ||
|---|---|---|
| Symbol | Full Name | Explanation |
| BCGV | Bounded inter-client gradient variance | 𝔼i ||∇fi(x) − ∇F(x)||2 ≤ η2 |
| BOBD | Bounded optimal objective difference | |
| BOGV | Bounded optimal gradient variance | 𝔼i ||∇fi(x*)||2 ≤ η2 |
| BGV | Bounded gradient dissimilarity | 𝔼i ||∇fi(x)||2/||∇F(x)||2 ≤ η2 |
| Non-IID Assumptions | ||
|---|---|---|
| BCGV | Bounded inter-client gradient variance | 𝔼 |
| BOBD | Bounded optimal objective difference | |
| BOGV | Bounded optimal gradient variance | 𝔼 |
| BGV | Bounded gradient dissimilarity | 𝔼 |
| Other Assumptions and Variants | |
|---|---|
| Symbol | Explanation |
| CVX | Each client function fi(x) is convex. |
| SCVX | Each client function fi(x) is μ-strongly convex. |
| BNCVX | Each client function has bounded nonconvexity with ∇2fi(x) ⪰ −μI. |
| BLGV | The variance of stochastic gradients on local clients is bounded. |
| BLGN | The norm of any local gradient is bounded. |
| LBG | Clients use the full batch of local samples to compute updates. |
| Dec | Decentralized setting, assumes the connectivity of network is good. |
| AC | All clients participate in each round. |
| 1step | One local update is performed on clients in each round. |
| Prox | Use proximal gradient steps on clients. |
| VR | Variance reduction which needs to track the state. |
| Other Assumptions and Variants | |
|---|---|
| CVX | Each client function |
| SCVX | Each client function |
| BNCVX | Each client function has bounded nonconvexity with ∇2 |
| BLGV | The variance of stochastic gradients on local clients is bounded. |
| BLGN | The norm of any local gradient is bounded. |
| LBG | Clients use the full batch of local samples to compute updates. |
| Dec | Decentralized setting, assumes the connectivity of network is good. |
| AC | All clients participate in each round. |
| 1step | One local update is performed on clients in each round. |
| Prox | Use proximal gradient steps on clients. |
| VR | Variance reduction which needs to track the state. |
| Convergence Rates | ||||
|---|---|---|---|---|
| Method | Non-IID | Other Assumptions | Variant | Rate |
| Lian et al. [36] | BCGV | BLGV | Dec; AC; 1step | |
| PD-SGD [114] | BCGV | BLGV | Dec; AC | |
| MATCHA [49] | BCGV | BLGV | Dec | |
| Khaled et al. [115] | BOGV | CVX | AC; LBG | |
| Li et al. [116] | BOBD | SCVX; BLGV; BLGN | - | O(K/T) |
| FedProx [113] | BGV | BNCVX | Prox | |
| SCAFFOLD [101] | - | SCVX; BLGV | VR | O(1/TKM) + O(e−T) |
| Convergence Rates | ||||
|---|---|---|---|---|
| Lian | BCGV | BLGV | Dec; AC; 1step | |
| PD-SGD [ | BCGV | BLGV | Dec; AC | |
| MATCHA [ | BCGV | BLGV | Dec | |
| Khaled | BOGV | CVX | AC; LBG | |
| Li | BOBD | SCVX; BLGV; BLGN | - | |
| FedProx [ | BGV | BNCVX | Prox | |
| SCAFFOLD [ | - | SCVX; BLGV | VR | |
Sharing content requires targeting cookies to be enabled. Please update your cookie preferences to use this feature.