Table 3.3

Convergence rates for a (non-comprehensive) set of federated optimization methods in non-IID settings. We summarize the key assumptions on non-IID data, on the local functions of each client, and other assumptions. We also list each algorithm's variant relative to Federated Averaging and its convergence rate with constants omitted.

Non-IID Assumptions
Symbol	Full Name	Explanation
BCGV	Bounded inter-client gradient variance	$\mathbb{E}_i \|\nabla f_i(x) - \nabla F(x)\|^2 \leq \eta^2$
BOBD	Bounded optimal objective difference	$F^* - \mathbb{E}_i[f_i^*] \leq \eta^2$
BOGV	Bounded optimal gradient variance	$\mathbb{E}_i \|\nabla f_i(x^*)\|^2 \leq \eta^2$
BGV	Bounded gradient dissimilarity	$\mathbb{E}_i \|\nabla f_i(x)\|^2 / \|\nabla F(x)\|^2 \leq \eta^2$
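
To make the four non-IID quantities concrete, the sketch below evaluates them for a hypothetical toy family of client objectives, f_i(x) = ½‖x − c_i‖², with F the uniform average of the f_i (both are assumptions for illustration, not part of the table). For this family the BCGV quantity is the same at every x and equals the BOGV quantity, and the BOBD quantity is half of it. The BGV ratio, by contrast, equals 1 + V/‖x − x*‖² (V being the spread of the centers), so it grows without bound as x approaches x* and can only be bounded away from the minimizer here. The helper names (`bcgv`, `bgv_ratio`, and so on) are our own.

```python
import numpy as np

# Hypothetical toy clients (an assumption, not from the table):
# f_i(x) = 0.5 * ||x - c_i||^2, and F(x) = mean_i f_i(x) with uniform weights.
rng = np.random.default_rng(0)
centers = rng.normal(size=(10, 3))  # c_i for 10 clients in R^3


def client_grads(x):
    """Row i holds grad f_i(x) = x - c_i."""
    return x - centers


def global_grad(x):
    """grad F(x) = mean_i grad f_i(x)."""
    return client_grads(x).mean(axis=0)


def global_obj(x):
    """F(x) = mean_i 0.5 * ||x - c_i||^2."""
    return 0.5 * np.mean(np.sum((x - centers) ** 2, axis=1))


def bcgv(x):
    """BCGV quantity at x: E_i ||grad f_i(x) - grad F(x)||^2."""
    diff = client_grads(x) - global_grad(x)
    return np.mean(np.sum(diff ** 2, axis=1))


def bgv_ratio(x):
    """BGV quantity at x: E_i ||grad f_i(x)||^2 / ||grad F(x)||^2 (needs grad F(x) != 0)."""
    num = np.mean(np.sum(client_grads(x) ** 2, axis=1))
    return num / np.sum(global_grad(x) ** 2)


x_star = centers.mean(axis=0)       # minimizer of F for this family
f_i_star = np.zeros(len(centers))   # each f_i attains its minimum 0 at c_i

bobd = global_obj(x_star) - f_i_star.mean()                 # F* - E_i[f_i*]
bogv = np.mean(np.sum(client_grads(x_star) ** 2, axis=1))   # E_i ||grad f_i(x*)||^2

print("BCGV at 0 and at x*:", bcgv(np.zeros(3)), bcgv(x_star))
print("BOBD:", bobd, " BOGV:", bogv)
for r in [10.0, 1.0, 0.1]:
    print(f"BGV ratio at offset {r} from x*:", bgv_ratio(x_star + r * np.ones(3)))
```

The printed BGV ratios increase as the offset shrinks, which is the toy-model version of why a ratio-type dissimilarity bound is usually only assumed away from stationary points.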

Other Assumptions and Variants
Symbol	Explanation
CVX	Each client function f_i(x) is convex.
SCVX	Each client function f_i(x) is μ-strongly convex.
BNCVX	Each client function has bounded nonconvexity with ∇²f_i(x) ⪰ −μI.
BLGV	The variance of stochastic gradients on local clients is bounded.
BLGN	The norm of any local gradient is bounded.
LBG	Clients use the full batch of local samples to compute updates.
Dec	Decentralized setting; assumes the network is well connected.
AC	All clients participate in each round.
1step	Each client performs one local update per round.
Prox	Clients take proximal gradient steps.
VR	Variance reduction, which requires tracking state across rounds.
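
Several of these variants change only the local update rule. The sketch below contrasts them on hypothetical quadratic clients f_i(x) = ½ a_i‖x − c_i‖² with unequal curvatures, full participation (AC) and exact local gradients (LBG): plain local gradient steps as in Federated Averaging, gradient steps on a FedProx-style proximal objective (Prox), and a control-variate correction in the spirit of SCAFFOLD (VR). The VR branch is an idealized version that recomputes the control variates at the current server model every round; it is not the exact SCAFFOLD algorithm. The toy data and the names `run`, `lr` and `mu_prox` are our own assumptions.

```python
import numpy as np

# Hypothetical toy clients (an assumption, not from the table):
# f_i(x) = 0.5 * a_i * ||x - c_i||^2 with unequal curvatures a_i.
CURV = np.array([0.5, 1.0, 2.0])
CENTERS = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# Minimizer of F(x) = mean_i f_i(x): a curvature-weighted average of the centers.
X_STAR = (CURV[:, None] * CENTERS).sum(axis=0) / CURV.sum()


def grad_i(i, y):
    """Exact local gradient (full local batch, as in LBG)."""
    return CURV[i] * (y - CENTERS[i])


def run(variant, local_steps, rounds=300, lr=0.1, mu_prox=1.0):
    """Simulate rounds with all clients participating (AC); return the server model."""
    n = len(CURV)
    x = np.zeros(2)
    for _ in range(rounds):
        # Idealized control variates: exact gradients at the current server model.
        c_local = [grad_i(i, x) for i in range(n)]
        c_global = np.mean(c_local, axis=0)
        client_models = []
        for i in range(n):
            y = x.copy()
            for _ in range(local_steps):
                g = grad_i(i, y)
                if variant == "prox":
                    # Gradient of f_i(y) + (mu_prox / 2) * ||y - x||^2.
                    g = g + mu_prox * (y - x)
                elif variant == "vr":
                    # Control-variate correction in the spirit of SCAFFOLD.
                    g = g - c_local[i] + c_global
                y = y - lr * g
            client_models.append(y)
        x = np.mean(client_models, axis=0)  # server averages the client models
    return x


def dist_to_opt(x):
    return float(np.linalg.norm(x - X_STAR))


for variant, k in [("plain", 1), ("plain", 10), ("prox", 10), ("vr", 10)]:
    print(f"{variant:5s} K={k:2d}: distance to x* = {dist_to_opt(run(variant, k)):.2e}")
```

With these toy settings, one local step per round (the 1step variant) converges to x*, ten plain local steps settle at a biased point (client drift caused by the unequal curvatures), and the corrected VR update removes that bias. The prox distance is printed for comparison only; how much the proximal term helps depends on `mu_prox`, `lr` and the number of local steps.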

Convergence Rates
Method	Non-IID	Other Assumptions	Variant	Rate
Lian et al. [36]	BCGV	BLGV	Dec; AC; 1step	$O(1/T) + O(1/\sqrt{NT})$
PD-SGD [114]	BCGV	BLGV	Dec; AC	$O(N/T) + O(1/\sqrt{NT})$
MATCHA [49]	BCGV	BLGV	Dec	$O(1/\sqrt{TKM}) + O(M/(KT))$
Khaled et al. [115]	BOGV	CVX	AC; LBG	$O(N/T) + O(1/\sqrt{NT})$
Li et al. [116]	BOBD	SCVX; BLGV; BLGN	-	$O(K/T)$
FedProx [113]	BGV	BNCVX	Prox	$O(1/\sqrt{T})$
SCAFFOLD [101]	-	SCVX; BLGV	VR	$O(1/(TKM)) + O(e^{-T})$
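
Several rates above are a sum of two terms, and, ignoring the omitted constants, one can ask from which T on the second term dominates. For instance, $1/T \le 1/\sqrt{NT}$ exactly when $T \ge N$, $N/T \le 1/\sqrt{NT}$ exactly when $T \ge N^3$, and $M/(KT) \le 1/\sqrt{TKM}$ exactly when $T \ge M^3/K$. The sketch below checks these crossovers numerically. It sets every hidden constant to 1 and assumes the usual reading of the symbols (T communication rounds, N clients, K local steps per round, M participating clients per round); neither is stated in the table, and the function names are our own.

```python
import math

# Rates from the table with every hidden constant set to 1 (an assumption).
# Assumed symbols: T rounds, N clients, K local steps per round, M clients per round.
def lian(T, N, K, M):
    return 1 / T, 1 / math.sqrt(N * T)


def pd_sgd(T, N, K, M):
    return N / T, 1 / math.sqrt(N * T)


def matcha(T, N, K, M):
    return M / (K * T), 1 / math.sqrt(T * K * M)


def crossover(rate, N, K, M, t_max=10**6):
    """Smallest integer T <= t_max at which the first term no longer exceeds the second."""
    for T in range(1, t_max + 1):
        first, second = rate(T, N, K, M)
        if first <= second:
            return T
    return None


N, K, M = 10, 5, 4
for name, rate, closed_form in [
    ("Lian et al.", lian, N),                    # T >= N
    ("PD-SGD", pd_sgd, N ** 3),                  # T >= N^3
    ("MATCHA", matcha, math.ceil(M ** 3 / K)),   # T >= M^3 / K
]:
    print(f"{name}: crossover T = {crossover(rate, N, K, M)} (closed form {closed_form})")
```

For these arbitrary values the numerical crossovers match the closed forms (10, 1000 and 13 rounds). Since Khaled et al. share the PD-SGD rate form, the same $T \ge N^3$ threshold applies to that row under the same assumptions.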
