Typical characteristics of federated learning settings vs. distributed learning in the datacenter (e.g., [33]). Cross-device and cross-silo federated learning are two examples of FL domains, but are not intended to be exhaustive. The primary defining characteristics of FL are highlighted in bold, but the other characteristics are also critical in determining which techniques are applicable.
| | Datacenter Distributed Learning | Cross-Silo Federated Learning | Cross-Device Federated Learning |
|---|---|---|---|
| Setting | Training a model on a large but “flat” dataset. Clients are compute nodes in a single cluster or datacenter. | Training a model on siloed data. Clients are different organizations (e.g., medical or financial) or geo-distributed datacenters. | The clients are a very large number of mobile or IoT devices. |
| Data distribution | Data is centrally stored and can be shuffled and balanced across clients. Any client can read any part of the dataset. | **Data is generated locally and remains decentralized. Each client stores its own data and cannot read the data of other clients. Data is not independently or identically distributed.** | **Data is generated locally and remains decentralized. Each client stores its own data and cannot read the data of other clients. Data is not independently or identically distributed.** |
| Orchestration | Centrally orchestrated. | **A central orchestration server/service organizes the training, but never sees raw data.** | **A central orchestration server/service organizes the training, but never sees raw data.** |
| Wide-area communication | None (fully connected clients in one datacenter/cluster). | Typically a hub-and-spoke topology, with the hub representing a coordinating service provider (typically without data) and the spokes connecting to clients. | Typically a hub-and-spoke topology, with the hub representing a coordinating service provider (typically without data) and the spokes connecting to clients. |
| Data availability | All clients are almost always available. | All clients are almost always available. | Only a fraction of clients are available at any one time, often with diurnal or other variations. |
| Distribution scale | Typically 1–1000 clients. | Typically 2–100 clients. | Massively parallel, up to $10^{10}$ clients. |
| Primary bottleneck | Computation is more often the bottleneck in the datacenter, where very fast networks can be assumed. | Might be computation or communication. | Communication is often the primary bottleneck, though it depends on the task. Generally, cross-device federated computations use Wi-Fi or slower connections. |
| Addressability | Each client has an identity or name that allows the system to access it specifically. | Each client has an identity or name that allows the system to access it specifically. | Clients cannot be indexed directly (i.e., no use of client identifiers). |
| Client statefulness | Stateful: each client may participate in each round of the computation, carrying state from round to round. | Stateful: each client may participate in each round of the computation, carrying state from round to round. | Stateless: each client will likely participate only once in a task, so generally a fresh sample of never-before-seen clients in each round of computation is assumed. |
| Client reliability | Relatively few failures. | Relatively few failures. | Highly unreliable: 5% or more of the clients participating in a round of computation are expected to fail or drop out (e.g., because the device becomes ineligible when battery, network, or idleness requirements are violated). |
| Data partition axis | Data can be partitioned/re-partitioned arbitrarily across clients. | Partition is fixed. Could be example-partitioned (horizontal) or feature-partitioned (vertical). | Fixed partitioning by example (horizontal). |
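
The "Data partition axis" row can be made concrete with a toy example. The sketch below is only an illustration: the dataset values and the helper names `partition_by_example` and `partition_by_feature` are hypothetical and not taken from the source. In a real federated setting the partition is fixed by where the data was generated; the functions here merely simulate what each client would end up holding.

```python
# Toy dataset: 4 examples (rows) x 3 features (columns). Values are made up.
dataset = [
    [1, 10, 100],
    [2, 20, 200],
    [3, 30, 300],
    [4, 40, 400],
]


def partition_by_example(rows, num_clients):
    """Horizontal partition: each client holds whole rows (all features)."""
    # Round-robin assignment; any disjoint split of the rows would do.
    return [rows[i::num_clients] for i in range(num_clients)]


def partition_by_feature(rows, feature_groups):
    """Vertical partition: each client holds some columns of every row."""
    return [
        [[row[j] for j in group] for row in rows]
        for group in feature_groups
    ]


# Two clients, each with complete records for a disjoint subset of examples.
horizontal = partition_by_example(dataset, num_clients=2)

# Two clients, each with a disjoint subset of features for all examples.
vertical = partition_by_feature(dataset, feature_groups=[[0, 1], [2]])

for name, parts in (("horizontal", horizontal), ("vertical", vertical)):
    for client_id, shard in enumerate(parts):
        print(name, "client", client_id, shard)
```

Under horizontal partitioning every client can evaluate the full model on its own examples, whereas under vertical partitioning no single client sees all features of any example, which is why the table lists the vertical case only as a possibility for cross-silo settings.
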