Commit 1037c93e, authored by Galen Andrew, committed by tensorflow-copybara

Fix package prefix on DifferentiallyPrivateFactory in tutorial.

PiperOrigin-RevId: 392470163
Parent: ffdd2845
%% Cell type:markdown id: tags:
##### Copyright 2021 The TensorFlow Federated Authors.
%% Cell type:code id: tags:
```
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```
%% Cell type:markdown id: tags:
# Tuning recommended aggregations for learning
%% Cell type:markdown id: tags:
<table class="tfo-notebook-buttons" align="left">
<td>
<a target="_blank" href="https://www.tensorflow.org/federated/tutorials/tuning_recommended_aggregators"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
</td>
<td>
<a target="_blank" href="https://colab.research.google.com/github/tensorflow/federated/blob/master/docs/tutorials/tuning_recommended_aggregators.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
</td>
<td>
<a target="_blank" href="https://github.com/tensorflow/federated/blob/master/docs/tutorials/tuning_recommended_aggregators.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
</td>
<td>
<a href="https://storage.googleapis.com/tensorflow_docs/federated/docs/tutorials/tuning_recommended_aggregators.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
</td>
</table>
%% Cell type:markdown id: tags:
The `tff.learning` module contains a number of ways to aggregate model updates with recommended default configuration:
* `tff.learning.robust_aggregator`
* `tff.learning.dp_aggregator`
* `tff.learning.compression_aggregator`
* `tff.learning.secure_aggregator`
In this tutorial, we explain the underlying motivation, describe how they are implemented, and provide suggestions for how to customize their configuration.
%% Cell type:markdown id: tags:
---
%% Cell type:code id: tags:
```
#@test {"skip": true}
!pip install --quiet --upgrade tensorflow-federated-nightly
!pip install --quiet --upgrade nest-asyncio
import nest_asyncio
nest_asyncio.apply()
```
%% Cell type:code id: tags:
```
import math
import tensorflow_federated as tff
tff.federated_computation(lambda: 'Hello, World!')()
```
%% Output
b'Hello, World!'
%% Cell type:markdown id: tags:
Aggregation methods are represented by objects that can be passed to `tff.learning.build_federated_averaging_process` as its `model_update_aggregation_factory` keyword argument. As such, the aggregators discussed here can be directly used to modify a [previous](federated_learning_for_image_classification.ipynb) [tutorial](federated_learning_for_text_generation.ipynb) on federated learning.
The baseline weighted mean from the [FedAvg](http://proceedings.mlr.press/v54/mcmahan17a/mcmahan17a.pdf) algorithm can be expressed using `tff.aggregators.MeanFactory` as follows:
%% Cell type:markdown id: tags:
```
mean = tff.aggregators.MeanFactory()
iterative_process = tff.learning.build_federated_averaging_process(
    ...,
    model_update_aggregation_factory=mean)
```
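For intuition, the weighted mean that `MeanFactory` represents can be written out in plain Python. This is a minimal sketch, not TFF's implementation; the helper name `weighted_mean` and the toy inputs are assumptions made for illustration.

```python
def weighted_mean(updates, weights):
    """Returns sum_i(w_i * x_i) / sum_i(w_i), computed per coordinate."""
    total_weight = sum(weights)
    dim = len(updates[0])
    return [
        sum(w * u[j] for u, w in zip(updates, weights)) / total_weight
        for j in range(dim)
    ]


# Two hypothetical clients; the second holds three times as many examples.
client_updates = [[1.0, 2.0], [3.0, 6.0]]
num_examples = [1.0, 3.0]
averaged = weighted_mean(client_updates, num_examples)
```

The techniques below change what happens to each client value before this sum, or how the sum itself is computed.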
%% Cell type:markdown id: tags:
The techniques which can be used to extend the weighted mean covered in this tutorial are:
* Zeroing
* Clipping
* Differential Privacy
* Compression
* Secure Aggregation
The extension is done using composition, in which the `MeanFactory` wraps an inner factory to which it delegates some part of the aggregation, or is itself wrapped by another aggregation factory. For more detail on the design, see the [Implementing custom aggregators](custom_aggregators.ipynb) tutorial.
First, we will explain how to enable and configure these techniques individually, and then show how they can be combined together.
%% Cell type:markdown id: tags:
## Techniques
Before delving into the individual techniques, we first introduce the quantile matching algorithm, which will be useful for configuring the techniques below.
%% Cell type:markdown id: tags:
### Quantile matching
Several of the aggregation techniques below need to use a norm bound that controls some aspect of the aggregation. Such bounds can be provided as a constant, but usually it is better to adapt the bound during the course of training. The recommended way is to use the quantile matching algorithm of [Andrew et al. (2019)](https://arxiv.org/abs/1905.03871), initially proposed for its compatibility with differential privacy but useful more broadly. To estimate the value at a given quantile, you can use `tff.aggregators.PrivateQuantileEstimationProcess`. For example, to adapt to the median of a distribution, you can use:
%% Cell type:code id: tags:
```
median_estimate = tff.aggregators.PrivateQuantileEstimationProcess.no_noise(
    initial_estimate=1.0, target_quantile=0.5, learning_rate=0.2)
```
%% Cell type:markdown id: tags:
Different techniques which use the quantile estimation algorithm will require different values of the algorithm parameters, as we will see. In general, increasing the `learning_rate` parameter means faster adaptation to the correct quantile, but with a higher variance. The `no_noise` classmethod constructs a quantile matching process that does not add noise for differential privacy.
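To make the role of `learning_rate` and `target_quantile` concrete, here is a rough noise-free sketch of the geometric update rule described by Andrew et al. (2019). It is plain Python, not the TFF implementation; the function name `update_estimate` and the toy norms are assumptions for illustration.

```python
import math


def update_estimate(estimate, values, target_quantile, learning_rate):
    """One geometric update step of the quantile estimate."""
    # Fraction of observed values that fall at or below the current estimate.
    fraction_below = sum(1 for v in values if v <= estimate) / len(values)
    # Grow the estimate if too few values are below it, shrink it otherwise.
    return estimate * math.exp(-learning_rate * (fraction_below - target_quantile))


# Hypothetical per-client norms, repeated every round for simplicity.
norms = [float(n) for n in range(1, 10)]  # 1.0, 2.0, ..., 9.0; the median is 5.0
estimate = 1.0
for _ in range(200):
    estimate = update_estimate(
        estimate, norms, target_quantile=0.5, learning_rate=0.2)
```

After enough rounds the estimate hovers around the median of the norms. A larger `learning_rate` gets there in fewer rounds but oscillates more widely once it arrives.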
%% Cell type:markdown id: tags:
### Zeroing
Zeroing refers to replacing unusually large values by zeros. Here, "unusually large" could mean larger than a predefined threshold, or large relative to values from previous rounds of the computation. Zeroing can increase system robustness to data corruption on faulty clients.
To compute a mean of values with L-infinity norms larger than `MY_ZEROING_CONSTANT` zeroed out, we wrap a `tff.aggregators.MeanFactory` with a `tff.aggregators.zeroing_factory` that performs the zeroing:
%% Cell type:markdown id: tags:
```
zeroing_mean = tff.aggregators.zeroing_factory(
    zeroing_norm=MY_ZEROING_CONSTANT,
    inner_agg_factory=tff.aggregators.MeanFactory())
```
%% Cell type:markdown id: tags:
Here we wrap a `MeanFactory` with a `zeroing_factory` because we want the (pre-aggregation) effects of the `zeroing_factory` to apply to the values at clients before they are passed to the inner `MeanFactory` for aggregation via averaging.
However, for most applications we recommend adaptive zeroing with the quantile estimator. To do so, we use the quantile matching algorithm as follows:
%% Cell type:code id: tags:
```
zeroing_norm = tff.aggregators.PrivateQuantileEstimationProcess.no_noise(
    initial_estimate=10.0,
    target_quantile=0.98,
    learning_rate=math.log(10),
    multiplier=2.0,
    increment=1.0)
zeroing_mean = tff.aggregators.zeroing_factory(
    zeroing_norm=zeroing_norm,
    inner_agg_factory=tff.aggregators.MeanFactory())
# Equivalent to:
# zeroing_mean = tff.learning.robust_aggregator(clipping=False)
```
%% Cell type:markdown id: tags:
The parameters have been chosen so that the process adapts very quickly (relatively large `learning_rate`) to a value somewhat larger than the largest values seen so far. For a quantile estimate `Q`, the threshold used for zeroing will be `Q * multiplier + increment`.
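The threshold formula and the zeroing step can be sketched in plain Python as follows. This is an illustration under simplifying assumptions, not TFF code: the helper names are hypothetical, and each client update is modeled as a flat list that is zeroed as a whole when its L-infinity norm exceeds the threshold.

```python
def zeroing_threshold(quantile_estimate, multiplier=2.0, increment=1.0):
    """Threshold derived from a quantile estimate Q: Q * multiplier + increment."""
    return quantile_estimate * multiplier + increment


def zero_large_updates(updates, threshold):
    """Replaces any update whose L-infinity norm exceeds threshold with zeros."""
    result = []
    for update in updates:
        linf_norm = max(abs(x) for x in update)
        if linf_norm > threshold:
            result.append([0.0] * len(update))
        else:
            result.append(list(update))
    return result


threshold = zeroing_threshold(quantile_estimate=10.0)  # 10.0 * 2.0 + 1.0
updates = [[0.5, -1.5], [3.0, 1e9], [-20.0, 4.0]]  # the second looks corrupted
zeroed = zero_large_updates(updates, threshold)
```

Only the second update is replaced by zeros. The third has a large but plausible value below the threshold, so it is left untouched.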
%% Cell type:markdown id: tags:
### Clipping to bound L2 norm
Clipping client updates (projecting onto an L2 ball) can improve robustness to outliers. A `tff.aggregators.clipping_factory` is structured exactly like `tff.aggregators.zeroing_factory` discussed above, and can take either a constant or a `tff.templates.EstimationProcess` as its `clipping_norm` argument. The recommended best practice is to use clipping that adapts moderately quickly to a moderately high norm, as follows:
%% Cell type:code id: tags:
```
clipping_norm = tff.aggregators.PrivateQuantileEstimationProcess.no_noise(
    initial_estimate=1.0,
    target_quantile=0.8,
    learning_rate=0.2)
clipping_mean = tff.aggregators.clipping_factory(
    clipping_norm=clipping_norm,
    inner_agg_factory=tff.aggregators.MeanFactory())
# Equivalent to:
# clipping_mean = tff.learning.robust_aggregator(zeroing=False)
```
%% Cell type:markdown id: tags:
In our experience over many problems, the precise value of `target_quantile` does not seem to matter too much so long as learning rates are tuned appropriately. However, setting it very low may require increasing the server learning rate for best performance, relative to not using clipping, which is why we recommend 0.8 by default.
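As a reminder of what projecting onto an L2 ball means, here is a small plain-Python sketch. The helper name `clip_by_l2_norm` is an assumption for illustration and is not part of TFF.

```python
import math


def clip_by_l2_norm(update, clipping_norm):
    """Rescales update so that its L2 norm is at most clipping_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm <= clipping_norm:
        return list(update)
    scale = clipping_norm / norm
    return [x * scale for x in update]


clipped = clip_by_l2_norm([3.0, 4.0], clipping_norm=1.0)    # norm 5.0, rescaled
unchanged = clip_by_l2_norm([0.3, 0.4], clipping_norm=1.0)  # norm 0.5, kept as is
```

Unlike zeroing, clipping keeps the direction of a large update and only shrinks its magnitude.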
%% Cell type:markdown id: tags:
### Differential Privacy
TFF supports differentially private aggregation as well, using adaptive clipping and Gaussian noise. An aggregator to perform differentially private averaging can be constructed as follows:
%% Cell type:code id: tags:
```
dp_mean = tff.aggregators.DifferentiallyPrivateFactory.gaussian_adaptive(
    noise_multiplier, clients_per_round)
# Equivalent to:
# dp_mean = tff.learning.dp_aggregator(
#     noise_multiplier, clients_per_round, zeroing=False)
```
%% Cell type:markdown id: tags:
Guidance on how to set the `noise_multiplier` argument can be found in the [TFF DP tutorial](https://www.tensorflow.org/federated/tutorials/federated_learning_with_differential_privacy).
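The following plain-Python sketch shows the general shape of a Gaussian-mechanism mean: clip each update, sum, add noise scaled to the clipping norm, and divide by the number of clients. It is a simplification under stated assumptions, not the TFF implementation. In particular it uses a fixed `clip_norm` instead of adaptive clipping, and the function name `gaussian_dp_mean` is hypothetical.

```python
import math
import random


def gaussian_dp_mean(updates, clip_norm, noise_multiplier, clients_per_round, rng):
    """Clips updates to clip_norm, sums them, adds Gaussian noise, and averages."""
    dim = len(updates[0])
    total = [0.0] * dim
    for update in updates:
        norm = math.sqrt(sum(x * x for x in update))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j, x in enumerate(update):
            total[j] += x * scale
    # Noise standard deviation is proportional to the clipping norm.
    stddev = noise_multiplier * clip_norm
    return [(t + rng.gauss(0.0, stddev)) / clients_per_round for t in total]


rng = random.Random(0)
noisy_mean = gaussian_dp_mean(
    [[3.0, 4.0], [0.0, 1.0]],
    clip_norm=1.0, noise_multiplier=1.0, clients_per_round=2, rng=rng)
```

Because the noise is added once to the sum, its effect on the mean shrinks as `clients_per_round` grows.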
%% Cell type:markdown id: tags:
### Lossy Compression
Compared to lossless compression such as gzip, lossy compression generally results in a much higher compression ratio and can still be combined with lossless compression afterwards. Since less time needs to be spent on client-to-server communication, training rounds complete faster. Due to the inherently randomized nature of learning algorithms, up to some threshold, the inaccuracy from lossy compression does not have a negative impact on the overall performance.
The default recommendation is to use simple uniform quantization (see [Suresh et al.](http://proceedings.mlr.press/v70/suresh17a/suresh17a.pdf) for instance), parameterized by two values: the tensor size compression `threshold` and the number of `quantization_bits`. For every tensor `t`, if the number of elements of `t` is less than or equal to `threshold`, it is not compressed. If it is larger, the elements of `t` are quantized using randomized rounding to `quantization_bits` bits. That is, we apply the operation
`t = round((t - min(t)) / (max(t) - min(t)) * (2**quantization_bits - 1)),`
resulting in integer values in the range of `[0, 2**quantization_bits-1]`. The quantized values are directly packed into an integer type for transmission, and then the inverse transformation is applied.
We recommend setting `quantization_bits` equal to 8 and `threshold` equal to 20000:
%% Cell type:code id: tags:
```
compressed_mean = tff.aggregators.MeanFactory(
    tff.aggregators.EncodedSumFactory.quantize_above_threshold(
        quantization_bits=8, threshold=20000))
# Equivalent to:
# compressed_mean = tff.learning.compression_aggregator(zeroing=False, clipping=False)
```
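The quantization rule described above can be sketched in plain Python as follows. This is an illustrative approximation, not the encoder TFF uses. The helper names are assumptions, tensors are modeled as flat lists, and a tiny `threshold` is used so that the toy tensor is actually compressed.

```python
import math
import random


def quantize(tensor, quantization_bits, rng):
    """Maps values to integers in [0, 2**bits - 1] using randomized rounding."""
    lo, hi = min(tensor), max(tensor)
    levels = 2 ** quantization_bits - 1
    if hi == lo:
        return [0] * len(tensor), lo, hi
    quantized = []
    for x in tensor:
        scaled = (x - lo) / (hi - lo) * levels
        floor = math.floor(scaled)
        # Round up with probability equal to the fractional part (unbiased).
        q = floor + (1 if rng.random() < scaled - floor else 0)
        quantized.append(min(q, levels))
    return quantized, lo, hi


def dequantize(quantized, lo, hi, quantization_bits):
    """Applies the inverse transformation back to floating point values."""
    levels = 2 ** quantization_bits - 1
    return [lo + q / levels * (hi - lo) for q in quantized]


def maybe_compress(tensor, quantization_bits, threshold, rng):
    """Leaves small tensors untouched; quantizes and restores larger ones."""
    if len(tensor) <= threshold:
        return list(tensor)
    quantized, lo, hi = quantize(tensor, quantization_bits, rng)
    return dequantize(quantized, lo, hi, quantization_bits)


rng = random.Random(0)
tensor = [i / 10 for i in range(50)]
restored = maybe_compress(tensor, quantization_bits=8, threshold=20, rng=rng)
```

Each restored element differs from the original by less than one quantization step, `(max(t) - min(t)) / (2**quantization_bits - 1)`.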
%% Cell type:markdown id: tags:
#### Tuning suggestions
Both parameters, `quantization_bits` and `threshold`, can be adjusted, and the number of clients participating in each training round can also impact the effectiveness of compression.
**Threshold.** The default value of 20000 is chosen because we have observed that variables with a small number of elements, such as biases in common layer types, are much more sensitive to introduced noise. Moreover, there is little to be gained from compressing variables with a small number of elements in practice, as their uncompressed size is relatively small to begin with.
In some applications it may make sense to change the choice of threshold. For instance, the biases of the output layer of a classification model may be more sensitive to noise. If you are training a language model with a vocabulary of 20004, you may want to set `threshold` to be 20004.
**Quantization bits.** The default value of 8 for `quantization_bits` should be fine for most users. If 8 is working well and you want to squeeze out a bit more performance, you could try taking it down to 7 or 6. If resources permit doing a small grid search, we would recommend that you identify the value for which training becomes unstable or final model quality starts to degrade, and then increase that value by two. For example, if setting `quantization_bits` to 5 works, but setting it to 4 degrades the model, we would recommend setting it to 6 to be "on the safe side".
**Clients per round.** Note that significantly increasing the number of clients per round can enable a smaller value for `quantization_bits` to work well, because the randomized inaccuracy introduced by quantization may be evened out by averaging over more client updates.
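The averaging effect can be seen in a toy simulation. This is a sketch under simplifying assumptions: every hypothetical client holds the same scalar value and rounds it stochastically to an integer. The helper names are not part of TFF.

```python
import math
import random


def stochastic_round(x, rng):
    """Rounds x up with probability equal to its fractional part (unbiased)."""
    floor = math.floor(x)
    return floor + (1 if rng.random() < x - floor else 0)


def mean_of_rounded(x, num_clients, rng):
    """Averages independent stochastic roundings of x over num_clients clients."""
    return sum(stochastic_round(x, rng) for _ in range(num_clients)) / num_clients


rng = random.Random(0)
true_value = 0.3
error_few = abs(mean_of_rounded(true_value, 10, rng) - true_value)
error_many = abs(mean_of_rounded(true_value, 10000, rng) - true_value)
```

Each individual rounding is off by 0.3 or 0.7, yet the standard deviation of the averaged error shrinks roughly as one over the square root of the number of clients.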
%% Cell type:markdown id: tags:
### Secure Aggregation
By Secure Aggregation (SecAgg) we refer to a cryptographic protocol wherein client updates are encrypted in such a way that the server can only decrypt their sum. If the number of clients that report back is insufficient, the server will learn nothing at all -- and in no case will the server be able to inspect individual updates. This is realized using the `tff.federated_secure_sum_bitwidth` operator.
The model updates are floating point values, but SecAgg operates on integers. Therefore we need to clip any large values to some bound before discretization to an integer type. The clipping bound can be either a constant or determined adaptively (the recommended default). The integers are then securely summed, and the sum is mapped back to the floating point domain.
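The clip, discretize, sum, and map-back steps can be sketched in plain Python. This illustrates only the arithmetic and contains no cryptography: the ordinary `sum` stands in for the secure sum, and the helper names, the bound, and the bit width are assumptions for illustration.

```python
def to_integer(x, bound, bits):
    """Clips x to [-bound, bound] and maps it to an integer in [0, 2**bits - 1]."""
    clipped = max(-bound, min(bound, x))
    levels = 2 ** bits - 1
    return round((clipped + bound) / (2 * bound) * levels)


def from_integer_sum(int_sum, num_clients, bound, bits):
    """Maps a sum of discretized values back to the floating point domain."""
    levels = 2 ** bits - 1
    # Each client contributed an offset of +bound before scaling; remove it.
    return int_sum / levels * (2 * bound) - num_clients * bound


bound, bits = 4.0, 16
client_values = [0.5, -1.25, 100.0]  # the last value is clipped to 4.0
# Stand-in for the secure sum: the server would only ever see this total.
int_sum = sum(to_integer(v, bound, bits) for v in client_values)
approx_sum = from_integer_sum(int_sum, len(client_values), bound, bits)
```

A tighter `bound` means a smaller discretization step for the same number of bits, at the cost of clipping more values.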
To compute a mean with weighted values summed using SecAgg with `MY_SECAGG_BOUND` as the clipping bound, pass `SecureSumFactory` to `MeanFactory` as:
%% Cell type:markdown id: tags:
```
secure_mean = tff.aggregators.MeanFactory(
    tff.aggregators.SecureSumFactory(MY_SECAGG_BOUND))
```
%% Cell type:markdown id: tags:
To do the same while determining bounds adaptively:
%% Cell type:code id: tags:
```
secagg_bound = tff.aggregators.PrivateQuantileEstimationProcess.no_noise(
    initial_estimate=50.0,
    target_quantile=0.95,
    learning_rate=1.0,
    multiplier=2.0)
secure_mean = tff.aggregators.MeanFactory(
    tff.aggregators.SecureSumFactory(secagg_bound))
# Equivalent to:
# secure_mean = tff.learning.secure_aggregator(zeroing=False, clipping=False)
```
%% Cell type:markdown id: tags:
#### Tuning suggestions
The adaptive parameters have been chosen so that the bounds are tight (we won't lose much precision in discretization) but clipping happens rarely.
If tuning the parameters, keep in mind that the SecAgg protocol is summing the weighted model updates, after weighting in the mean. The weights are typically the number of data points processed locally, hence between different tasks, the right bound might depend on this quantity.
We do not recommend using the `increment` keyword argument when creating an adaptive `secagg_bound`, as this could result in a large relative precision loss if the actual estimate ends up being small.
The above code snippet will use SecAgg only for the weighted values. If SecAgg should also be used for the sum of weights, we recommend setting the bounds as constants, since in a common training setup the largest possible weight will be known in advance:
%% Cell type:markdown id: tags:
```
secure_mean = tff.aggregators.MeanFactory(
    value_sum_factory=tff.aggregators.SecureSumFactory(secagg_bound),
    weight_sum_factory=tff.aggregators.SecureSumFactory(
        upper_bound_threshold=MAX_WEIGHT, lower_bound_threshold=0.0))
```
%% Cell type:markdown id: tags:
## Composing techniques
Individual techniques for extending a mean introduced above can be combined together.
We recommend the order in which these techniques are applied at clients to be
1. Zeroing
1. Clipping
1. Other techniques
The aggregators in the `tff.aggregators` module are composed by wrapping "inner aggregators" (whose pre-aggregation effects happen last and post-aggregation effects happen first) inside "outer aggregators". For example, to perform zeroing, clipping, and compression (in that order), one would write:
%% Cell type:markdown id: tags:
```
# Compression is innermost because its pre-aggregation effects are last.
compressed_mean = tff.aggregators.MeanFactory(
    tff.aggregators.EncodedSumFactory.quantize_above_threshold(
        quantization_bits=8, threshold=20000))
# Compressed mean is inner aggregator to clipping...
clipped_compressed_mean = tff.aggregators.clipping_factory(
    clipping_norm=MY_CLIPPING_CONSTANT,
    inner_agg_factory=compressed_mean)
# ...which is inner aggregator to zeroing, since zeroing happens first.
final_aggregator = tff.aggregators.zeroing_factory(
    zeroing_norm=MY_ZEROING_CONSTANT,
    inner_agg_factory=clipped_compressed_mean)
```
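The "outermost runs first" ordering can be checked with a toy model of the wrapping pattern. This is a plain-Python analogy, not TFF's factory API: the `wrap` helper, the scalar client values, and the simple stand-ins for zeroing, clipping, and compression are all assumptions for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def wrap(name, preprocess, inner, log):
    """Returns an aggregator that preprocesses each value, then calls inner."""
    def aggregate(values):
        log.append(name)  # Record the order in which the stages run.
        return inner([preprocess(v) for v in values])
    return aggregate


log = []
# Innermost: a crude stand-in for compression (round to the nearest 0.5).
compressed_mean = wrap("compression", lambda v: round(v * 2) / 2, mean, log)
# Clipping wraps the compressed mean.
clipped_compressed_mean = wrap(
    "clipping", lambda v: max(-10.0, min(10.0, v)), compressed_mean, log)
# Zeroing is outermost, so it is applied to the raw client values first.
final_aggregator = wrap(
    "zeroing", lambda v: 0.0 if abs(v) > 100.0 else v,
    clipped_compressed_mean, log)

result = final_aggregator([1.2, 250.0, -30.0])
```

After the call, `log` lists the stages in the order zeroing, clipping, compression, which mirrors the recommended order at clients even though the objects were constructed in the reverse order.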
%% Cell type:markdown id: tags:
Note that this structure matches the [default aggregators](https://github.com/tensorflow/federated/blob/11e4f632b38745c9b38cc39fa1fe67771c206e77/tensorflow_federated/python/learning/model_update_aggregator.py) for learning algorithms.
Other compositions are possible, too. We will extend this document when we are confident that we can provide a default configuration which works in multiple different applications. For implementing new ideas, see the [Implementing custom aggregators](custom_aggregators.ipynb) tutorial.