Commit eeaea774 authored by Mark Daoust, committed by tensorflow-copybara

Add download link

PiperOrigin-RevId: 393815864
Parent 2130cf5a
package(default_visibility = ["//visibility:private"])

licenses(["notice"])

exports_files([
    "jax_support.ipynb",
])

package(default_visibility = ["//visibility:private"])

licenses(["notice"])

exports_files([
    "openmined_conference_2020.ipynb",
])

package(default_visibility = ["//visibility:private"])

licenses(["notice"])

exports_files([
    "building_your_own_federated_learning_algorithm.ipynb",
    "custom_aggregators.ipynb",
    "custom_federated_algorithm_with_tff_optimizers.ipynb",
    "custom_federated_algorithms_1.ipynb",
    "custom_federated_algorithms_2.ipynb",
    "federated_learning_for_image_classification.ipynb",
    "federated_learning_for_text_generation.ipynb",
    "federated_learning_with_differential_privacy.ipynb",
    "federated_reconstruction_for_matrix_factorization.ipynb",
    "federated_select.ipynb",
    "random_noise_generation.ipynb",
    "simulations.ipynb",
    "simulations_with_accelerators.ipynb",
    "sparse_federated_learning.ipynb",
    "tuning_recommended_aggregators.ipynb",
    "tff_for_federated_learning_research_compression.ipynb",
    "working_with_client_data.ipynb",
])
%% Cell type:markdown id: tags:
# High-performance Simulation with Kubernetes
This tutorial describes how to set up a high-performance simulation using a
TFF runtime running on Kubernetes. The model is the same as in the previous
tutorial, **High-performance simulations with TFF**. The only difference is that
here we use a worker pool instead of a local executor.
This tutorial refers to Google Cloud's [GKE](https://cloud.google.com/kubernetes-engine/) to create the Kubernetes cluster,
but all the steps after the cluster is created can be used with any Kubernetes
installation.
%% Cell type:markdown id: tags:
<table class="tfo-notebook-buttons" align="left">
<td>
<a target="_blank" href="https://www.tensorflow.org/federated/tutorials/high_performance_simulation_with_kubernetes"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
</td>
<td>
<a target="_blank" href="https://colab.research.google.com/github/tensorflow/federated/blob/master/docs/tutorials/high_performance_simulation_with_kubernetes.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
</td>
<td>
<a target="_blank" href="https://github.com/tensorflow/federated/blob/master/docs/tutorials/high_performance_simulation_with_kubernetes.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
</td>
<td>
<a href="https://storage.googleapis.com/tensorflow_docs/federated/docs/tutorials/high_performance_simulation_with_kubernetes.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
</td>
</table>
%% Cell type:markdown id: tags:
## Launch the TFF Workers on GKE
> **Note:** This tutorial assumes the user has an existing GCP project.
### Create a Kubernetes Cluster
The following step only needs to be done once. The cluster can be re-used for future workloads.
Follow the GKE instructions to [create a container cluster](https://cloud.google.com/kubernetes-engine/docs/tutorials/hello-app#step_4_create_a_container_cluster). The rest of this tutorial assumes that the cluster is named `tff-cluster`, but the actual name isn't important.
Stop following the instructions when you get to "*Step 5: Deploy your application*".
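For reference, creating the cluster from the command line might look like the sketch below; the zone here is an assumption, and the linked GKE instructions remain the authoritative steps.
```
$ gcloud container clusters create tff-cluster --zone us-central1-a
```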
### Deploy the TFF Worker Application
The commands to interact with GCP can be run [locally](https://cloud.google.com/kubernetes-engine/docs/tutorials/hello-app#option_b_use_command-line_tools_locally) or in the [Google Cloud Shell](https://cloud.google.com/shell/). We recommend the Google Cloud Shell since it doesn't require additional setup.
1. Run the following command to launch the Kubernetes application.
```
$ kubectl create deployment tff-workers --image=gcr.io/tensorflow-federated/remote-executor-service:latest
```
2. Add a load balancer for the application.
```
$ kubectl expose deployment tff-workers --type=LoadBalancer --port 80 --target-port 8000
```
> **Note:** This exposes your deployment to the internet and is for demo
purposes only. For production use, a firewall and authentication are strongly
recommended.
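To verify that the worker pods from step 1 are running, you can list them:
```
$ kubectl get pods
```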
%% Cell type:markdown id: tags:
Look up the IP address of the load balancer on the Google Cloud Console. You'll need it later to connect the training loop to the worker app.
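Alternatively, you can read the address from the command line using the service created above; it appears in the `EXTERNAL-IP` column once the load balancer has been provisioned.
```
$ kubectl get service tff-workers
```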
%% Cell type:markdown id: tags:
### (Alternatively) Launch the Docker Container Locally
```
$ docker run --rm -p 8000:8000 gcr.io/tensorflow-federated/remote-executor-service:latest
```
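If you use this option, set `ip_address` to `'localhost'` and `port` to `8000` in the *Set Up the Remote Executors* cell below, since the container exposes the service on port 8000 of the local machine.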
%% Cell type:markdown id: tags:
## Set Up TFF Environment
%% Cell type:code id: tags:
```
#@test {"skip": true}
# Install the nightly build of TensorFlow Federated, plus nest-asyncio so
# that TFF's asyncio event loop can run inside the notebook's existing loop.
!pip install --quiet --upgrade tensorflow-federated-nightly
!pip install --quiet --upgrade nest-asyncio

import nest_asyncio
nest_asyncio.apply()
```
%% Output
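%% Cell type:markdown id: tags:
As an optional sanity check on the installation (this mirrors the check used in other TFF tutorials), run a trivial federated computation:
%% Cell type:code id: tags:
```
import tensorflow_federated as tff

# Should print b'Hello, World!'.
tff.federated_computation(lambda: 'Hello, World!')()
```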
%% Cell type:markdown id: tags:
## Define the Model to Train
%% Cell type:code id: tags:
```
import collections
import time

import tensorflow as tf
import tensorflow_federated as tff

# Load the federated EMNIST dataset; each client corresponds to one writer.
source, _ = tff.simulation.datasets.emnist.load_data()


def map_fn(example):
  # Flatten each 28x28 image into a length-784 feature vector.
  return collections.OrderedDict(
      x=tf.reshape(example['pixels'], [-1, 784]), y=example['label'])


def client_data(n):
  ds = source.create_tf_dataset_for_client(source.client_ids[n])
  return ds.repeat(10).batch(20).map(map_fn)


# Use the first 10 clients as the training population.
train_data = [client_data(n) for n in range(10)]
input_spec = train_data[0].element_spec


def model_fn():
  # A simple softmax-regression model over the flattened pixels.
  model = tf.keras.models.Sequential([
      tf.keras.layers.InputLayer(input_shape=(784,)),
      tf.keras.layers.Dense(units=10, kernel_initializer='zeros'),
      tf.keras.layers.Softmax(),
  ])
  return tff.learning.from_keras_model(
      model,
      input_spec=input_spec,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])


trainer = tff.learning.build_federated_averaging_process(
    model_fn, client_optimizer_fn=lambda: tf.keras.optimizers.SGD(0.02))


def evaluate(num_rounds=10):
  state = trainer.initialize()
  for round_num in range(num_rounds):
    t1 = time.time()
    state, metrics = trainer.next(state, train_data)
    t2 = time.time()
    print('Round {}: loss {}, round time {}'.format(round_num, metrics.loss,
                                                    t2 - t1))
```
%% Cell type:markdown id: tags:
## Set Up the Remote Executors
By default, TFF executes all computations locally. In this step we tell TFF to connect to the Kubernetes services we set up above. Be sure to copy the IP address of your service here.
%% Cell type:code id: tags:
```
import grpc

ip_address = '0.0.0.0'  #@param {type:"string"}
port = 80  #@param {type:"integer"}

# Open one gRPC channel per client; all of them point at the load-balanced
# endpoint, which distributes the work across the worker pool.
channels = [grpc.insecure_channel(f'{ip_address}:{port}') for _ in range(10)]

tff.backends.native.set_remote_execution_context(channels)
```
%% Cell type:markdown id: tags:
## Run Training
%% Cell type:code id: tags:
```
evaluate()
```
%% Output
Round 0: loss 4.370407581329346, round time 4.201097726821899
Round 1: loss 4.1407670974731445, round time 3.3283166885375977
Round 2: loss 3.865147590637207, round time 3.098310947418213
Round 3: loss 3.534019708633423, round time 3.1565616130828857
Round 4: loss 3.272688388824463, round time 3.175067663192749
Round 5: loss 2.935391664505005, round time 3.008434534072876
Round 6: loss 2.7399251461029053, round time 3.31435227394104
Round 7: loss 2.5054931640625, round time 3.4411356449127197
Round 8: loss 2.290508985519409, round time 3.158798933029175
Round 9: loss 2.1194536685943604, round time 3.1348156929016113