Merge branch 'feature/guides' into develop
tenkeyless committed Dec 13, 2024
2 parents 767419d + 2a11aa4 commit a7e8cea
Showing 56 changed files with 4,778 additions and 3,025 deletions.
8 changes: 4 additions & 4 deletions assets/css/custom.css
@@ -12,8 +12,8 @@ body:is(html[class~="dark"] *) {
}

.dark .highlight {
/* Background .bg { color:#cdd6f4;background-color:#1e1e2e; }
/* PreWrapper .chroma { color:#cdd6f4;background-color:#1e1e2e; } */
/* Background */ .bg { color:#cdd6f4;background-color:#1e1e2e; }
/* PreWrapper */ .chroma { color:#cdd6f4;background-color:#1e1e2e; }
/* Other */ .chroma .x { }
/* Error */ .chroma .err { color:#f38ba8 }
/* CodeLine */ .chroma .cl { }
@@ -101,8 +101,8 @@ body:is(html[class~="dark"] *) {
}

.highlight {
/* Background .bg { color:#272822;background-color:#fafafa; }
/* PreWrapper .chroma { color:#272822;background-color:#fafafa; }
/* Background */ .bg { color:#272822;background-color:#f3f3f3; }
/* PreWrapper */ .chroma { color:#272822;background-color:#f3f3f3; }
/* Other .chroma .x { } */
/* Error */ .chroma .err { color:#960050;background-color:#1e0010 }
/* CodeLine .chroma .cl { } */
@@ -123,7 +123,9 @@ In practice, the process of synchronously updating the weights of the model repl
To do single-host, multi-device synchronous training with a Keras model, you would use the `jax.sharding` features. Here's how it works:

- We first create a device mesh using `mesh_utils.create_device_mesh`.
- We use `jax.sharding.Mesh`, `jax.sharding.NamedSharding` and `jax.sharding.PartitionSpec` to define how to partition JAX arrays. - We specify that we want to replicate the model and optimizer variables across all devices by using a spec with no axis. - We specify that we want to shard the data across devices by using a spec that splits along the batch dimension.
- We use `jax.sharding.Mesh`, `jax.sharding.NamedSharding` and `jax.sharding.PartitionSpec` to define how to partition JAX arrays.
- We specify that we want to replicate the model and optimizer variables across all devices by using a spec with no axis.
- We specify that we want to shard the data across devices by using a spec that splits along the batch dimension.
- We use `jax.device_put` to replicate the model and optimizer variables across devices. This happens once at the beginning.
- In the training loop, for each batch that we process, we use `jax.device_put` to split the batch across devices before invoking the train step.
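
The steps listed above can be sketched roughly as follows. This is a minimal illustration rather than the guide's own code: the mesh axis name `"batch"`, the array shapes, and the variable names are assumptions chosen for the example. It runs on however many local devices JAX reports, including a single CPU.

```python
import jax
import numpy as np
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1D device mesh over all local devices.
num_devices = jax.local_device_count()
devices = mesh_utils.create_device_mesh((num_devices,))
mesh = Mesh(devices, axis_names=("batch",))  # axis name is an assumption

# A spec with no axis replicates the array on every device.
var_replication = NamedSharding(mesh, P())
# A spec naming the mesh axis splits the array along its first (batch) dimension.
data_sharding = NamedSharding(mesh, P("batch"))

# Stand-in for model/optimizer variables: replicated once, up front.
weights = jax.device_put(np.ones((4, 2), dtype="float32"), var_replication)

# Stand-in for one training batch: split across devices before the train step.
batch = jax.device_put(
    np.zeros((8 * num_devices, 4), dtype="float32"), data_sharding
)
```

Each device ends up with a full copy of `weights` and an 8-row slice of `batch`.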

@@ -231,7 +231,7 @@ Restoring from ./ckpt/ckpt-1.keras

{{% /details %}}

## [`tf.data`](https://www.tensorflow.org/api_docs/python/tf/data) performance tips
## [`tf.data`](https://www.tensorflow.org/api_docs/python/tf/data) performance tips {#tfdata}

When doing distributed training, the efficiency with which you load data can often become critical. Here are a few tips to make sure your [`tf.data`](https://www.tensorflow.org/api_docs/python/tf/data) pipelines run as fast as possible.

@@ -146,7 +146,11 @@ In practice, the process of synchronously updating the weights of the model repl
To do single-host, multi-device synchronous training with a Keras model, you would use the `torch.nn.parallel.DistributedDataParallel` module wrapper. Here's how it works:

- We use `torch.multiprocessing.start_processes` to start multiple Python processes, one per device. Each process will run the `per_device_launch_fn` function.
- The `per_device_launch_fn` function does the following: - It uses `torch.distributed.init_process_group` and `torch.cuda.set_device` to configure the device to be used for that process. - It uses `torch.utils.data.distributed.DistributedSampler` and `torch.utils.data.DataLoader` to turn our data into a distributed data loader. - It also uses `torch.nn.parallel.DistributedDataParallel` to turn our model into a distributed PyTorch module. - It then calls the `train_model` function.
- The `per_device_launch_fn` function does the following:
- It uses `torch.distributed.init_process_group` and `torch.cuda.set_device` to configure the device to be used for that process.
- It uses `torch.utils.data.distributed.DistributedSampler` and `torch.utils.data.DataLoader` to turn our data into a distributed data loader.
- It also uses `torch.nn.parallel.DistributedDataParallel` to turn our model into a distributed PyTorch module.
- It then calls the `train_model` function.
- The `train_model` function will then run in each process, with the model using a separate device in each process.
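
To make the data-splitting step concrete, here is a small sketch of how `DistributedSampler` partitions a dataset between processes. It covers only that one step: it does not start processes or wrap a model in `DistributedDataParallel`. The `world_size` value, the toy dataset, and the `make_loader` helper are hypothetical, and `num_replicas` and `rank` are passed explicitly so that no process group is needed (in a real launch they would normally come from the initialized process group).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy dataset of 8 samples whose values equal their indices.
dataset = TensorDataset(torch.arange(8, dtype=torch.float32).unsqueeze(1))
world_size = 2  # hypothetical number of processes/devices


def make_loader(rank):
    # Each rank sees a disjoint, interleaved subset of the dataset.
    sampler = DistributedSampler(
        dataset, num_replicas=world_size, rank=rank, shuffle=False
    )
    return DataLoader(dataset, batch_size=2, sampler=sampler)


# Collect the sample values each simulated rank would iterate over.
seen = {
    rank: [int(v) for (batch,) in make_loader(rank) for v in batch.flatten()]
    for rank in range(world_size)
}
```

With `shuffle=False`, the two simulated ranks receive non-overlapping halves of the dataset, which is what lets each process train on its own share of every epoch.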

Here's the flow, where each step is split into its own utility function:
@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-19" >}}

**Authors:** [fchollet](https://twitter.com/fchollet), [lukewood](https://twitter.com/luke_wood_ml), [divamgupta](https://github.com/divamgupta)
**{{< t f_author >}}** [fchollet](https://twitter.com/fchollet), [lukewood](https://twitter.com/luke_wood_ml), [divamgupta](https://github.com/divamgupta)
**{{< t f_date_created >}}** 2022/09/25
**{{< t f_last_modified >}}** 2022/09/25
**{{< t f_description >}}** Generate new images using KerasCV's Stable Diffusion model.
@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-18" >}}

**Authors:** Tom O'Malley, Haifeng Jin
**{{< t f_author >}}** Tom O'Malley, Haifeng Jin
**{{< t f_date_created >}}** 2019/10/28
**{{< t f_last_modified >}}** 2022/01/12
**{{< t f_description >}}** Use `HyperModel.fit()` to tune training hyperparameters (such as batch size).
@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-18" >}}

**Authors:** Tom O'Malley, Haifeng Jin
**{{< t f_author >}}** Tom O'Malley, Haifeng Jin
**{{< t f_date_created >}}** 2019/10/24
**{{< t f_last_modified >}}** 2021/06/02
**{{< t f_description >}}** Tuning the hyperparameters of the models with multiple GPUs and multiple machines.
@@ -60,7 +60,7 @@ export KERASTUNER_ORACLE_PORT="8000"
python run_tuning.py
```

### Data parallelism with [`tf.distribute`](https://www.tensorflow.org/api_docs/python/tf/distribute)
### Data parallelism with [`tf.distribute`](https://www.tensorflow.org/api_docs/python/tf/distribute) {#tfdistribute}

KerasTuner also supports data parallelism via [tf.distribute](https://www.tensorflow.org/tutorials/distribute/keras). Data parallelism and distributed tuning can be combined. For example, if you have 10 workers with 4 GPUs on each worker, you can run 10 parallel trials with each trial training on 4 GPUs by using [tf.distribute.MirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/MirroredStrategy). You can also run each trial on TPUs via [tf.distribute.TPUStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/TPUStrategy). Currently [tf.distribute.MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy) is not supported, but support for this is on the roadmap.

@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-19" >}}

**Authors:** Haifeng Jin
**{{< t f_author >}}** Haifeng Jin
**{{< t f_date_created >}}** 2023/02/28
**{{< t f_last_modified >}}** 2023/02/28
**{{< t f_description >}}** The basics of fault tolerance configurations in KerasTuner.
@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-19" >}}

**Authors:** Luca Invernizzi, James Long, Francois Chollet, Tom O'Malley, Haifeng Jin
**{{< t f_author >}}** Luca Invernizzi, James Long, Francois Chollet, Tom O'Malley, Haifeng Jin
**{{< t f_date_created >}}** 2019/05/31
**{{< t f_last_modified >}}** 2021/10/27
**{{< t f_description >}}** Tune a subset of the hyperparameters without changing the hypermodel.
@@ -244,7 +244,7 @@ All layers you've seen so far in this guide work with all Keras backends.
The `keras.ops` namespace gives you access to:

- The NumPy API, e.g. `ops.matmul`, `ops.sum`, `ops.reshape`, `ops.stack`, etc.
- Neural networks-specific APIs such as `ops.softmax`, `ops`.conv`,`ops.binary_crossentropy`,`ops.relu\`, etc.
- Neural networks-specific APIs such as `ops.softmax`, `ops.conv`, `ops.binary_crossentropy`, `ops.relu`, etc.

You can also use backend-native APIs in your layers (such as [`tf.nn`](https://www.tensorflow.org/api_docs/python/tf/nn) functions), but if you do this, then your layer will only be usable with the backend in question. For instance, you could write the following JAX-specific layer using `jax.numpy`:

7 changes: 6 additions & 1 deletion content/english/docs/guides/migrating_to_keras_3/_index.md
@@ -751,7 +751,12 @@ Your models may include a custom `train_step()` or `test_step()` method, which r

In some cases, you might be able to simply override the `Model.compute_loss()` method and make it fully backend-agnostic, instead of overriding `train_step()`. Here's an example of a layer with a custom `compute_loss()` method which works across JAX, TensorFlow, and PyTorch:

`class MyModel(keras.Model): def compute_loss(self, x=None, y=None, y_pred=None, sample_weight=None): loss = keras.ops.sum(keras.losses.mean_squared_error(y, y_pred, sample_weight)) return loss`
```python
class MyModel(keras.Model):
    def compute_loss(self, x=None, y=None, y_pred=None, sample_weight=None):
        loss = keras.ops.sum(keras.losses.mean_squared_error(y, y_pred, sample_weight))
        return loss
```

If you need to modify the optimization mechanism itself, beyond the loss computation, then you will need to override `train_step()`, and implement one `train_step` method per backend, like below.

@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-18" >}}

**Authors:** Neel Kovelamudi, Francois Chollet
**{{< t f_author >}}** Neel Kovelamudi, Francois Chollet
**{{< t f_date_created >}}** 2023/06/14
**{{< t f_last_modified >}}** 2023/06/30
**{{< t f_description >}}** Complete guide to saving, serializing, and exporting models.
@@ -502,7 +502,7 @@ model.fit(x_train, y_train, batch_size=64, validation_split=0.2, epochs=1)

{{% /details %}}

## Training & evaluation using [`tf.data`](https://www.tensorflow.org/api_docs/python/tf/data) Datasets
## Training & evaluation using [`tf.data`](https://www.tensorflow.org/api_docs/python/tf/data) Datasets {#tfdata}

In the past few paragraphs, you've seen how to handle losses, metrics, and optimizers, and you've seen how to use the `validation_data` and `validation_split` arguments in `fit()`, when your data is passed as NumPy arrays.

2 changes: 0 additions & 2 deletions content/english/docs/guides/transfer_learning/_index.md
@@ -78,8 +78,6 @@ non_trainable_weights: 0

{{% /details %}}

## Available guides

In general, all weights are trainable weights. The only built-in layer that has non-trainable weights is the `BatchNormalization` layer. It uses non-trainable weights to keep track of the mean and variance of its inputs during training. To learn how to use non-trainable weights in your own custom layers, see the [guide to writing new layers from scratch]({{< relref "/docs/guides/making_new_layers_and_models_via_subclassing" >}}).

**Example: the `BatchNormalization` layer has 2 trainable weights and 2 non-trainable weights**
@@ -399,7 +399,7 @@ Time taken: 43.49s

{{% /details %}}

## Speeding-up your training step with [`tf.function`](https://www.tensorflow.org/api_docs/python/tf/function)
## Speeding-up your training step with [`tf.function`](https://www.tensorflow.org/api_docs/python/tf/function) {#tffunction}

The default runtime in TensorFlow is eager execution. As such, our training loop above executes eagerly.

@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-18" >}}

**Authors:** Rick Chao, Francois Chollet
**{{< t f_author >}}** Rick Chao, Francois Chollet
**{{< t f_date_created >}}** 2019/03/20
**{{< t f_last_modified >}}** 2023/06/25
**{{< t f_description >}}** Complete guide to writing new Keras callbacks.