Merge branch 'release/1.1'
tenkeyless committed Dec 13, 2024
2 parents 767419d + a7e8cea commit a49ba4a
Showing 56 changed files with 4,778 additions and 3,025 deletions.
8 changes: 4 additions & 4 deletions assets/css/custom.css
@@ -12,8 +12,8 @@ body:is(html[class~="dark"] *) {
}

.dark .highlight {
/* Background .bg { color:#cdd6f4;background-color:#1e1e2e; }
/* PreWrapper .chroma { color:#cdd6f4;background-color:#1e1e2e; } */
/* Background */ .bg { color:#cdd6f4;background-color:#1e1e2e; }
/* PreWrapper */ .chroma { color:#cdd6f4;background-color:#1e1e2e; }
/* Other */ .chroma .x { }
/* Error */ .chroma .err { color:#f38ba8 }
/* CodeLine */ .chroma .cl { }
@@ -101,8 +101,8 @@ body:is(html[class~="dark"] *) {
}

.highlight {
/* Background .bg { color:#272822;background-color:#fafafa; }
/* PreWrapper .chroma { color:#272822;background-color:#fafafa; }
/* Background */ .bg { color:#272822;background-color:#f3f3f3; }
/* PreWrapper */ .chroma { color:#272822;background-color:#f3f3f3; }
/* Other .chroma .x { } */
/* Error */ .chroma .err { color:#960050;background-color:#1e0010 }
/* CodeLine .chroma .cl { } */
@@ -123,7 +123,9 @@ In practice, the process of synchronously updating the weights of the model repl
To do single-host, multi-device synchronous training with a Keras model, you would use the `jax.sharding` features. Here's how it works:

- We first create a device mesh using `mesh_utils.create_device_mesh`.
- We use `jax.sharding.Mesh`, `jax.sharding.NamedSharding` and `jax.sharding.PartitionSpec` to define how to partition JAX arrays. - We specify that we want to replicate the model and optimizer variables across all devices by using a spec with no axis. - We specify that we want to shard the data across devices by using a spec that splits along the batch dimension.
- We use `jax.sharding.Mesh`, `jax.sharding.NamedSharding` and `jax.sharding.PartitionSpec` to define how to partition JAX arrays.
- We specify that we want to replicate the model and optimizer variables across all devices by using a spec with no axis.
- We specify that we want to shard the data across devices by using a spec that splits along the batch dimension.
- We use `jax.device_put` to replicate the model and optimizer variables across devices. This happens once at the beginning.
- In the training loop, for each batch that we process, we use `jax.device_put` to split the batch across devices before invoking the train step.
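
For illustration, a minimal sketch of that setup might look like the following; the placeholder array stands in for the model and optimizer variables, and the mesh shape and axis name are assumptions rather than this guide's exact code:

```python
import jax
import numpy as np
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec

num_devices = len(jax.local_devices())
devices = mesh_utils.create_device_mesh((num_devices,))
mesh = Mesh(devices, axis_names=("batch",))

# A spec with no axis replicates a value: every device holds the same copy.
replicated_sharding = NamedSharding(mesh, PartitionSpec())
# A spec naming the "batch" axis shards the leading (batch) dimension across devices.
data_sharding = NamedSharding(mesh, PartitionSpec("batch"))

# One-time replication of the model/optimizer state (placeholder array here).
model_variables = jax.device_put(np.zeros((16, 16), dtype="float32"), replicated_sharding)

# Per batch: split the batch across devices before invoking the train step.
batch = np.ones((num_devices * 8, 16), dtype="float32")
sharded_batch = jax.device_put(batch, data_sharding)
```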

@@ -231,7 +231,7 @@ Restoring from ./ckpt/ckpt-1.keras

{{% /details %}}

## [`tf.data`](https://www.tensorflow.org/api_docs/python/tf/data) performance tips
## [`tf.data`](https://www.tensorflow.org/api_docs/python/tf/data) performance tips {#tfdata}

When doing distributed training, the efficiency with which you load data can often become critical. Here are a few tips to make sure your [`tf.data`](https://www.tensorflow.org/api_docs/python/tf/data) pipelines run as fast as possible.
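
As a minimal sketch of the usual knobs (caching, shuffling, batching, prefetching), the following uses placeholder synthetic data rather than anything from this guide:

```python
import tensorflow as tf

# Placeholder synthetic data; a real pipeline would read from files or TFRecords.
features = tf.random.normal((1024, 32))
labels = tf.random.uniform((1024,), maxval=10, dtype=tf.int32)

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .cache()                     # cache upstream work so it only runs once per dataset
    .shuffle(1024)
    .batch(64)
    .prefetch(tf.data.AUTOTUNE)  # overlap data preparation with model execution
)
```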

@@ -146,7 +146,11 @@ In practice, the process of synchronously updating the weights of the model repl
To do single-host, multi-device synchronous training with a Keras model, you would use the `torch.nn.parallel.DistributedDataParallel` module wrapper. Here's how it works:

- We use `torch.multiprocessing.start_processes` to start multiple Python processes, one per device. Each process will run the `per_device_launch_fn` function.
- The `per_device_launch_fn` function does the following: - It uses `torch.distributed.init_process_group` and `torch.cuda.set_device` to configure the device to be used for that process. - It uses `torch.utils.data.distributed.DistributedSampler` and `torch.utils.data.DataLoader` to turn our data into a distributed data loader. - It also uses `torch.nn.parallel.DistributedDataParallel` to turn our model into a distributed PyTorch module. - It then calls the `train_model` function.
- The `per_device_launch_fn` function does the following:
- It uses `torch.distributed.init_process_group` and `torch.cuda.set_device` to configure the device to be used for that process.
- It uses `torch.utils.data.distributed.DistributedSampler` and `torch.utils.data.DataLoader` to turn our data into a distributed data loader.
- It also uses `torch.nn.parallel.DistributedDataParallel` to turn our model into a distributed PyTorch module.
- It then calls the `train_model` function.
- The `train_model` function will then run in each process, with the model using a separate device in each process.
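
Condensed into a single function, a hedged sketch of that per-process setup could look like this; the tiny model, synthetic dataset, and rendezvous environment variables are placeholders, and the guide's own version (shown next) splits these steps into separate utilities:

```python
import os
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def per_device_launch_fn(current_gpu_index, num_gpus):
    # Rendezvous address for the process group; assumed to be set by the launcher.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", world_size=num_gpus, rank=current_gpu_index)
    torch.cuda.set_device(current_gpu_index)

    # Distributed data loader: each process sees a different shard of the data.
    dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset, num_replicas=num_gpus, rank=current_gpu_index)
    dataloader = DataLoader(dataset, sampler=sampler, batch_size=32)

    # Wrap the model so gradient updates stay synchronized across processes.
    model = torch.nn.Linear(32, 10).to(current_gpu_index)
    ddp_model = torch.nn.parallel.DistributedDataParallel(
        model, device_ids=[current_gpu_index]
    )
    # ... run the training loop with `ddp_model` and `dataloader` here ...


if __name__ == "__main__":
    num_gpus = torch.cuda.device_count()
    torch.multiprocessing.start_processes(
        per_device_launch_fn, args=(num_gpus,), nprocs=num_gpus, start_method="spawn"
    )
```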

Here's the flow, where each step is split into its own utility function:
@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-19" >}}

**Authors:** [fchollet](https://twitter.com/fchollet), [lukewood](https://twitter.com/luke_wood_ml), [divamgupta](https://github.com/divamgupta)
**{{< t f_author >}}** [fchollet](https://twitter.com/fchollet), [lukewood](https://twitter.com/luke_wood_ml), [divamgupta](https://github.com/divamgupta)
**{{< t f_date_created >}}** 2022/09/25
**{{< t f_last_modified >}}** 2022/09/25
**{{< t f_description >}}** Generate new images using KerasCV's Stable Diffusion model.
@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-18" >}}

**Authors:** Tom O'Malley, Haifeng Jin
**{{< t f_author >}}** Tom O'Malley, Haifeng Jin
**{{< t f_date_created >}}** 2019/10/28
**{{< t f_last_modified >}}** 2022/01/12
**{{< t f_description >}}** Use `HyperModel.fit()` to tune training hyperparameters (such as batch size).
@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-18" >}}

**Authors:** Tom O'Malley, Haifeng Jin
**{{< t f_author >}}** Tom O'Malley, Haifeng Jin
**{{< t f_date_created >}}** 2019/10/24
**{{< t f_last_modified >}}** 2021/06/02
**{{< t f_description >}}** Tuning the hyperparameters of the models with multiple GPUs and multiple machines.
@@ -60,7 +60,7 @@ export KERASTUNER_ORACLE_PORT="8000"
python run_tuning.py
```

### Data parallelism with [`tf.distribute`](https://www.tensorflow.org/api_docs/python/tf/distribute)
### Data parallelism with [`tf.distribute`](https://www.tensorflow.org/api_docs/python/tf/distribute) {#tfdistribute}

KerasTuner also supports data parallelism via [tf.distribute](https://www.tensorflow.org/tutorials/distribute/keras). Data parallelism and distributed tuning can be combined. For example, if you have 10 workers with 4 GPUs on each worker, you can run 10 parallel trials with each trial training on 4 GPUs by using [tf.distribute.MirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/MirroredStrategy). You can also run each trial on TPUs via [tf.distribute.TPUStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/TPUStrategy). Currently [tf.distribute.MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy) is not supported, but support for this is on the roadmap.
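
A hedged sketch of passing a strategy to a tuner; the `build_model` function, trial count, and directory names below are placeholders, not values from this guide:

```python
import keras
import keras_tuner
import tensorflow as tf


def build_model(hp):
    model = keras.Sequential(
        [
            keras.layers.Dense(hp.Int("units", 32, 128, step=32), activation="relu"),
            keras.layers.Dense(10),
        ]
    )
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model


tuner = keras_tuner.RandomSearch(
    build_model,
    objective="val_loss",
    max_trials=5,
    # Each trial trains on all local GPUs via synchronous data parallelism.
    distribution_strategy=tf.distribute.MirroredStrategy(),
    directory="tuning_results",
    project_name="mirrored_example",
)
```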

@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-19" >}}

**Authors:** Haifeng Jin
**{{< t f_author >}}** Haifeng Jin
**{{< t f_date_created >}}** 2023/02/28
**{{< t f_last_modified >}}** 2023/02/28
**{{< t f_description >}}** The basics of fault tolerance configurations in KerasTuner.
@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-19" >}}

**Authors:** Luca Invernizzi, James Long, Francois Chollet, Tom O'Malley, Haifeng Jin
**{{< t f_author >}}** Luca Invernizzi, James Long, Francois Chollet, Tom O'Malley, Haifeng Jin
**{{< t f_date_created >}}** 2019/05/31
**{{< t f_last_modified >}}** 2021/10/27
**{{< t f_description >}}** Tune a subset of the hyperparameters without changing the hypermodel.
@@ -244,7 +244,7 @@ All layers you've seen so far in this guide work with all Keras backends.
The `keras.ops` namespace gives you access to:

- The NumPy API, e.g. `ops.matmul`, `ops.sum`, `ops.reshape`, `ops.stack`, etc.
- Neural networks-specific APIs such as `ops.softmax`, `ops`.conv`,`ops.binary_crossentropy`,`ops.relu\`, etc.
- Neural networks-specific APIs such as `ops.softmax`, `ops.conv`, `ops.binary_crossentropy`, `ops.relu`, etc.
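
As an illustrative sketch (this particular layer is not from the guide), a layer written purely against `keras.ops` runs unchanged on every backend:

```python
import keras
from keras import ops


class ScaledSoftmaxDense(keras.layers.Layer):
    """A small example layer that only uses `keras.ops`, so it is backend-agnostic."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units), initializer="glorot_uniform"
        )
        self.b = self.add_weight(shape=(self.units,), initializer="zeros")

    def call(self, inputs):
        x = ops.matmul(inputs, self.w) + self.b
        return ops.softmax(ops.relu(x))
```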

You can also use backend-native APIs in your layers (such as [`tf.nn`](https://www.tensorflow.org/api_docs/python/tf/nn) functions), but if you do this, then your layer will only be usable with the backend in question. For instance, you could write the following JAX-specific layer using `jax.numpy`:

7 changes: 6 additions & 1 deletion content/english/docs/guides/migrating_to_keras_3/_index.md
@@ -751,7 +751,12 @@ Your models may include a custom `train_step()` or `test_step()` method, which r

In some cases, you might be able to simply override the `Model.compute_loss()` method and make it fully backend-agnostic, instead of overriding `train_step()`. Here's an example of a model with a custom `compute_loss()` method which works across JAX, TensorFlow, and PyTorch:

`class MyModel(keras.Model): def compute_loss(self, x=None, y=None, y_pred=None, sample_weight=None): loss = keras.ops.sum(keras.losses.mean_squared_error(y, y_pred, sample_weight)) return loss`
```python
class MyModel(keras.Model):
    def compute_loss(self, x=None, y=None, y_pred=None, sample_weight=None):
        loss = keras.ops.sum(keras.losses.mean_squared_error(y, y_pred, sample_weight))
        return loss
```

If you need to modify the optimization mechanism itself, beyond the loss computation, then you will need to override `train_step()`, and implement one `train_step` method per backend, like below.

@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-18" >}}

**Authors:** Neel Kovelamudi, Francois Chollet
**{{< t f_author >}}** Neel Kovelamudi, Francois Chollet
**{{< t f_date_created >}}** 2023/06/14
**{{< t f_last_modified >}}** 2023/06/30
**{{< t f_description >}}** Complete guide to saving, serializing, and exporting models.
@@ -502,7 +502,7 @@ model.fit(x_train, y_train, batch_size=64, validation_split=0.2, epochs=1)

{{% /details %}}

## Training & evaluation using [`tf.data`](https://www.tensorflow.org/api_docs/python/tf/data) Datasets
## Training & evaluation using [`tf.data`](https://www.tensorflow.org/api_docs/python/tf/data) Datasets {#tfdata}

In the past few paragraphs, you've seen how to handle losses, metrics, and optimizers, and you've seen how to use the `validation_data` and `validation_split` arguments in `fit()`, when your data is passed as NumPy arrays.
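
As a minimal sketch (the synthetic arrays, model, and batch size here are placeholders), passing a `tf.data.Dataset` straight to `fit()` looks like this:

```python
import numpy as np
import tensorflow as tf
import keras

# Placeholder data and model, just to show the Dataset-based fit() call.
x_train = np.random.random((256, 32)).astype("float32")
y_train = np.random.randint(0, 10, size=(256,))
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(64)

model = keras.Sequential(
    [keras.layers.Dense(64, activation="relu"), keras.layers.Dense(10)]
)
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# No batch_size argument: the dataset already yields batches.
model.fit(train_dataset, epochs=1)
```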

2 changes: 0 additions & 2 deletions content/english/docs/guides/transfer_learning/_index.md
@@ -78,8 +78,6 @@ non_trainable_weights: 0

{{% /details %}}

## Available guides

In general, all weights are trainable weights. The only built-in layer that has non-trainable weights is the `BatchNormalization` layer. It uses non-trainable weights to keep track of the mean and variance of its inputs during training. To learn how to use non-trainable weights in your own custom layers, see the [guide to writing new layers from scratch]({{< relref "/docs/guides/making_new_layers_and_models_via_subclassing" >}}).

**Example: the `BatchNormalization` layer has 2 trainable weights and 2 non-trainable weights**
@@ -399,7 +399,7 @@ Time taken: 43.49s

{{% /details %}}

## Speeding-up your training step with [`tf.function`](https://www.tensorflow.org/api_docs/python/tf/function)
## Speeding-up your training step with [`tf.function`](https://www.tensorflow.org/api_docs/python/tf/function) {#tffunction}

The default runtime in TensorFlow is eager execution. As such, our training loop above executes eagerly.
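
A hedged sketch of what that usually looks like; the small model, optimizer, and loss below are placeholders standing in for the objects defined earlier in the guide:

```python
import tensorflow as tf
import keras

# Placeholder model, optimizer, and loss so the sketch is self-contained.
model = keras.Sequential([keras.layers.Dense(10)])
optimizer = keras.optimizers.Adam()
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)


@tf.function  # trace the step into a static graph instead of running it eagerly
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss_value = loss_fn(y, logits)
    grads = tape.gradient(loss_value, model.trainable_weights)
    optimizer.apply_gradients(zip(grads, model.trainable_weights))
    return loss_value
```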

@@ -8,7 +8,7 @@ type: docs

{{< keras/original checkedAt="2024-11-18" >}}

**Authors:** Rick Chao, Francois Chollet
**{{< t f_author >}}** Rick Chao, Francois Chollet
**{{< t f_date_created >}}** 2019/03/20
**{{< t f_last_modified >}}** 2023/06/25
**{{< t f_description >}}** Complete guide to writing new Keras callbacks.
