Clarify self.log(..., rank_zero_only=True|False) (#19056)
Co-authored-by: Carlos Mocholí <[email protected]>
(cherry picked from commit 9004379)
awaelchli authored and Borda committed Dec 19, 2023
1 parent f9f53b5 commit ce30fdb
Showing 4 changed files with 29 additions and 10 deletions.
6 changes: 4 additions & 2 deletions docs/source-pytorch/accelerators/accelerator_prepare.rst
@@ -121,14 +121,16 @@ It is possible to perform some computation manually and log the reduced result o
mean = torch.mean(self.all_gather(self.outputs))
self.outputs.clear() # free memory
-# When logging only on rank 0, don't forget to add
+# When you call `self.log` only on rank 0, don't forget to add
# `rank_zero_only=True` to avoid deadlocks on synchronization.
-# caveat: monitoring this is unimplemented. see https://github.com/Lightning-AI/lightning/issues/15852
+# Caveat: monitoring this is unimplemented, see https://github.com/Lightning-AI/lightning/issues/15852
if self.trainer.is_global_zero:
self.log("my_reduced_metric", mean, rank_zero_only=True)
----
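A minimal, self-contained sketch of the pattern documented above, assuming per-batch values are collected into ``self.outputs`` during validation (the class name, the stand-in per-batch value, and the metric name are illustrative):

.. code-block:: python

    import torch
    from lightning.pytorch import LightningModule


    class LitModel(LightningModule):
        def __init__(self):
            super().__init__()
            self.outputs = []  # per-batch values collected during validation

        def validation_step(self, batch, batch_idx):
            # Stand-in for a real per-batch metric; any scalar tensor works.
            value = torch.as_tensor(0.0, device=self.device)
            self.outputs.append(value)

        def on_validation_epoch_end(self):
            # Reduce manually across processes, then log the result once.
            mean = torch.mean(self.all_gather(torch.stack(self.outputs)))
            self.outputs.clear()  # free memory

            # `self.log` runs on rank 0 only here, so `rank_zero_only=True`
            # tells Lightning not to attempt cross-process synchronization.
            if self.trainer.is_global_zero:
                self.log("my_reduced_metric", mean, rank_zero_only=True)

Because ``my_reduced_metric`` is logged with ``rank_zero_only=True``, it cannot be used as a monitor in callbacks such as early stopping (see the caveat and linked issue above).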


**********************
Make models pickleable
**********************
2 changes: 1 addition & 1 deletion docs/source-pytorch/extensions/logging.rst
@@ -141,7 +141,7 @@ The :meth:`~lightning.pytorch.core.LightningModule.log` method has a few options
* ``sync_dist_group``: The DDP group to sync across.
* ``add_dataloader_idx``: If True, appends the index of the current dataloader to the name (when using multiple dataloaders). If False, user needs to give unique names for each dataloader to not mix the values.
* ``batch_size``: Current batch size used for accumulating logs logged with ``on_epoch=True``. This will be directly inferred from the loaded batch, but for some data structures you might need to explicitly provide it.
-* ``rank_zero_only``: Whether the value will be logged only on rank 0. This will prevent synchronization which would produce a deadlock as not all processes would perform this log call.
+* ``rank_zero_only``: Set this to ``True`` only if you call ``self.log`` explicitly only from rank 0. If ``True`` you won't be able to access or specify this metric in callbacks (e.g. early stopping).

.. list-table:: Default behavior of logging in Callback or LightningModule
:widths: 50 25 25
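To make a few of the options listed above concrete, here is a hedged sketch combining them in one ``self.log`` call (the module, metric name, and tensor shapes are illustrative and not taken from the changed files):

.. code-block:: python

    import torch
    import torch.nn.functional as F
    from lightning.pytorch import LightningModule


    class LitRegressor(LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 1)

        def forward(self, x):
            return self.layer(x)

        def validation_step(self, batch, batch_idx):
            x, y = batch  # illustrative batch structure
            loss = F.mse_loss(self(x), y)
            # Called from every process (the default `rank_zero_only=False`),
            # so all ranks reach this line and `sync_dist=True` can safely
            # reduce the epoch-level value across them. `batch_size` is passed
            # explicitly; it is only required when Lightning cannot infer it
            # from the loaded batch.
            self.log("val_loss", loss, on_epoch=True, sync_dist=True, batch_size=x.size(0))
            return loss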
19 changes: 16 additions & 3 deletions docs/source-pytorch/visualize/logging_advanced.rst
@@ -196,13 +196,26 @@ If set to True, logs will be sent to the progress bar.

rank_zero_only
==============
-**Default:** True
+**Default:** False

+Tells Lightning if you are calling ``self.log`` from every process (default) or only from rank 0.
+This is for advanced users who want to reduce their metric manually across processes, but still want to benefit from automatic logging via ``self.log``.

-Whether the value will be logged only on rank 0. This will prevent synchronization which would produce a deadlock as not all processes would perform this log call.
+- Set ``False`` (default) if you are calling ``self.log`` from every process.
+- Set ``True`` if you are calling ``self.log`` from rank 0 only. Caveat: you won't be able to use this metric as a monitor in callbacks (e.g., early stopping).

.. code-block:: python
-    self.log(rank_zero_only=True)
+    # Default
+    self.log(..., rank_zero_only=False)
+    # If you call `self.log` on rank 0 only, you need to set `rank_zero_only=True`
+    if self.trainer.global_rank == 0:
+        self.log(..., rank_zero_only=True)
+    # DON'T do this, it will cause deadlocks!
+    self.log(..., rank_zero_only=True)
----
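The callback caveat can be sketched as follows, assuming a metric named ``val_loss`` that should drive early stopping and a separate rank-0-only diagnostic value (both names are illustrative):

.. code-block:: python

    import torch
    from lightning.pytorch import LightningModule, Trainer
    from lightning.pytorch.callbacks import EarlyStopping


    class LitModule(LightningModule):
        def validation_step(self, batch, batch_idx):
            loss = torch.as_tensor(0.0, device=self.device)  # stand-in value

            # Logged from every rank (default): can be monitored by callbacks.
            self.log("val_loss", loss, sync_dist=True)

            # Logged from rank 0 only: cannot be used as a callback monitor.
            if self.trainer.is_global_zero:
                self.log("rank0_debug_value", loss, rank_zero_only=True)


    # `val_loss` works as a monitor; `rank0_debug_value` would not.
    trainer = Trainer(callbacks=[EarlyStopping(monitor="val_loss")])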

12 changes: 8 additions & 4 deletions src/lightning/pytorch/core/module.py
@@ -400,8 +400,10 @@ def log(
but for some data structures you might need to explicitly provide it.
metric_attribute: To restore the metric state, Lightning requires the reference of the
:class:`torchmetrics.Metric` in your model. This is found automatically if it is a model attribute.
-rank_zero_only: Whether the value will be logged only on rank 0. This will prevent synchronization which
-    would produce a deadlock as not all processes would perform this log call.
+rank_zero_only: Tells Lightning if you are calling ``self.log`` from every process (default) or only from
+    rank 0. If ``True``, you won't be able to use this metric as a monitor in callbacks
+    (e.g., early stopping). Warning: Improper use can lead to deadlocks! See
+    :ref:`Advanced Logging <visualize/logging_advanced:rank_zero_only>` for more details.
"""
if self._fabric is not None:
@@ -563,8 +565,10 @@ def log_dict(
each dataloader to not mix values.
batch_size: Current batch size. This will be directly inferred from the loaded batch,
but some data structures might need to explicitly provide it.
-rank_zero_only: Whether the value will be logged only on rank 0. This will prevent synchronization which
-    would produce a deadlock as not all processes would perform this log call.
+rank_zero_only: Tells Lightning if you are calling ``self.log`` from every process (default) or only from
+    rank 0. If ``True``, you won't be able to use this metric as a monitor in callbacks
+    (e.g., early stopping). Warning: Improper use can lead to deadlocks! See
+    :ref:`Advanced Logging <visualize/logging_advanced:rank_zero_only>` for more details.
"""
if self._fabric is not None:
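A hedged usage sketch for ``log_dict`` with ``rank_zero_only`` (keys and stand-in values are illustrative):

.. code-block:: python

    import torch
    from lightning.pytorch import LightningModule


    class LitModule(LightningModule):
        def on_validation_epoch_end(self):
            # Stand-in values; in practice these would be computed metrics.
            stats = {"val_loss": torch.as_tensor(0.0), "val_acc": torch.as_tensor(1.0)}

            # Default: every process makes the same call, so Lightning may
            # synchronize the values across ranks.
            self.log_dict(stats, sync_dist=True)

            # Rank-0-only variant: `rank_zero_only=True` tells Lightning not to
            # attempt synchronization, since the other ranks never reach this call.
            if self.trainer.is_global_zero:
                self.log_dict({"rank0_only_stat": torch.as_tensor(0.0)}, rank_zero_only=True)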
