update link
youkaichao committed Nov 27, 2023
1 parent 051aeda commit e9f767a
Showing 2 changed files with 17 additions and 1 deletion.
docs/index.rst (16 additions, 0 deletions)

@@ -3,6 +3,22 @@ Welcome to the documentation of ``depyf``

Before learning how to use ``depyf``, we recommend reading the :doc:`walk_through` example of ``torch.compile``, so that you can understand how ``depyf`` can help you.

``depyf`` aims to address two pain points of ``torch.compile``:

- ``torch.compile`` transforms Python bytecode, but very few developers can read Python bytecode. ``depyf`` decompiles the transformed bytecode back into Python source code, so that developers can see exactly how ``torch.compile`` rewrites their code and adapt it to be ``torch.compile``-friendly (see the example below).
- Many functions involved in ``torch.compile`` are generated dynamically and can otherwise only be run as black boxes. ``depyf`` dumps their source code to files and links the functions to those files, so that users can step through them with a debugger. This greatly helps users understand ``torch.compile`` and debug issues such as ``NaN`` during training.

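For example, a toy function like the one below (the exact body is an illustrative assumption, in the spirit of the walk-through example) is compiled by ``torch.compile`` into new bytecode that most users cannot read, which is exactly where ``depyf`` steps in:

.. code-block:: python

    # A toy function in the spirit of the walk-through example; the exact
    # body is an illustrative assumption. torch.compile rewrites its
    # bytecode, and depyf can decompile that bytecode back to source.
    import torch

    @torch.compile
    def toy_function(x, y):
        z = x.cos() + y.sin()
        if z.sum() > 0:  # data-dependent branch triggers a graph break and resume functions
            z = z * 2.0
        return z

    toy_function(torch.randn(8), torch.randn(8))
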
Take the workflow from the walk-through example:

.. image:: _static/images/dynamo-workflow.png
:width: 1200
:alt: Dynamo workflow

``depyf`` helps to:

- Generate source code for transformed bytecode and resume functions.
- Dump the source code of the dynamically generated functions to files and link the functions to those files, so that users can step through them with debuggers (see the usage sketch below).

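The typical workflow looks roughly like the sketch below. Note that the ``depyf.prepare_debug`` and ``depyf.debug`` entry points are assumptions based on the project README, and their exact signatures may differ between versions:

.. code-block:: python

    # Hedged usage sketch: prepare_debug/debug are assumed from the depyf
    # README; their exact signatures may differ between versions.
    import torch
    import depyf

    @torch.compile
    def toy_function(x, y):
        return (x.cos() + y.sin()).relu()

    args = (torch.randn(8), torch.randn(8))

    # Dump decompiled source code of everything torch.compile generates
    # into the given directory.
    with depyf.prepare_debug("./depyf_dump"):
        toy_function(*args)

    # Re-run with the generated functions linked to the dumped files, so a
    # debugger can set breakpoints and step through them.
    with depyf.debug():
        toy_function(*args)
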
.. toctree::
:maxdepth: 1

docs/walk_through.rst (1 addition, 1 deletion)

@@ -354,7 +354,7 @@ By varying the amount of ``saved_tensors``, we can:
- Save more tensors for backward, so that the backward computation is lighter.
- Save fewer tensors for backward, so that the memory footprint of the forward pass is smaller.

Usually people go the second way, i.e., they save memory by doing more computation in the backward pass, and AOTAutograd will automatically select the optimal way to save memory.
Usually people go the second way, i.e., they save memory by doing more computation in the backward pass, and AOTAutograd will automatically select the optimal way to save memory. To be specific, it uses a `max-flow min-cut <https://en.wikipedia.org/wiki/Minimum_cut>`_ algorithm to cut the joint graph into a forward graph and a backward graph. More discussion can be found `in this thread <https://dev-discuss.pytorch.org/t/min-cut-optimal-recomputation-i-e-activation-checkpointing-with-aotautograd/467>`_.
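
The same trade-off can be illustrated manually with ``torch.utils.checkpoint`` (shown below only to make the idea concrete; it is not AOTAutograd's partitioner, and the model here is an illustrative assumption):

.. code-block:: python

    # Manual illustration of trading compute for memory; AOTAutograd makes
    # this choice automatically via its min-cut partitioner.
    import torch
    from torch.utils.checkpoint import checkpoint

    def block(x):
        # The intermediates of these ops would normally be saved for
        # backward; under checkpointing they are recomputed instead.
        return torch.relu(x @ x.T).sin().cos()

    x = torch.randn(1024, 1024, requires_grad=True)

    # Regular call: saves intermediates (more memory, less recomputation).
    block(x).sum().backward()

    # Checkpointed call: saves only the input and recomputes intermediates
    # during backward (less memory, more recomputation).
    x.grad = None
    checkpoint(block, x, use_reentrant=False).sum().backward()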

That is basically how AOT Autograd works!
