update link
youkaichao committed Nov 27, 2023
1 parent 051aeda commit e9f767a
Showing 2 changed files with 17 additions and 1 deletion.
docs/index.rst (16 additions, 0 deletions)

@@ -3,6 +3,22 @@ Welcome to the documentation of ``depyf``

Before learning how to use ``depyf``, we recommend reading the :doc:`walk_through` example of ``torch.compile``, so that you can understand how ``depyf`` can help you.

``depyf`` aims to address two pain points of ``torch.compile``:

- ``torch.compile`` transforms Python bytecode, but very few developers can read Python bytecode. ``depyf`` decompiles the transformed bytecode back into Python source code, so that developers can see exactly how ``torch.compile`` rewrites their code and adapt it to be ``torch.compile``-friendly (see the example below).
- Many functions involved in ``torch.compile`` are generated dynamically and can otherwise only be run as black boxes. ``depyf`` dumps their source code to files and links the functions to those files, so that users can step through them with a debugger. This greatly helps users understand ``torch.compile`` and debug issues such as ``NaN`` during training.

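For example, a toy function like the one below (the exact body is an illustrative assumption, in the spirit of the walk-through example) is compiled by ``torch.compile`` into new bytecode that most users cannot read, which is exactly where ``depyf`` steps in:

.. code-block:: python

    # A toy function in the spirit of the walk-through example; the exact
    # body is an illustrative assumption. torch.compile rewrites its
    # bytecode, and depyf can decompile that bytecode back to source.
    import torch

    @torch.compile
    def toy_function(x, y):
        z = x.cos() + y.sin()
        if z.sum() > 0:  # data-dependent branch triggers a graph break and resume functions
            z = z * 2.0
        return z

    toy_function(torch.randn(8), torch.randn(8))
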
Take the workflow from the walk-through example:

.. image:: _static/images/dynamo-workflow.png
:width: 1200
:alt: Dynamo workflow

``depyf`` helps to:

- Generate source code for transformed bytecode and resume functions.
- Dump the source code of the dynamically generated functions to files and link the functions to those files, so that users can step through them with debuggers (see the usage sketch below).

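The typical workflow looks roughly like the sketch below. Note that the ``depyf.prepare_debug`` and ``depyf.debug`` entry points are assumptions based on the project README, and their exact signatures may differ between versions:

.. code-block:: python

    # Hedged usage sketch: prepare_debug/debug are assumed from the depyf
    # README; their exact signatures may differ between versions.
    import torch
    import depyf

    @torch.compile
    def toy_function(x, y):
        return (x.cos() + y.sin()).relu()

    args = (torch.randn(8), torch.randn(8))

    # Dump decompiled source code of everything torch.compile generates
    # into the given directory.
    with depyf.prepare_debug("./depyf_dump"):
        toy_function(*args)

    # Re-run with the generated functions linked to the dumped files, so a
    # debugger can set breakpoints and step through them.
    with depyf.debug():
        toy_function(*args)
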
.. toctree::
:maxdepth: 1

docs/walk_through.rst (1 addition, 1 deletion)

@@ -354,7 +354,7 @@ By varying the amount of ``saved_tensors``, we can:
- Save more tensors for backward, so that the backward computation is lighter.
- Save fewer tensors for backward, so that the memory footprint of the forward pass is smaller.

Usually people go the second way, i.e., they save memory by doing more computation in the backward pass, and AOTAutograd will automatically select the optimal way to save memory.
Usually people go the second way, i.e., they save memory by doing more computation in the backward pass, and AOTAutograd will automatically select the optimal way to save memory. To be specific, it uses a `max-flow min-cut <https://en.wikipedia.org/wiki/Minimum_cut>`_ algorithm to cut the joint graph into a forward graph and a backward graph. More discussion can be found `in this thread <https://dev-discuss.pytorch.org/t/min-cut-optimal-recomputation-i-e-activation-checkpointing-with-aotautograd/467>`_.
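
The same trade-off can be illustrated manually with ``torch.utils.checkpoint`` (shown below only to make the idea concrete; it is not AOTAutograd's partitioner, and the model here is an illustrative assumption):

.. code-block:: python

    # Manual illustration of trading compute for memory; AOTAutograd makes
    # this choice automatically via its min-cut partitioner.
    import torch
    from torch.utils.checkpoint import checkpoint

    def block(x):
        # The intermediates of these ops would normally be saved for
        # backward; under checkpointing they are recomputed instead.
        return torch.relu(x @ x.T).sin().cos()

    x = torch.randn(1024, 1024, requires_grad=True)

    # Regular call: saves intermediates (more memory, less recomputation).
    block(x).sum().backward()

    # Checkpointed call: saves only the input and recomputes intermediates
    # during backward (less memory, more recomputation).
    x.grad = None
    checkpoint(block, x, use_reentrant=False).sum().backward()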

That is basically how AOT Autograd works!
