From e9f767a98acd2000b68c9a0cff73d78c606f96a2 Mon Sep 17 00:00:00 2001
From: youkaichao
Date: Mon, 27 Nov 2023 14:53:58 +0800
Subject: [PATCH] update link

---
 docs/index.rst        | 16 ++++++++++++++++
 docs/walk_through.rst |  2 +-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/docs/index.rst b/docs/index.rst
index 6bdcfb17..cff968a2 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -3,6 +3,22 @@ Welcome to the documentation of ``depyf``
 
 Before learning the usage of ``depyf``, we recommend reading the :doc:`walk_through` example of ``torch.compile``, so that you can understand how ``depyf`` would help you.
 
+``depyf`` aims to address two pain points of ``torch.compile``:
+
+- ``torch.compile`` transforms Python bytecode, but very few developers can read Python bytecode. ``depyf`` decompiles the transformed bytecode back into Python source code, so that developers can understand how ``torch.compile`` transforms their code. This helps users adapt their code to ``torch.compile`` and write code that is friendly to it.
+- Many functions in ``torch.compile`` are dynamically generated and can only be run as black boxes. ``depyf`` dumps their source code to files and links the functions to those files, so that users can step through them with a debugger. This helps users understand ``torch.compile`` and debug issues such as ``NaN`` during training.
+
+Take the workflow from the walk-through example:
+
+.. image:: _static/images/dynamo-workflow.png
+   :width: 1200
+   :alt: Dynamo workflow
+
+``depyf`` helps to:
+
+- Generate source code for transformed bytecode and resume functions.
+- Link dynamically generated functions with the dumped source code, so that debuggers can step through them.
+
 .. toctree::
    :maxdepth: 1
 
diff --git a/docs/walk_through.rst b/docs/walk_through.rst
index b173cdca..73711b04 100644
--- a/docs/walk_through.rst
+++ b/docs/walk_through.rst
@@ -354,7 +354,7 @@ By varying the amount of ``saved_tensors``, we can:
 
 - Save more tensors for backward, so that backward computation is less heavy.
 - Save less tensors for backward, so that the memory footprint of forward is less heavy.
 
-Usually people goes the second way, i.e., saving memory by having more computation in the backward pass. And AOTAutograd will automatically select the optimal way to save memory.
+Usually people go the second way, i.e., saving memory by having more computation in the backward pass. AOTAutograd will automatically select the optimal way to save memory. To be specific, it uses a `max-flow min-cut `_ algorithm to cut the joint graph into a forward graph and a backward graph. More discussion can be found `in this thread `_.
 
 That is basically how AOT Autograd works!
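
For reference, the debugging workflow that the new ``index.rst`` text describes can be sketched roughly as follows. This is a minimal sketch based on ``depyf``'s documented ``prepare_debug``/``debug`` context managers; the dump directory name and the toy function are made up for illustration.

.. code-block:: python

    import torch
    import depyf

    @torch.compile
    def toy_example(a, b):
        # data-dependent control flow forces Dynamo to split the function
        x = a / (torch.abs(a) + 1)
        if b.sum() < 0:
            b = b * -1
        return x * b

    a, b = torch.randn(10), torch.randn(10)

    # First run: dump decompiled bytecode and graph code to files on disk.
    with depyf.prepare_debug("./depyf_debug_dir"):
        toy_example(a, b)

    # Second run: link the dumped files to the running functions; the program
    # pauses here so a debugger can attach, and breakpoints set in the dumped
    # source files will be hit.
    with depyf.debug():
        toy_example(a, b)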