draft of doc
youkaichao committed Nov 27, 2023
1 parent 8cefe1a commit 761f840
Showing 1 changed file with 11 additions and 2 deletions.
13 changes: 11 additions & 2 deletions docs/walk_through.rst
@@ -349,6 +349,13 @@ AOTAutograd does the above optimization automatically. In essence, it dynamically ...
This way, the saved tensors are made explicit, and the ``optimized_function`` accepts exactly the same inputs as the original function, while producing exactly the same outputs and having exactly the same backward behavior as the original function.

By varying the number of ``saved_tensors``, we can:

- Save more tensors for backward, so that the backward computation is cheaper.
- Save fewer tensors for backward, so that the memory footprint of the forward pass is smaller.

Usually people go the second way, i.e., saving memory by doing more computation in the backward pass, and AOTAutograd will automatically select the optimal way to save memory.
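
As a toy illustration of this trade-off (not AOTAutograd's actual generated code), the hypothetical ``SaveMore`` and ``SaveLess`` functions below compute the same ``relu(x).sum()`` with identical gradients, differing only in what they save for backward:

.. code-block:: python

    import torch

    class SaveMore(torch.autograd.Function):
        # save the activation: more memory, cheaper backward
        @staticmethod
        def forward(ctx, x):
            y = torch.relu(x)
            ctx.save_for_backward(y)
            return y.sum()

        @staticmethod
        def backward(ctx, grad_out):
            (y,) = ctx.saved_tensors
            return grad_out * (y > 0).to(grad_out.dtype)

    class SaveLess(torch.autograd.Function):
        # save only the input: less memory, recompute the relu mask in backward
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return torch.relu(x).sum()

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            return grad_out * (x > 0).to(grad_out.dtype)

    x = torch.randn(4, requires_grad=True)
    SaveLess.apply(x).backward()
    print(x.grad)  # same gradient as with SaveMore, smaller forward footprint

AOTAutograd makes this kind of choice automatically, so such functions never need to be written by hand.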

That is basically how AOTAutograd works!

Backend: compile and optimize computation graph
@@ -362,9 +369,13 @@ In general, a backend will try every optimization technique it knows for the computation ...

In addition, applying no optimization at all is also a possible choice. This is called the ``eager`` backend in PyTorch.
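
For example (a small sketch; the exact list of registered names varies across PyTorch versions and installed dependencies), the available backends can be listed and ``eager`` selected explicitly:

.. code-block:: python

    import torch
    import torch._dynamo

    # names of the registered backends; "eager", "aot_eager" and "inductor"
    # are among them in a standard installation
    print(torch._dynamo.list_backends())

    def f(x):
        return torch.sin(x) + 1

    # "eager": run the captured graph with ordinary PyTorch kernels,
    # i.e. apply no optimization at all
    opt_f = torch.compile(f, backend="eager")
    print(opt_f(torch.randn(3)))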

In a strict sense, the ``backend`` option in ``torch.compile`` affects whether a backward computation graph exists and how the computation graphs are optimized. In practice, custom backends usually work with ``AOTAutograd`` to obtain backward computation graphs, so they only need to deal with computation graph optimization, no matter whether it is a forward graph or a backward graph.
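
As an illustration, a custom backend is simply a callable that receives the ``torch.fx.GraphModule`` captured by Dynamo together with example inputs, and returns a callable to run in its place; ``debug_backend`` below is a made-up name, and returning ``gm.forward`` unchanged is equivalent to the ``eager`` backend. To additionally receive backward graphs, such a callable is typically wrapped with AOTAutograd helpers (internal APIs such as ``torch._dynamo.backends.common.aot_autograd``), as described above.

.. code-block:: python

    import torch

    def debug_backend(gm: torch.fx.GraphModule, example_inputs):
        # print the forward computation graph captured by Dynamo
        print(gm.graph)
        # no optimization: hand back the original graph module's forward
        return gm.forward

    @torch.compile(backend=debug_backend)
    def f(x):
        return torch.relu(x) + 1

    f(torch.randn(3))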

Summary
--------------------------------------------------

The following table shows the differences among several ``backend`` options in ``torch.compile``. If we want to adapt our code to ``torch.compile``, it is recommended to try ``backend="eager"`` first to see how our code is transformed into computation graphs, then ``backend="aot_eager"`` to see if we are satisfied with the backward graph, and finally ``backend="inductor"`` to see if we can get any performance benefit (a sketch of this progression follows the table).

.. list-table:: Summary of backends
   :header-rows: 1

@@ -388,5 +399,3 @@ Summary
     - captured by ``Dynamo``
     - generated by ``AOTAutograd``
     - optimized by custom implementations
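
A sketch of that recommended progression (``f`` and the shapes are arbitrary; ``inductor`` additionally requires a working compiler toolchain, e.g. a C++ compiler on CPU or Triton on GPU):

.. code-block:: python

    import torch
    import torch._dynamo

    def f(x):
        return torch.sin(x) * torch.cos(x)

    x = torch.randn(8, requires_grad=True)

    # 1) only capture graphs with Dynamo and run them as-is
    torch.compile(f, backend="eager")(x).sum().backward()

    # 2) additionally run AOTAutograd, so forward/backward graphs are built
    torch._dynamo.reset()  # clear compiled entries before switching backends
    torch.compile(f, backend="aot_eager")(x).sum().backward()

    # 3) let the default optimizing backend (Inductor) generate fast kernels
    torch._dynamo.reset()
    torch.compile(f, backend="inductor")(x).sum().backward()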
