draft of doc
youkaichao committed Nov 27, 2023
1 parent 8cefe1a commit 761f840
Showing 1 changed file with 11 additions and 2 deletions.
13 changes: 11 additions & 2 deletions docs/walk_through.rst
@@ -349,6 +349,13 @@ AOTAutograd does the above optimization automatically. In essence, it dynamically ...
This way, the saved tensors are made explicit, and the ``optimized_function`` accepts exactly the same inputs as the original function, while producing exactly the same outputs and having exactly the same backward behavior as the original function.

By varying the number of ``saved_tensors``, we can:

- Save more tensors for backward, so that the backward computation is cheaper.
- Save fewer tensors for backward, so that the memory footprint of the forward pass is smaller.

Usually people go the second way, i.e., saving memory by doing more computation in the backward pass, and AOTAutograd will automatically select the optimal way to save memory.
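
As a toy illustration of this trade-off (not AOTAutograd's actual generated code), the hypothetical ``SaveMore`` and ``SaveLess`` functions below compute the same ``relu(x).sum()`` with identical gradients, differing only in what they save for backward:

.. code-block:: python

    import torch

    class SaveMore(torch.autograd.Function):
        # save the activation: more memory, cheaper backward
        @staticmethod
        def forward(ctx, x):
            y = torch.relu(x)
            ctx.save_for_backward(y)
            return y.sum()

        @staticmethod
        def backward(ctx, grad_out):
            (y,) = ctx.saved_tensors
            return grad_out * (y > 0).to(grad_out.dtype)

    class SaveLess(torch.autograd.Function):
        # save only the input: less memory, recompute the relu mask in backward
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return torch.relu(x).sum()

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            return grad_out * (x > 0).to(grad_out.dtype)

    x = torch.randn(4, requires_grad=True)
    SaveLess.apply(x).backward()
    print(x.grad)  # same gradient as with SaveMore, smaller forward footprint

AOTAutograd makes this kind of choice automatically, so such functions never need to be written by hand.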

That is basically how AOTAutograd works!

Backend: compile and optimize computation graph
@@ -362,9 +369,13 @@ In general, a backend will try every optimization technique it knows for the computation ...

In addition, applying no optimization at all is also a possible choice. This is called the ``eager`` backend in PyTorch.
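
For example (a small sketch; the exact list of registered names varies across PyTorch versions and installed dependencies), the available backends can be listed and ``eager`` selected explicitly:

.. code-block:: python

    import torch
    import torch._dynamo

    # names of the registered backends; "eager", "aot_eager" and "inductor"
    # are among them in a standard installation
    print(torch._dynamo.list_backends())

    def f(x):
        return torch.sin(x) + 1

    # "eager": run the captured graph with ordinary PyTorch kernels,
    # i.e. apply no optimization at all
    opt_f = torch.compile(f, backend="eager")
    print(opt_f(torch.randn(3)))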

In a strict sense, the ``backend`` option in ``torch.compile`` affects whether a backward computation graph exists and how the computation graphs are optimized. In practice, custom backends usually work with ``AOTAutograd`` to obtain backward computation graphs, so they only need to deal with computation graph optimization, no matter whether it is a forward graph or a backward graph.
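
As an illustration, a custom backend is simply a callable that receives the ``torch.fx.GraphModule`` captured by Dynamo together with example inputs, and returns a callable to run in its place; ``debug_backend`` below is a made-up name, and returning ``gm.forward`` unchanged is equivalent to the ``eager`` backend. To additionally receive backward graphs, such a callable is typically wrapped with AOTAutograd helpers (internal APIs such as ``torch._dynamo.backends.common.aot_autograd``), as described above.

.. code-block:: python

    import torch

    def debug_backend(gm: torch.fx.GraphModule, example_inputs):
        # print the forward computation graph captured by Dynamo
        print(gm.graph)
        # no optimization: hand back the original graph module's forward
        return gm.forward

    @torch.compile(backend=debug_backend)
    def f(x):
        return torch.relu(x) + 1

    f(torch.randn(3))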

Summary
--------------------------------------------------

The following table shows the differences among several ``backend`` options in ``torch.compile``. If we want to adapt our code to ``torch.compile``, it is recommended to try ``backend="eager"`` first to see how our code is transformed into computation graphs, then ``backend="aot_eager"`` to see if we are satisfied with the backward graph, and finally ``backend="inductor"`` to see if we can get any performance benefit (a sketch of this progression follows the table).

.. list-table:: Summary of backends
   :header-rows: 1

@@ -388,5 +399,3 @@ Summary
     - captured by ``Dynamo``
     - generated by ``AOTAutograd``
     - optimized by custom implementations
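
A sketch of that recommended progression (``f`` and the shapes are arbitrary; ``inductor`` additionally requires a working compiler toolchain, e.g. a C++ compiler on CPU or Triton on GPU):

.. code-block:: python

    import torch
    import torch._dynamo

    def f(x):
        return torch.sin(x) * torch.cos(x)

    x = torch.randn(8, requires_grad=True)

    # 1) only capture graphs with Dynamo and run them as-is
    torch.compile(f, backend="eager")(x).sum().backward()

    # 2) additionally run AOTAutograd, so forward/backward graphs are built
    torch._dynamo.reset()  # clear compiled entries before switching backends
    torch.compile(f, backend="aot_eager")(x).sum().backward()

    # 3) let the default optimizing backend (Inductor) generate fast kernels
    torch._dynamo.reset()
    torch.compile(f, backend="inductor")(x).sum().backward()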
