
Commit

update docs
TimSiebert1 committed Jun 20, 2024
1 parent 23dc21c commit 155946e
Showing 5 changed files with 50 additions and 21 deletions.
8 changes: 6 additions & 2 deletions docs/make.jl
@@ -1,13 +1,17 @@
using Documenter
using Documenter, DocumenterCitations
using ADOLC

bib = CitationBibliography(
joinpath(@__DIR__, "src", "citations.bib");
style=:numeric
)

DocMeta.setdocmeta!(ADOLC, :DocTestSetup, :(using ADOLC); recursive=true)

makedocs(;
sitename="ADOLC.jl",
authors="Tim Siebert",
#plugins = [bib],
plugins = [bib],
repo="github.com/TimSiebert1/ADOLC.jl",
modules=[ADOLC],
format=Documenter.HTML(),
8 changes: 8 additions & 0 deletions docs/src/citations.bib
@@ -5,3 +5,11 @@ @article{griewank_evaluating_1999
author = {Griewank, Andreas and Utke, Jean and Walther, Andrea},
year = {1999},
}


@article{Gr13,
author = {A. Griewank},
year = {2013},
journal = {Optimization Methods and Software},
title = {On stable piecewise linearization and generalized algorithmic differentiation},
volume = {28},
}
12 changes: 8 additions & 4 deletions docs/src/lib/derivative_modes.md
@@ -1,5 +1,6 @@
# Derivative Modes

The following sections provide some details about the available `mode`s and possibilities for the computation of derivatives with [`derivative`](@ref) or [`derivative!`](@ref).


## First-Order
@@ -39,11 +40,14 @@ Each `mode`'s formula symbolizes the underlying computation. A user can read-off
$$\dot{v} \in \mathbb{R}^{n}$$, $$\dot{V} \in \mathbb{R}^{n \times p}$$, $$\bar{z} \in \mathbb{R}^{m}$$ and $$\bar{Z} \in \mathbb{R}^{m \times q}$$.

## Abs-Normal-Form
| Mode | Formula |
|:------------------:|:-------------------------------:|
| `:abs_norm` | $$\Delta f(x)$$ |

There is one `mode` to compute the `abs-normal-form` of a function using [`derivative`](@ref) or [`derivative!`](@ref):

| Mode | Formula |
|:------------------:|:--------------------------------:|
| `:abs_norm` | $$\Delta f(x)$$ |

The theory behind this method can be found in [Gr13](@cite).
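For orientation, [Gr13](@cite) represents a piecewise smooth function by collecting the arguments of all absolute-value evaluations into switching variables $$z$$. A sketch of that formulation (the notation here follows the paper, not ADOLC.jl):

```math
\begin{aligned}
z &= c + Z \,\Delta x + L\,|z|,\\
\Delta y &= b + J \,\Delta x + Y\,|z|,
\end{aligned}
```

where $$L$$ is strictly lower triangular, so the switching variables $$z$$ can be computed successively from $$\Delta x$$.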


## Higher-Order
@@ -55,7 +59,7 @@ and we want to compute the mixed-partials
```math
\left[\frac{\partial^3\partial^2 f(x)}{\partial^3 x_2 \partial^2 x_1}, \frac{\partial^4 f(x)}{\partial^4 x_3}, \frac{\partial^2 f(x)}{\partial^2 x_1}\right]
```
leveraging the [`derivative`](@ref) driver. After defining the function `f` and the point for the derivative evaluation `x`, we have to select the format of the `partials`. There exist two options explained below that use `Vector{Int64}` to define a partial derivative.
leveraging the [`derivative`](@ref) ([`derivative!`](@ref)) driver. After defining the function `f` and the point for the derivative evaluation `x`, we have to select the format of the `partials`. There exist two options explained below that use `Vector{Int64}` to define a partial derivative.

## ADOLC-Format
The ADOLC-Format repeats the index $$i$$ of a derivative direction $$x_i$$ up to the derivative order of this index: $$\frac{\partial^4 f(x)}{\partial^4 x_3} \to [3, 3, 3, 3]$$. Additionally, the resulting vector is sorted in descending order; if the vector's length is less than the total derivative degree, it is filled with zeros. The requested mixed-partials result in:
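Applying these rules to the three mixed-partials above gives, as a sketch (assuming the conventions just described, with total derivative degree 5 from the largest request):

```julia
# ADOLC-Format encoding of the three requested mixed-partials.
# Rules: repeat each direction index as often as its derivative order,
# sort in descending order, pad with zeros up to the total degree (5).
partials = [
    [2, 2, 2, 1, 1],  # d^5 f / (d^3 x2 d^2 x1): index 2 thrice, index 1 twice
    [3, 3, 3, 3, 0],  # d^4 f / d^4 x3: index 3 four times, padded with one zero
    [1, 1, 0, 0, 0],  # d^2 f / d^2 x1: index 1 twice, padded with zeros
]
```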
37 changes: 25 additions & 12 deletions docs/src/lib/guides.md
@@ -3,12 +3,12 @@
ADOLC.jl is a wrapper of the C/C++ library [ADOL-C](https://github.com/coin-or/ADOL-C). Wrapper means
data from Julia is passed to C++, and function calls in Julia trigger the corresponding C++ functions, whose output is then available in Julia. The communication between Julia and [ADOL-C](https://github.com/coin-or/ADOL-C) is handled by [CxxWrap.jl](https://github.com/JuliaInterop/CxxWrap.jl), which, for example, automatically converts a `Cint` from Julia into an `int` in C++. Most functions
of [ADOL-C](https://github.com/coin-or/ADOL-C) modify pre-allocated memory in the form of `double*`, `double**`, or `double***` to store the functions' results. Two apparent options exist for providing
the pre-allocated data to the C++ functions from Julia. The first option would be to write wrappers in C++, which allocates the memory every time before the actual [ADOL-C](https://github.com/coin-or/ADOL-C) function call. This would lose control over the allocation, but it is easier to work with on the Julia side and the C++ side has fully control of the memory. The second option is to enable the allocation of C++-owned data in Julia, pass this data to [ADOL-C](https://github.com/coin-or/ADOL-C)'s function, and access the required values from the C++-owned data in Julia. The second option allows more control over the data, but a user has to be aware of some critical aspects:
the pre-allocated data to the C++ functions when calling them in Julia. The first option would be to write wrappers in C++, which allocates the memory every time before the actual [ADOL-C](https://github.com/coin-or/ADOL-C) function call. This would lose control over the allocation, but it is easier to work with on the Julia side, and the C++ side has full control of the memory. The second option is to enable the allocation of C++-owned data in Julia, pass this data to [ADOL-C](https://github.com/coin-or/ADOL-C)'s function, and access the required values from the C++-owned data in Julia. The second option allows more control over the data, but a user has to be aware of some critical aspects:
1. C++-owned memory is not automatically garbage collected and can quickly lead to memory leaks. For example, a Julia function that allocates a `double**` in C++ and binds this pointer to a Julia variable would release the pointer when it goes out of the function's scope, but the C++ memory is still there; you just cannot access it anymore.
2. There are no out-of-bounds checks to prevent reading or writing data outside the allocated area, which leads to segmentation faults and program crashes.
3. If you want to do computations with the C++ data, you either have to copy it to a corresponding Julia type or write your methods to work with the C++ types.
3. If you want to do computations with the C++ data, you must copy it to a corresponding Julia type or write your methods to work with the C++ types.

ADOLC.jl implements the second option. The critical aspects can still be avoided using the driver [`derivative`](@ref), which handles the C++ allocated memory for you. However, the best performance is obtained when using [`derivative!`](@ref). The [`derivative!`](@ref) driver requires for first- and second-order derivative computations a pre-allocated container of C++ type. The allocation is done using [`allocator`](@ref). This function allocates C++ memory for your specific case (i.e., for the problem-specific parameters `m`, `n`, `mode`, `num_dir`, `num_weights`). Thus, the computation of the derivative is no more difficult than utilizing [`derivative`](@ref). For example:
ADOLC.jl implements the second option. The critical aspects can still be avoided using the driver [`derivative`](@ref), which handles the C++-allocated memory for you. However, the best performance is obtained using [`derivative!`](@ref). For first- and second-order derivative computations, the [`derivative!`](@ref) driver requires a pre-allocated container of C++ type. The allocation is done with [`allocator`](@ref), which allocates C++ memory for your specific case (i.e., for the problem-specific parameters `m`, `n`, `mode`, `num_dir`, `num_weights`). Thus, the computation of the derivative is no more difficult than utilizing [`derivative`](@ref). For example:
```@example
using ADOLC
f(x) = (x[1] - x[2])^2
@@ -23,7 +23,7 @@
cxx_res = allocator(m, n, mode, num_dir, num_weights)
derivative!(cxx_res, f, m, n, x, mode, dir=dir)
deallocator!(cxx_res, m, mode)
```
The first critical point is tackled by using the [`deallocator!`](@ref) function, which handles the release of the C++ memory. Of course, one wants to conduct computations with `cxx_res`. The recommended way to do so is to pre-allocate a corresponding Julia container (`Vector{Float64}`, `Matrix{Float64}` or `Array{Float64, 3}`) obtained from [`jl_allocator`](@ref) and copy the data from `cxx_res` the Julia storage `jl_res` by leveraging [`cxx_res_to_jl_res!`](@ref):
The first critical point is tackled using the [`deallocator!`](@ref) function, which handles the release of the C++ memory. Of course, one wants to conduct computations with `cxx_res`. The recommended way to do so is to pre-allocate a corresponding Julia container (`Vector{Float64}`, `Matrix{Float64}` or `Array{Float64, 3}`) obtained from [`jl_allocator`](@ref) and copy the data from `cxx_res` to the Julia storage `jl_res` by leveraging [`cxx_res_to_jl_res!`](@ref):
```@example
using ADOLC
f(x) = (x[1] - x[2])^2
@@ -49,10 +49,10 @@
cxx_res_to_jl_res!(jl_res, cxx_res, m, n, mode, num_dir, num_weights)
deallocator!(cxx_res, m, mode)
```
Since you actually work with Julia data, the procedure above avoids the second and third point of the ciritical aspects, but includes an additional allocation.
Since you work with Julia data, the procedure above avoids the second and third points of the critical aspects but includes an additional allocation.

!!! warning
`cxx_res` still stores a pointer. The corresponding memory is destroyed but `cxx_res` itself is managed by Julia's garbage collector. Do not use it.
`cxx_res` still stores a pointer. The corresponding memory is destroyed, but `cxx_res` is managed by Julia's garbage collector. Do not use it.


In the future, the plan is to implement a struct that combines the Julia and C++ arrays with a finalizer that enables Julia's garbage collector to manage the C++ memory.
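Such a combined type could look roughly like the following sketch. All names here, including `CxxJlArray` and `free_cxx!`, are hypothetical illustrations, not ADOLC.jl API; `free_cxx!` stands in for the actual deallocation routine:

```julia
# Hypothetical sketch: pair the C++-owned result with its Julia copy and
# register a finalizer, so Julia's garbage collector triggers the C++
# deallocation when the wrapper becomes unreachable.
free_cxx!(ptr) = nothing  # stub standing in for the real C++ deallocation

mutable struct CxxJlArray{T,N}
    cxx_ptr::Ptr{Cvoid}   # handle to the C++-owned storage
    jl_data::Array{T,N}   # Julia-side copy used for computations
    function CxxJlArray(ptr::Ptr{Cvoid}, data::Array{T,N}) where {T,N}
        obj = new{T,N}(ptr, data)
        finalizer(o -> free_cxx!(o.cxx_ptr), obj)
        return obj
    end
end

res = CxxJlArray(C_NULL, zeros(2, 2))
```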
@@ -61,18 +61,18 @@

## Seed-Matrix
This guide is related to the [higher-order](@ref "Higher-Order") derivative computation with
[`derivative`](@ref) or [`derivative!`](@ref). Internally, the drivers are based on the propagation of univariate Taylor polynomials. The underlying method leverages a `seed` matrix $$S\in \mathbb{R}^{n \times p}$$ to compute mixed-partials of arbitrary order for a function $$f:\mathbb{R}^n \to \mathbb{R}^m$$ in the form:
[`derivative`](@ref) or [`derivative!`](@ref). Internally, the drivers are based on the propagation of univariate Taylor polynomials [griewank_evaluating_1999](@cite). The underlying method leverages a `seed` matrix $$S\in \mathbb{R}^{n \times p}$$ to compute mixed-partials of arbitrary order for a function $$f:\mathbb{R}^n \to \mathbb{R}^m$$ in the form:
```math
\frac{\partial^k f(x + Sz)}{\partial^k z}\big|_{z=0}
```
for some $$z \in \mathbb{R}^p$$. Usually $$S$$ is the identity or the partial identity (see [`create_partial_cxx_identity`](@ref)). In case of the identity the formula above boils down to
for some $$z \in \mathbb{R}^p$$. Usually, $$S$$ is the *identity* or the *partial identity* (see [`create_partial_cxx_identity`](@ref)), which is also the case when no `seed` is passed to the driver. To switch between both identity options, the flag `id_seed` can be used. In the case of the identity, the formula above boils down to
```math
\frac{\partial^k f(x + Sz)}{\partial^k z}\big|_{z=0}= \frac{\partial^k f(x)}{\partial^k x}
\frac{\partial^k f(x + Sz)}{\partial^k z}\big|_{z=0}= \frac{\partial^k f(x)}{\partial^k x}.
```
and the partial identity results in the same but being more efficient. It ensures to compute only the derivatives of the requested derivative directions and is explained briefly in the following paragraph.
Moreover, the partial identity yields the same result but is more efficient: it ensures that only the derivatives of the requested derivative directions are computed, as explained briefly in the following paragraph.

Assume we want to compute the derivatives specified in the [Partial-Format](@ref): [[4, 0, 0, 3], [2, 0, 0, 4], [1, 0, 0, 1]].
Obviously, none of the derivatives includes $$x_2$$ and $$x_3$$. To avoid the propagation of univariate Polynomials for these directions, the partial identity is created stacking only those canonical basis vector that are related to the required derivative directions. In our case the partial identity looks like this:
Obviously, none of the derivatives includes $$x_2$$ and $$x_3$$. To avoid unnecessary computations (i.e., the propagation of unnecessary univariate polynomials), the partial identity is created by stacking only those canonical basis vectors that are related to the requested derivative directions. In our case, the partial identity looks like this:
```math
\left[
\begin{matrix}
1 & 0 \\
0 & 0 \\
0 & 0 \\
0 & 1
\end{matrix}
\right].
```
As you can see, the directions of are reduced from four to two. In general, the number of required univariate Polynomial propagations to compute all mixed-partials up to degree $$d$$ of for $$f$$ is $$\left( \begin{matrix} n - 1 + d \\ d \end{matrix} \right)$$. Leveraging the `seed`$$S$$ reduces this number to $$\left( \begin{matrix} p - 1 + d \\ d \end{matrix} \right)$$, where $$p$$ is often much smaller than $$n$$.
As you can see, the directions are reduced from four to two. In general, the number of univariate polynomial propagations required to compute all mixed-partials up to degree $$d$$ of $$f$$ is $$\left( \begin{matrix} n - 1 + d \\ d \end{matrix} \right)$$. Leveraging the `seed` $$S$$ reduces this number to $$\left( \begin{matrix} p - 1 + d \\ d \end{matrix} \right)$$, where $$p$$ is often much smaller than $$n$$. In addition, $$S$$ can be used as a subspace projection. For example, if $$S=[1, \dots, 1]^T$$, you could compute the sum of the different univariate Taylor coefficients:
```jldoctest
using ADOLC
f(x) = x[1]^3*x[2]^2 - x[2]^3
x = [1.0, 1.0]
partials = [[1], [2], [3]]
seed = [[1.0, 1.0];;]
res = derivative(f, x, partials, seed)
# output
1×3 Matrix{Float64}:
2.0 14.0 54.0
```
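As a plain-Julia cross-check (no ADOLC.jl needed), the seed above reduces `f` to the univariate function $$g(z) = f(1 + z, 1 + z) = (1+z)^5 - (1+z)^3$$, whose first three derivatives at $$z = 0$$ are exactly the row returned above:

```julia
# g(z) = f(x + S*z) with x = [1.0, 1.0] and seed S = [1, 1]^T
g(z) = (1 + z)^5 - (1 + z)^3

# central finite differences for g'(0), g''(0), g'''(0)
h = 1e-3
g1 = (g(h) - g(-h)) / (2h)                       # ≈ 2.0
g2 = (g(h) - 2g(0) + g(-h)) / h^2                # ≈ 14.0
g3 = (g(2h) - 2g(h) + 2g(-h) - g(-2h)) / (2h^3)  # ≈ 54.0
```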
6 changes: 3 additions & 3 deletions src/derivative.jl
@@ -201,9 +201,9 @@ end
reuse_tape::Bool=false,
adolc_format::Bool=false,
)
Variant of the [`derivative`](@ref) driver for the computation of [higher-order](@ref "Higher-Order")
derivatives, that requires a `seed`. Details on the idea behind `seed` can be found
[here](@ref "Seed-Matrix").
Variant of the [`derivative`](@ref) driver for the computation of [higher-order](@ref "Higher-Order")
derivatives that requires a `seed`. Details on the idea behind `seed` can be found
[here](@ref "Seed-Matrix").
Example:
