Commit 7c29411

Added a section on estimating the error on the fitted parameters

Fady Bishara committed Aug 1, 2024
1 parent 8ecf8dc commit 7c29411

Showing 3 changed files with 74 additions and 0 deletions.
Binary file added docs/figures/error_band.png
Binary file modified docs/figures/tangents.png
74 changes: 74 additions & 0 deletions docs/linear_regression.md
@@ -196,3 +196,77 @@ And here are a few things you should _definitely_ do:

The capital letters mean the full vector of features, $X$, and targets, $T$, in the dataset. The operators $\mathrm{cov}$ and $\mathrm{var}$ are the covariance and variance, respectively. They can be computed with the `numpy` functions `cov` and `var`. Finally, an overline as in $\overline{X}$ means the average (`numpy.mean`) over the features in the dataset, and similarly for the targets.
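As a minimal `numpy` sketch of the closed-form fit in this notation (the synthetic data and variable names are illustrative, not part of the text): note that `numpy.cov` defaults to the unbiased normalization while `numpy.var` defaults to the population one, so `bias=True` is passed to `cov` to keep the ratio consistent.

```python
import numpy as np

# Illustrative synthetic data: t = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
t = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)

# w = cov(X, T) / var(X); bias=True makes cov use the same
# (population) normalization as numpy.var's default
w = np.cov(x, t, bias=True)[0, 1] / np.var(x)
# b = mean(T) - w * mean(X)
b = np.mean(t) - w * np.mean(x)
```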


## Estimating the error on the fitted parameters

!!! danger "This section is technical"

One can get a lower bound on the variance of the fitted parameters (or, to be technically correct, the variance of the _estimators_). This is known as the [Cramér-Rao](https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound) bound, which says that this variance is bounded from below by the inverse of the Fisher information, i.e.,

\begin{equation*}
\mathrm{var}(\hat{\boldsymbol\theta})\geq I(\boldsymbol\theta)^{-1}\,,
\end{equation*}

where $I(\boldsymbol\theta)$ is the [Fisher information matrix](https://en.wikipedia.org/wiki/Fisher_information#Matrix_form), which is proportional to the Hessian of the loss function:

\begin{equation*}
\left[I(\boldsymbol\theta)\right]_{ij} = -N\,\mathbb{E}\left[
\frac{\partial^2}{\partial\theta_i\partial\theta_j} \log f(\boldsymbol{X};\boldsymbol\theta) \,\middle|\, \boldsymbol\theta
\right]\,,
\end{equation*}

where $f$ is the likelihood function (1), which is related to our loss function via
{ .annotate }

1. I'm slightly confused here :confounded:; this (taken from Wikipedia) is not a likelihood but looks rather like a probability density...

\begin{equation*}
-\log f = \frac{\mathcal{L}}{2S^2}\,.
\end{equation*}

Note that the (unbiased) _sample variance_ of the residuals, $S^2$, appears here because we assume a Gaussian likelihood. We can therefore write $I_{ij}$ as

\begin{equation*}
\left[I(\boldsymbol\theta)\right]_{ij} = \frac{1}{2S^2}\,\frac{\partial^2\mathcal{L}}{\partial\theta_i\partial\theta_j} = \frac{1}{2S^2}\,H_{ij}\,,
\end{equation*}

where $H_{ij}$ is a matrix element of the Hessian matrix. Thus, the inverse of the Fisher information matrix is (dropping the explicit dependence on the arguments for simplicity)

\begin{equation*}
I^{-1} = 2S^2\,H^{-1} = \frac{2}{N-1}\sum_{i=1}^N(y_i-t_i)^2\,H^{-1}\,.
\end{equation*}

Denoting the estimators for $w$ and $b$ by $\hat{w}$ and $\hat{b}$, their (co)variances are thus,

!!! success "(Co)variances of the estimators"

\begin{equation*}
\begin{split}
\mathrm{var}\,\hat{w} &= \frac{1}{N-1}\sum_{i=1}^N (y_i-t_i)^2\times
\frac{1}{N\,\mathrm{var}\,X}\,,\\
\mathrm{var}\,\hat{b} &= \frac{1}{N-1}\sum_{i=1}^N (y_i-t_i)^2\times
\frac{\mathrm{var}\,X + \overline{X}^2}{N\,\mathrm{var}\,X}\,,\\
\mathrm{cov}(\hat{w}, \hat{b}) &= \frac{-1}{N-1}\sum_{i=1}^N (y_i-t_i)^2\times
\frac{\overline{X}}{N\,\mathrm{var}\,X}\,.
\end{split}
\end{equation*}
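A `numpy` sketch of these (co)variances, using the standard least-squares results for a Gaussian likelihood (the function name `fit_with_errors` and the synthetic data are illustrative assumptions, not part of the text):

```python
import numpy as np

def fit_with_errors(x, t):
    """Closed-form fit plus (co)variances of the estimators (sketch)."""
    N = len(x)
    xbar, var_x = np.mean(x), np.var(x)          # population variance of the features
    w = np.cov(x, t, bias=True)[0, 1] / var_x
    b = np.mean(t) - w * xbar
    # unbiased sample variance of the residuals, S^2
    s2 = np.sum((w * x + b - t) ** 2) / (N - 1)
    var_w = s2 / (N * var_x)
    var_b = s2 * (var_x + xbar**2) / (N * var_x)
    cov_wb = -s2 * xbar / (N * var_x)
    return w, b, var_w, var_b, cov_wb

# Illustrative synthetic data: t = 3x - 2 plus Gaussian noise
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=200)
t = 3.0 * x - 2.0 + rng.normal(scale=1.0, size=200)
w, b, var_w, var_b, cov_wb = fit_with_errors(x, t)
```

As a sanity check, $\mathrm{cov}(\hat{w},\hat{b})^2 \leq \mathrm{var}\,\hat{w}\,\mathrm{var}\,\hat{b}$ must hold for the covariance matrix to be positive semi-definite.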

### Propagating the errors to the model

Let's keep using the hatted parameters (estimated from the data) to distinguish them from the true parameters. Our model is then
$$
y = \hat{w}x + \hat{b}.
$$

Propagating the uncertainties to $y$, we have
$$
(\delta y)^2 = (\mathrm{var}\,\hat{w})\,x^2 + \mathrm{var}\,\hat{b} + 2x\,\mathrm{cov}(\hat{w}, \hat{b})\,.
$$
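A short sketch of the band computation (the function name `band_half_width` and the numbers are hypothetical). Since $(\delta y)^2$ is minimized at $x = -\mathrm{cov}(\hat{w},\hat{b})/\mathrm{var}\,\hat{w}$, which equals $\overline{X}$, such bands are narrowest near the mean of the data.

```python
import numpy as np

def band_half_width(x, var_w, var_b, cov_wb, z=1.96):
    """Half-width of the z-sigma confidence band at point(s) x."""
    x = np.asarray(x, dtype=float)
    # (delta y)^2 = var(w) x^2 + var(b) + 2 x cov(w, b)
    return z * np.sqrt(var_w * x**2 + var_b + 2.0 * x * cov_wb)

# Hypothetical estimator (co)variances for illustration
var_w, var_b, cov_wb = 0.01, 0.5, -0.05
dy = band_half_width([0.0, 5.0, 10.0], var_w, var_b, cov_wb, z=1.0)
```

For a 95% band, as in the figure below, the default `z=1.96` multiplies the one-sigma half-width.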


<figure markdown="span">
![Error band](./figures/error_band.png){ width="600" }
<figcaption>
The band shows the 95% confidence interval on the fit by propagating the uncertainty on the estimated parameters $\hat{w}$ and $\hat{b}$ as described in the main text.
</figcaption>
</figure>
