Commit 7c29411

Added a section on estimating the error on the fitted parameters

Fady Bishara committed Aug 1, 2024
1 parent 8ecf8dc commit 7c29411

Showing 3 changed files with 74 additions and 0 deletions.
Binary file added docs/figures/error_band.png
Binary file modified docs/figures/tangents.png
74 changes: 74 additions & 0 deletions docs/linear_regression.md
@@ -196,3 +196,77 @@ And here are a few things you should _definitely_ do:

The capital letters mean the full vector of features, $X$, and targets, $T$, in the dataset. The operators $\mathrm{cov}$ and $\mathrm{var}$ are the covariance and variance, respectively. They can be computed with the `numpy` functions `cov` and `var`. Finally, an overline as in $\overline{X}$ means the average (`numpy.mean`) over the features in the dataset, and similarly for the targets.
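As a minimal `numpy` sketch of the closed-form fit in this notation (the synthetic data and variable names are illustrative, not part of the text): note that `numpy.cov` defaults to the unbiased normalization while `numpy.var` defaults to the population one, so `bias=True` is passed to `cov` to keep the ratio consistent.

```python
import numpy as np

# Illustrative synthetic data: t = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=50)
t = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)

# w = cov(X, T) / var(X); bias=True makes cov use the same
# (population) normalization as numpy.var's default
w = np.cov(x, t, bias=True)[0, 1] / np.var(x)
# b = mean(T) - w * mean(X)
b = np.mean(t) - w * np.mean(x)
```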


## Estimating the error on the fitted parameters

!!! danger "This section is technical"

One can get a lower bound on the variance of the fitted parameters (or, to be technically correct, the variance of the _estimators_). This is known as the [Cramér-Rao](https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound) bound, which says that this variance is bounded from below by the inverse of the Fisher information, i.e.,

\begin{equation*}
\mathrm{var}(\hat{\boldsymbol\theta})\geq I(\boldsymbol\theta)^{-1}\,,
\end{equation*}

where $I(\boldsymbol\theta)$ is the [Fisher information matrix](https://en.wikipedia.org/wiki/Fisher_information#Matrix_form), which is proportional to the Hessian of the loss function:

\begin{equation*}
\left[I(\boldsymbol\theta)\right]_{ij} = -N\,\mathbb{E}\left[
\frac{\partial^2}{\partial\theta_i\partial\theta_j} \log f(\boldsymbol{X};\boldsymbol\theta) \,\middle|\, \boldsymbol\theta
\right]\,,
\end{equation*}

where $f$ is the likelihood function (1), which is related to our loss function via
{ .annotate }

1. I'm slightly confused here :confounded:; this (taken from Wikipedia) is not a likelihood but looks rather like a probability density...

\begin{equation*}
-\log f = \frac{\mathcal{L}}{2S^2}\,.
\end{equation*}

Note that the (unbiased) _sample variance_ of the residuals, $S^2$, appears here because we assume a Gaussian likelihood. We can therefore write $I_{ij}$ as

\begin{equation*}
\left[I(\boldsymbol\theta)\right]_{ij} = \frac{1}{2S^2}\,\frac{\partial^2\mathcal{L}}{\partial\theta_i\partial\theta_j} = \frac{1}{2S^2}\,H_{ij}\,,
\end{equation*}

where $H_{ij}$ is a matrix element of the Hessian matrix. Thus, the inverse of the Fisher information matrix is (dropping the explicit dependence on the arguments for simplicity)

\begin{equation*}
I^{-1} = 2S^2\,H^{-1} = \frac{2}{N-1}\sum_{i=1}^N(y_i-t_i)^2\,H^{-1}\,.
\end{equation*}

Denoting the estimators for $w$ and $b$ by $\hat{w}$ and $\hat{b}$, their (co)variances are thus,

!!! success "(Co)variances of the estimators"

\begin{equation*}
\begin{split}
\mathrm{var}\,\hat{w} &= \frac{1}{N-1}\sum_{i=1}^N (y_i-t_i)^2\times
\frac{1}{N\,\mathrm{var}\,X}\,,\\
\mathrm{var}\,\hat{b} &= \frac{1}{N-1}\sum_{i=1}^N (y_i-t_i)^2\times
\frac{\mathrm{var}\,X + \overline{X}^2}{N\,\mathrm{var}\,X}\,,\\
\mathrm{cov}(\hat{w}, \hat{b}) &= \frac{-1}{N-1}\sum_{i=1}^N (y_i-t_i)^2\times
\frac{\overline{X}}{N\,\mathrm{var}\,X}\,.
\end{split}
\end{equation*}
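A `numpy` sketch of these (co)variances, using the standard least-squares results for a Gaussian likelihood (the function name `fit_with_errors` and the synthetic data are illustrative assumptions, not part of the text):

```python
import numpy as np

def fit_with_errors(x, t):
    """Closed-form fit plus (co)variances of the estimators (sketch)."""
    N = len(x)
    xbar, var_x = np.mean(x), np.var(x)          # population variance of the features
    w = np.cov(x, t, bias=True)[0, 1] / var_x
    b = np.mean(t) - w * xbar
    # unbiased sample variance of the residuals, S^2
    s2 = np.sum((w * x + b - t) ** 2) / (N - 1)
    var_w = s2 / (N * var_x)
    var_b = s2 * (var_x + xbar**2) / (N * var_x)
    cov_wb = -s2 * xbar / (N * var_x)
    return w, b, var_w, var_b, cov_wb

# Illustrative synthetic data: t = 3x - 2 plus Gaussian noise
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=200)
t = 3.0 * x - 2.0 + rng.normal(scale=1.0, size=200)
w, b, var_w, var_b, cov_wb = fit_with_errors(x, t)
```

As a sanity check, $\mathrm{cov}(\hat{w},\hat{b})^2 \leq \mathrm{var}\,\hat{w}\,\mathrm{var}\,\hat{b}$ must hold for the covariance matrix to be positive semi-definite.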

### Propagating the errors to the model

Let's keep using the hatted parameters (estimated from the data) to distinguish them from the true parameters. Our model is then
$$
y = \hat{w}x + \hat{b}.
$$

Propagating the uncertainties to $y$, we have
$$
(\delta y)^2 = (\mathrm{var}\,\hat{w})\,x^2 + \mathrm{var}\,\hat{b} + 2x\,\mathrm{cov}(\hat{w}, \hat{b})\,.
$$
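A short sketch of the band computation (the function name `band_half_width` and the numbers are hypothetical). Since $(\delta y)^2$ is minimized at $x = -\mathrm{cov}(\hat{w},\hat{b})/\mathrm{var}\,\hat{w}$, which equals $\overline{X}$, such bands are narrowest near the mean of the data.

```python
import numpy as np

def band_half_width(x, var_w, var_b, cov_wb, z=1.96):
    """Half-width of the z-sigma confidence band at point(s) x."""
    x = np.asarray(x, dtype=float)
    # (delta y)^2 = var(w) x^2 + var(b) + 2 x cov(w, b)
    return z * np.sqrt(var_w * x**2 + var_b + 2.0 * x * cov_wb)

# Hypothetical estimator (co)variances for illustration
var_w, var_b, cov_wb = 0.01, 0.5, -0.05
dy = band_half_width([0.0, 5.0, 10.0], var_w, var_b, cov_wb, z=1.0)
```

For a 95% band, as in the figure below, the default `z=1.96` multiplies the one-sigma half-width.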


<figure markdown="span">
![Error band](./figures/error_band.png){ width="600" }
<figcaption>
The band shows the 95% confidence interval on the fit by propagating the uncertainty on the estimated parameters $\hat{w}$ and $\hat{b}$ as described in the main text.
</figcaption>
</figure>
