### A Pluto.jl notebook ###
# v0.20.4
using Markdown
using InteractiveUtils
# This Pluto notebook uses @bind for interactivity. When running this notebook outside of Pluto, the following 'mock version' of @bind gives bound variables a default value (instead of an error).
macro bind(def, element)
#! format: off
quote
local iv = try Base.loaded_modules[Base.PkgId(Base.UUID("6e696c72-6542-2067-7265-42206c756150"), "AbstractPlutoDingetjes")].Bonds.initial_value catch; b -> missing; end
local el = $(esc(element))
global $(esc(def)) = Core.applicable(Base.get, el) ? Base.get(el) : iv(el)
el
end
#! format: on
end
# ╔═╡ db1a7392-825a-454c-a777-954c05a6a310
using RxInfer
# ╔═╡ f4e55044-ff65-4c21-aeb3-2d8be2b97cf4
using Random
# ╔═╡ 452300fb-2cf2-4fab-8923-b847ed4b0c97
using Distributions, Plots, LaTeXStrings, PlutoUI
# ╔═╡ 636389e3-f169-4255-9b4a-8d3426d042b0
using Integrals
# ╔═╡ 5062b49b-34d0-4364-a9b3-7d29119ed599
using PlutoTeachingTools
# ╔═╡ aa4eb90e-2166-4407-97aa-666f309d5e34
md"""
``a + \mu + μ`` and ``\boldsymbol{a} + \boldsymbol{\mu} + \boldsymbol{μ}``
"""
# ╔═╡ 470a0c46-3eae-490b-b13b-5a74eae16a3d
md"""
``\def\S{\boldsymbol{\Sigma}}\def\m{\boldsymbol{\mu}} \S + \m_a`` and ``\foo + \bar``
"""
# ╔═╡ 6108deb0-2c3f-41c2-819d-106f48d210e7
md"""
# This lecture:
## Gaussian distribution is useful
### Occurrence in nature, occurrence in math, entropy
### Preserved in common math (addition, ~mult~, Fourier transform)
See the parameters of the new distribution
## Gaussian distribution in code
pdf
cdf
integral
RxInfer
## Gaussian distribution in math
pdf * pdf = pdf
## Bayesian inference
Model:
```math
x \sim \mathcal{N}(\mu,\sigma^2)
```
with ``\sigma^2 \in \mathbb{R}_{\geq 0}`` and ``\mu \in \mathbb{R}``.
Given samples of ``x``, we want to infer the value of ``\mu``, assuming that the value of ``\sigma^2`` is known.
As prior, we set:
```math
\mu \sim \mathcal{N}(0, \sigma_{\mu}^2)
```
with a chosen ``\sigma_{\mu}^2 \in \mathbb{R}_{\geq 0}``.
---
ACTUALLY I want to do this immediately for multiple observations; that makes more sense with data
---
So the model is
```math
x \sim \mathcal{N}(\mu,\sigma^2)
```
with
```math
\mu \sim \mathcal{N}(0, \sigma_{\mu}^2)
```
and given ``\sigma^2, \sigma_{\mu}^2 \in \mathbb{R}_{\geq 0}``
---
The result is:
Derivation:
Easy because you can sum precisions in the posterior.
OK, I will check Bishop for how to write this properly.
Would be nice to have a general scheme:
- model
- priors
- data
- knowns
- (bayes)
- posterior
### Inferring sigma (or both)
Optional section
Inferring both might be hard in closed form? Also did not manage it in RxInfer.
##
"""
# ╔═╡ a6e21f35-5563-43b4-a6cf-8526be3b5f25
# ╔═╡ 1b26aadc-62a1-4ac1-8cf4-ecdbac7ef63f
# ╔═╡ 2033d8df-7e4c-462f-9c82-f2452460e80a
# ╔═╡ 82d49f7f-cd7f-4406-a883-641069025699
# ╔═╡ 806c47e9-6415-424a-9ee3-67699dbca385
# ╔═╡ 0dd544c8-b7c6-11ef-106b-99e6f84894c3
md"""
# Continuous Data and the Gaussian Distribution
"""
# ╔═╡ a9ab8f29-d72b-472a-b038-c5a884808e2f
# ╔═╡ b738c85f-09f1-4c99-801e-daa90f476fce
convert(NormalWeightedMeanPrecision, Normal(3,0.2))
# ╔═╡ a0427b32-7ce2-46a9-b962-7fab500c9b2f
# ╔═╡ a72dadff-e813-4b18-8fa9-86c9b1515fe2
md"""
The **Gaussian Distribution** is very common in science! What makes this distribution so special?
- It occurs naturally in many processes (see the [Central Limit Theorem](https://en.wikipedia.org/wiki/Central_limit_theorem)).
- Gaussian distributions are easy to work with mathematically.
"""
# ╔═╡ ce5cf62a-bd84-4abe-8cef-07fde003624d
md"""
## Sampling from the Gaussian distribution
In Julia, we can sample from a Normal distribution using the `rand` function and the Distributions.jl package:
"""
# ╔═╡ 41bc3e62-e454-4f14-9d3c-c725a5a93aa8
sample_gaussian_data = rand(Distributions.Normal(5.0, 0.1), 100)
# ╔═╡ 24125436-1b30-4d05-8502-3c8ede5cc00f
md"""
This gives us a Vector of **`100` numbers**, sampled from the distribution **`Normal(5.0, 0.1)`**.
Let's look at these numbers in a histogram:
"""
# ╔═╡ 29f9c097-8bca-4339-a927-28d101cb3a22
histogram(
sample_gaussian_data;
bins=3.975:.05:6,
size=(650,90), legend=nothing, color="black"
)
# ╔═╡ 7fad63f9-c986-438f-99ec-bc8b8bf70ce5
md"""
## Inference
In the code above, we were **given** the numbers ``(\mu, \sigma)``, and we **generated** 100 numbers.
The **inference** task is the opposite: **given** 100 numbers sampled from a distribution, we try to **infer** the parameters ``(\mu, \sigma)`` that were used to generate that data.
Let's look at two methods of inference:
- **ML estimator** gives the most likely **value** of ``(\mu, \sigma)``.
- *(later)* **Bayesian inference** gives **posterior distributions** for ``\mu`` and ``\sigma``.
### ML estimator
We can compute the statistical mean and standard deviation of our data:
"""
# ╔═╡ cc17e180-89f1-48b7-a9dd-fa1e7814ceb7
mean(sample_gaussian_data)
# ╔═╡ b9aa17ac-ff44-4899-b15a-e2136eaadbed
std(sample_gaussian_data)
# ╔═╡ fa5e50f5-56b8-48c0-83a0-031fade560d6
md"""
In this simple Gaussian case, the statistical (sample) mean and standard deviation are the **maximum likelihood (ML)** estimates of ``\mu`` and ``\sigma`` (up to the ``1/N`` versus ``1/(N-1)`` convention in the standard deviation).
These values give our **best guess** for the parameters that were used to generate the data. But **how sure are we about these estimates?** Can we give a margin of error?
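For reference, here is how these ML estimates could be computed explicitly (a sketch; `sample_gaussian_data` is the vector sampled above). The ML estimate of ``\mu`` is the sample mean, and the ML estimate of ``\sigma`` divides by ``N`` rather than ``N-1``:

```julia
μ_ml = mean(sample_gaussian_data)                   # ML estimate of μ
σ_ml = std(sample_gaussian_data; corrected=false)   # ML estimate of σ (divides by N)
σ_unbiased = std(sample_gaussian_data)              # default: divides by N-1
```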
"""
# ╔═╡ 108ba4f6-9292-45a5-a4f3-bb560b43d31b
# ╔═╡ 52f308bd-3740-4e2c-98d8-794d02697770
md"""
## Sampling a Multivariate Normal distribution
Similar to the 1-dimensional case, we can use the `rand` function to sample from a multivariate normal distribution. Instead of two scalars, we use a `Vector` for the mean and a `Matrix` for the covariance:
"""
# ╔═╡ 1225c975-bbd5-4f17-82ff-a072b4e5bad9
sample_mv_gaussian_data = rand(
Distributions.MvNormal([5.0, 7.0], [
2.0 0.8
0.8 0.35
]),
1000
)
# ╔═╡ c07f2474-601d-4bce-98d1-5970b580094b
md"""
The result is a **`2×1000` matrix of numbers**, each column corresponding to one sample.
Let's make a scatter plot. Each point corresponds to one sample.
"""
# ╔═╡ daa65444-810d-49af-8e2d-a0af285cd8d4
scatter(
sample_mv_gaussian_data[1,:], sample_mv_gaussian_data[2,:];
color="black", markersize=3, opacity=.5,
legend=nothing,
)
# ╔═╡ 3a99575c-2773-4b86-814c-465a74d297cf
md"""
## Short inference example
In the example below, we take a subset of our dataset. We can then use Bayesian inference to infer the parameters of the generating distribution from that subset.
"""
# ╔═╡ 89c630f6-e7ef-4284-86cd-5eaa3383735e
@bindname data_size Slider(
[1,10,20,50,100,200,500,1000,2000,5000,10_000,20_000];
show_value=true,
default=10,
)
# ╔═╡ 3271e5a2-d749-4a7c-801f-f1cb692c74ba
# ╔═╡ 21f842fe-e37a-4154-8561-ef814ccd3e54
md"""
With Bayesian machine learning, we can use this data to **infer** the generating distribution.
As more data is provided, our inference gives a more precise result!
"""
# ╔═╡ 57df5ab6-a9ae-4c8f-8147-78b7e16988d4
# ╔═╡ 9e0d5434-6953-4e7c-884b-9ef607441e1b
# ╔═╡ 7384f4ee-a889-44db-842b-df30ed1e6a49
full_data = rand(
# 🤫 these parameters are secret! we will try to infer them from the data
Normal(10, 7),
20_000
)
# ╔═╡ adac696f-1b5e-4af4-bd45-4a13e8606332
data = full_data[1:data_size]
# ╔═╡ b3f91134-c64c-49cc-b9a6-4da1fe2a37ad
histogram(data; bins=-30.5:30, xlim=(-30,30), size=(650,90), legend=nothing, color="black")
# ╔═╡ bc13ef80-d16f-48c2-b699-3c1043927c8e
md"""
### Three variances
In each of the three charts, take a look at the **variance**. What does the variance mean in this chart?
#### Chart 1 & 2: **data variance**
In the first chart, you can see the variance of our observed variable, which is σ² = 7².
In the second chart, you see our inference results! We tried to **infer** the standard deviation from our data.
#### Chart 3: **posterior variance**
In the third chart, you see another variance, the **variance of the posterior for μ**! A small variance of the posterior means **high precision**. Indeed, as we get more data, this variance becomes very small: we become very sure about our inference.
"""
# ╔═╡ eed3b2da-8eb6-4462-a852-d39e95fdba48
# ╔═╡ 80612314-bf45-4bb8-9316-a910716ca459
@model function simple_data_model(y, a)
μ ~ Normal(mean=0, var=a)
σ = 7
for i in eachindex(y)
y[i] ~ Normal(mean=μ, var=σ^2)
end
end
# ╔═╡ 25032652-bf9b-4165-9c2f-15a6298145f4
result = infer(
model = simple_data_model(a = 1e9),
data = (y = data,),
)
# ╔═╡ 1c432cb9-bcae-44aa-a32e-1fd142a938b1
result.posteriors
# ╔═╡ 75c3d336-f4dc-43d4-a9fe-d026d37dac66
posterior = convert(Normal, result.posteriors[:μ])
# ╔═╡ 4ffd7bad-3d46-43f5-a0f3-d8a642b98f4c
let
p = plot(; xlim=(-30,30), ylim=(0,0.06), size=(650,90), legend=nothing, title="Some possible distributions")
μ_samples = rand(posterior, 20)
σ_samples = rand(Normal(7.0, 10/data_size), 20)
for (μ,σ) in zip(μ_samples, σ_samples)
plot!(p, x -> Distributions.pdf(Normal(μ, max(2.0,σ)), x); color="black", opacity=.3)
vline!(p, [μ]; color="red", opacity=.3)
end
p
end
# ╔═╡ 0501b18c-cdc9-4772-917f-195c75f05118
plot(x -> Distributions.pdf(posterior, x); xlim=(-30,30), ylim=(0,1), size=(650,90), legend=nothing, color="red", title="Posterior of μ")
# ╔═╡ bd9e3aa4-e6a2-4b4e-bd45-ed9a80374cb4
md"""
## Multivariate Gaussian distribution
"""
# ╔═╡ 0dd817c0-b7c6-11ef-1f8b-ff0f59f7a8ce
md"""
### Preliminaries
Goal
* Review of information processing with Gaussian distributions in linear systems
Materials
* Mandatory
* These lecture notes
* Optional
* Bishop pp. 85-93
* [MacKay - 2006 - The Humble Gaussian Distribution](https://github.com/bertdv/BMLIP/blob/master/lessons/notebooks/files/Mackay-2006-The-humble-Gaussian-distribution.pdf) (highly recommended!)
* [Ariel Caticha - 2012 - Entropic Inference and the Foundations of Physics](https://github.com/bertdv/BMLIP/blob/master/lessons/notebooks/files/Caticha-2012-Entropic-Inference-and-the-Foundations-of-Physics.pdf), pp.30-34, section 2.8, the Gaussian distribution
* References
* [E.T. Jaynes - 2003 - Probability Theory, The Logic of Science](http://www.med.mcgill.ca/epidemiology/hanley/bios601/GaussianModel/JaynesProbabilityTheory.pdf) (best book available on the Bayesian view on probability theory)
"""
# ╔═╡ 06cad4f9-b40c-428f-b2b9-686274749dc9
rand_deterministic(a...) = rand(MersenneTwister(1), a...)
# ╔═╡ 0dd82814-b7c6-11ef-3927-b3ec0b632c31
md"""
### Example Problem
Consider a set of observations ``D=(x_1,…,x_N)`` in ``\mathbb{R}^2`` (see Figure). All observations were generated by the same process.
We now draw an extra observation ``x_\bullet ∈ \mathbb{R}^2`` from the same data generating process. What is the probability that ``x_\bullet`` lies within the shaded rectangle ``S \subseteq \mathbb{R}^2``?
"""
# ╔═╡ a3638872-847f-47a2-941d-94b9c7d2218b
N = 100
# ╔═╡ cd6cc0a8-eb41-4366-82e4-2b6ee92c7a37
S = ([0,1.],[2,2.])
# ╔═╡ 0aa00f04-a53e-4c2b-9b54-a9f93977c481
in_rectangle(x, rectangle=S) = all(rectangle[1] .<= x .<= rectangle[2])
# ╔═╡ 07b6e6be-eed0-4a89-bcdb-52f3ed79f100
@bind try_again CounterButton()
# ╔═╡ ff222c45-8909-489d-b4a2-ec7d9914ede4
generative_dist = let
# reference the button so this cell re-runs when it is clicked
try_again
# the distribution
MvNormal(
[0.0, 1.0],
[
0.8 0.5
0.5 1.0
]
)
end
# ╔═╡ 5c9b660b-d44a-402f-a1f8-738911a183b3
D = rand(generative_dist, N)
# ╔═╡ d34233a3-60fe-4ac6-9822-b6097749206b
x_dot = rand(generative_dist)
# ╔═╡ 76cb64ad-e3e7-4af1-b5b3-4bd73f68883b
let
scatter(D[1,:], D[2,:], marker=:x, markerstrokewidth=3, label=L"D")
scatter!([x_dot[1]], [x_dot[2]], label=L"x_\bullet")
plot!(range(0, 2), [1., 1., 1.], fillrange=2, alpha=0.4, color=:gray,label=L"S")
plot!(xlim=(-3,3), ylim=(-3,3))
end
# ╔═╡ 085e9093-e3d1-41b7-8b3e-ae623809f13d
md"""
### Sampling
"""
# ╔═╡ 0a373d0f-601d-4fce-a1d5-e02732ece9fe
n_samples = 500001
# ╔═╡ b5e23f3d-b35a-40f1-8bd4-58d6128f2e29
# Monte Carlo estimate of the probability that a new sample falls in the rectangle S
count(in_rectangle, eachcol(rand(generative_dist, n_samples))) / n_samples
# ╔═╡ 51fb9204-b480-47f8-bb1e-515b5a0af730
md"""
### Using the `pdf`
"""
# ╔═╡ 042bab9b-e162-436e-894d-70e7ea98fb08
# Numerically integrate a function f over the box S = (lower corner, upper corner)
function ∫(f, S)
prob = IntegralProblem(
(x, p) -> f(x),
S
)
solve(prob, HCubatureJL(); abstol=1e-3).u
end
# ╔═╡ 3ca80762-1e7b-46c8-a101-9206de278077
∫(S) do x
Distributions.pdf(generative_dist, x)
end
# ╔═╡ 0dd835ca-b7c6-11ef-0e33-1329e4ba13d8
md"""
### The Gaussian Distribution
Consider a random (vector) variable ``x \in \mathbb{R}^M`` that is "normally" (i.e., Gaussian) distributed. The *moment* parameterization of the Gaussian distribution is completely specified by its *mean* ``\mu`` and *covariance matrix* ``\Sigma`` and is given by
```math
p(x | \mu, \Sigma) = \mathcal{N}(x|\mu,\Sigma) \triangleq \frac{1}{\sqrt{(2\pi)^M |\Sigma|}} \,\exp\left\{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu) \right\}\,.
```
where ``|\Sigma| \triangleq \mathrm{det}(\Sigma)`` is the determinant of ``\Sigma``.
For the scalar real variable ``x \in \mathbb{R}``, this works out to
```math
p(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2 }} \,\exp\left\{-\frac{(x-\mu)^2}{2 \sigma^2} \right\}\,.
```
"""
# ╔═╡ 0dd84542-b7c6-11ef-3115-0f8b26aeaa5d
md"""
Alternatively, the <a id="natural-parameterization">*canonical* (a.k.a. *natural* or *information*) parameterization</a> of the Gaussian distribution is given by
```math
\begin{equation*}
p(x | \eta, \Lambda) = \mathcal{N}_c(x|\eta,\Lambda) = \exp\left\{ a + \eta^T x - \frac{1}{2}x^T \Lambda x \right\}\,.
\end{equation*}
```
where
```math
a = -\frac{1}{2} \left( M \log(2 \pi) - \log |\Lambda| + \eta^T \Lambda^{-1} \eta\right)
```
is the normalizing constant that ensures that ``\int p(x)\mathrm{d}x = 1``.
```math
\Lambda = \Sigma^{-1}
```
is called the *precision matrix*.
```math
\eta = \Sigma^{-1} \mu
```
is the *natural* mean or for clarity often called the *precision-weighted* mean.
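As a quick numerical check (a sketch for the scalar case ``M=1``, with made-up numbers), both parameterizations describe the same density:

```julia
μ, Σ = 1.5, 0.4                             # moment parameters
Λ = inv(Σ)                                  # precision
η = Λ * μ                                   # precision-weighted mean
a = -0.5 * (log(2π) - log(Λ) + η^2 / Λ)     # log-normalizer for M = 1
x = 0.3
pdf(Normal(μ, sqrt(Σ)), x) ≈ exp(a + η*x - 0.5*Λ*x^2)   # true
```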
"""
# ╔═╡ 0dd8528a-b7c6-11ef-3bc9-eb09c0c530d8
md"""
### Why the Gaussian?
Why is the Gaussian distribution so ubiquitously used in science and engineering? (see also [Jaynes, section 7.14](http://www.med.mcgill.ca/epidemiology/hanley/bios601/GaussianModel/JaynesProbabilityTheory.pdf#page=250), and the whole chapter 7 in his book).
"""
# ╔═╡ 0dd85c94-b7c6-11ef-06dc-7b8797c13fda
md"""
(1) Operations on probability distributions tend to lead to Gaussian distributions:
* Any smooth function with a single rounded maximum, if raised to higher and higher powers, tends to a Gaussian function (useful in sequential Bayesian inference).
* The [Gaussian distribution has higher entropy](https://en.wikipedia.org/wiki/Differential_entropy#Maximization_in_the_normal_distribution) than any other with the same variance.
* Therefore any operation on a probability distribution that discards information but preserves variance gets us closer to a Gaussian.
* As an example, see [Jaynes, section 7.1.4](http://www.med.mcgill.ca/epidemiology/hanley/bios601/GaussianModel/JaynesProbabilityTheory.pdf#page=250) for how this leads to the [Central Limit Theorem](https://en.wikipedia.org/wiki/Central_limit_theorem), which results from performing convolution operations on distributions.
"""
# ╔═╡ 0dd8677a-b7c6-11ef-357f-2328b10f5274
md"""
(2) Once the Gaussian has been attained, this form tends to be preserved. e.g.,
* The convolution of two Gaussian functions is another Gaussian function (useful for the sum of two variables and for linear transformations).
* The product of two Gaussian functions is another Gaussian function (useful in Bayes rule).
* The Fourier transform of a Gaussian function is another Gaussian function.
"""
# ╔═╡ 0dd86f40-b7c6-11ef-2ae8-a3954469bcee
md"""
### Transformations and Sums of Gaussian Variables
A **linear transformation** ``z=Ax+b`` of a Gaussian variable ``x \sim \mathcal{N}(\mu_x,\Sigma_x)`` is Gaussian distributed as
```math
p(z) = \mathcal{N} \left(z \,|\, A\mu_x+b, A\Sigma_x A^T \right) \tag{SRG-4a}
```
In fact, after a linear transformation ``z=Ax+b``, no matter how ``x`` is distributed, the mean and variance of ``z`` are always given by ``\mu_z = A\mu_x + b`` and ``\Sigma_z = A\Sigma_x A^T``, respectively (see [probability theory review lesson](https://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/notebooks/Probability-Theory-Review.ipynb#linear-transformation)). In case ``x`` is not Gaussian, higher order moments may be needed to specify the distribution for ``z``.
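A quick empirical check of (SRG-4a) (a sketch; the matrix, vector, and sample count below are made up):

```julia
μx, Σx = [0.0, 1.0], [2.0 0.3; 0.3 1.0]
A, b = [1.0 2.0; 0.0 1.0], [1.0, -1.0]
x_samples = rand(MvNormal(μx, Σx), 100_000)
z_samples = A * x_samples .+ b
mean(z_samples; dims=2)   # ≈ A*μx + b
cov(z_samples; dims=2)    # ≈ A*Σx*A'
```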
"""
# ╔═╡ 0dd87a3a-b7c6-11ef-2bc2-bf2b4969537c
md"""
The **sum of two independent Gaussian variables** is also Gaussian distributed. Specifically, if ``x \sim \mathcal{N} \left(\mu_x, \Sigma_x \right)`` and ``y \sim \mathcal{N} \left(\mu_y, \Sigma_y \right)``, then the PDF for ``z=x+y`` is given by
```math
\begin{align*}
p(z) &= \mathcal{N}(x\,|\,\mu_x,\Sigma_x) \ast \mathcal{N}(y\,|\,\mu_y,\Sigma_y) \\
&= \mathcal{N} \left(z\,|\,\mu_x+\mu_y, \Sigma_x +\Sigma_y \right) \tag{SRG-8}
\end{align*}
```
The sum of two Gaussian *distributions* is NOT a Gaussian distribution. Why not?
"""
# ╔═╡ 5978ce29-3fd1-44c1-a3cf-37fd336e2a35
a = rand(Normal(4.3, 2), 51000)
# ╔═╡ 2c31df74-726b-48ff-a402-5a9f67392060
b = rand(Normal(3.0, 2.0), 51000)
# ╔═╡ e90979ad-cfd8-4fe8-9bd3-a434e394a8f2
md"""
The product of two Gaussian-distributed variables is **not** Gaussian distributed!
"""
# ╔═╡ 272ae2be-7784-4100-a03f-4b3aa400dd18
hist(data; kwargs...) = histogram(data; bins=-30.5:30, xlim=(-30,30), size=(650,90), legend=nothing, kwargs...)
# ╔═╡ 0837d757-9beb-4956-836c-a2c487bea9e3
hist(a; color="blue")
# ╔═╡ 09eb45b0-4117-46ae-adaf-fe1e82578478
hist(b; color="red")
# ╔═╡ 8554d2d0-0acf-47a1-92e1-9b025031a5d6
let
sum = a .+ b
hist(sum; color="purple", normalize=:pdf)
plot!(x -> Distributions.pdf(Normal(mean(sum),std(sum)),x))
end
# ╔═╡ 3c454c77-2320-49b8-b95c-e869f9e9dcd9
let
prod = a .* b
hist(prod; color="purple", normalize=:pdf)
plot!(x -> Distributions.pdf(Normal(mean(prod),std(prod)),x))
end
# ╔═╡ 0dd88110-b7c6-11ef-0b82-2ffe13a68cad
md"""
### Example: Gaussian Signals in a Linear System
<p style="text-align:center;"><img src="./figures/fig-linear-system.png" width="400px"></p>
Given independent variables
```math
x \sim \mathcal{N}(\mu_x,\sigma_x^2)
```
and ``y \sim \mathcal{N}(\mu_y,\sigma_y^2)``, what is the PDF for ``z = A\cdot(x -y) + b`` ? (for answer, see [Exercises](http://nbviewer.jupyter.org/github/bertdv/BMLIP/blob/master/lessons/exercises/Exercises-The-Gaussian-Distribution.ipynb))
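As a numerical sanity check (a sketch for scalar ``A`` and ``b``, with made-up numbers): combining SRG-8 and SRG-4a, ``z`` should again be Gaussian with mean ``A(\mu_x - \mu_y) + b`` and variance ``A^2(\sigma_x^2 + \sigma_y^2)``.

```julia
A, b = 2.0, 1.0
x = rand(Normal(1.0, 0.5), 100_000)
y = rand(Normal(-1.0, 0.3), 100_000)
z = A .* (x .- y) .+ b
mean(z), var(z)   # ≈ (A*(1.0 - (-1.0)) + b, A^2*(0.5^2 + 0.3^2))
```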
"""
# ╔═╡ 0dd88a84-b7c6-11ef-133c-3d85f0703c19
md"""
Think about the role of the Gaussian distribution for stochastic linear systems in relation to what sinusoids mean for deterministic linear system analysis.
"""
# ╔═╡ c50f6bf0-a1e5-4bd6-a6a2-7201a4302c2f
TODO(md"Think about? ")
# ╔═╡ d93fd4bb-7bf2-453b-97be-602ee494a786
# ╔═╡ 9612754a-7ed3-4d4d-b4a9-9db3c2059271
# ╔═╡ 5dd56dd1-3db0-4b5e-bf74-1ba47f04b58a
TODO(md"""
What is defined here? Is ``\theta`` a stochastic var?
""")
# ╔═╡ 0dd890ee-b7c6-11ef-04b7-e7671227d8cb
md"""
### Bayesian Inference for the Gaussian
Let's estimate a constant ``\theta`` from one 'noisy' measurement ``x`` about that constant.
We assume the following measurement equations (the tilde ``\sim`` means: 'is distributed as'):
```math
\begin{align*}
x &= \theta + \epsilon \\
\epsilon &\sim \mathcal{N}(0,\sigma^2)
\end{align*}
```
Also, let's assume a Gaussian prior for ``\theta``
```math
\begin{align*}
\theta &\sim \mathcal{N}(\mu_0,\sigma_0^2) \\
\end{align*}
```
"""
# ╔═╡ 0dd89b6e-b7c6-11ef-2525-73ee0242eb91
md"""
##### Model specification
Note that you can rewrite these specifications in probabilistic notation as follows:
```math
\begin{align*}
p(x|\theta) &= \mathcal{N}(x|\theta,\sigma^2) \\
p(\theta) &=\mathcal{N}(\theta|\mu_0,\sigma_0^2)
\end{align*}
```
"""
# ╔═╡ 0dd8b5d6-b7c6-11ef-1eb9-4f4289261e79
md"""
(**Notational convention**). Note that we write ``\epsilon \sim \mathcal{N}(0,\sigma^2)`` but not ``\epsilon \sim \mathcal{N}(\epsilon | 0,\sigma^2)``, and we write ``p(\theta) =\mathcal{N}(\theta|\mu_0,\sigma_0^2)`` but not ``p(\theta) =\mathcal{N}(\mu_0,\sigma_0^2)``.
"""
# ╔═╡ 0dd8c024-b7c6-11ef-3ca4-f9e8286cbb64
md"""
##### Inference
For simplicity, we assume that the variance ``\sigma^2`` is given and will proceed to derive a Bayesian posterior for the mean ``\theta``. The case for Bayesian inference of ``\sigma^2`` with a given mean is [discussed in the optional slides](#inference-for-precision).
"""
# ╔═╡ 0dd8d976-b7c6-11ef-051f-4f6cb3db3d1b
md"""
Let's do Bayes rule for the posterior PDF ``p(\theta|x)``.
```math
\begin{align*}
p(\theta|x) &= \frac{p(x|\theta) p(\theta)}{p(x)} \propto p(x|\theta) p(\theta) \\
&= \mathcal{N}(x|\theta,\sigma^2) \mathcal{N}(\theta|\mu_0,\sigma_0^2) \\
&\propto \exp \left\{ -\frac{(x-\theta)^2}{2\sigma^2} - \frac{(\theta-\mu_0)^2}{2\sigma_0^2} \right\} \\
&\propto \exp \left\{ \theta^2 \cdot \left( -\frac{1}{2 \sigma_0^2} - \frac{1}{2\sigma^2} \right) + \theta \cdot \left( \frac{\mu_0}{\sigma_0^2} + \frac{x}{\sigma^2}\right) \right\} \\
&= \exp\left\{ -\frac{\sigma_0^2 + \sigma^2}{2 \sigma_0^2 \sigma^2} \left( \theta - \frac{\sigma_0^2 x + \sigma^2 \mu_0}{\sigma^2 + \sigma_0^2}\right)^2 \right\}
\end{align*}
```
which we recognize as a Gaussian distribution w.r.t. ``\theta``.
"""
# ╔═╡ 0dd8df66-b7c6-11ef-011a-8d90bba8e2cd
md"""
(Just as an aside,) this computational 'trick' for multiplying two Gaussians is called **completing the square**. The procedure makes use of the equality
```math
ax^2+bx+c_1 = a\left(x+\frac{b}{2a}\right)^2+c_2
```
"""
# ╔═╡ 0dd8ea56-b7c6-11ef-0116-691b99023eb5
md"""
In particular, it follows that the posterior for ``\theta`` is
```math
\begin{equation*}
p(\theta|x) = \mathcal{N} (\theta |\, \mu_1, \sigma_1^2)
\end{equation*}
```
where
```math
\begin{align*}
\frac{1}{\sigma_1^2} &= \frac{\sigma_0^2 + \sigma^2}{\sigma^2 \sigma_0^2} = \frac{1}{\sigma_0^2} + \frac{1}{\sigma^2} \\
\mu_1 &= \frac{\sigma_0^2 x + \sigma^2 \mu_0}{\sigma^2 + \sigma_0^2} = \sigma_1^2 \, \left( \frac{1}{\sigma_0^2} \mu_0 + \frac{1}{\sigma^2} x \right)
\end{align*}
```
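In code (a sketch; the prior, the noise level, and the observation below are made up):

```julia
μ0, σ0 = 0.0, 2.0                  # prior θ ~ N(μ0, σ0^2)
σ = 1.0                            # known observation noise std
x = 1.3                            # a single observation
prec1 = 1/σ0^2 + 1/σ^2             # posterior precision: precisions add
μ1 = (μ0/σ0^2 + x/σ^2) / prec1     # posterior mean: precision-weighted means add
posterior_θ = Normal(μ1, sqrt(1/prec1))
```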
"""
# ╔═╡ 36a731be-8795-4bb3-9d0c-6b9f5abe2a53
TODO(
md"""
I actually don't find this example that useful; maybe go straight to the example with multiple observations? That one is useful with data, this one is not.
"""
)
# ╔═╡ 0dd8f1fe-b7c6-11ef-3386-e37f33577577
md"""
### (Multivariate) Gaussian Multiplication
So, multiplication of two Gaussian distributions yields another (unnormalized) Gaussian with
* posterior precision equals **sum of prior precisions**
* posterior precision-weighted mean equals **sum of prior precision-weighted means**
"""
# ╔═╡ 0dd8fbe2-b7c6-11ef-1f78-63dfd48146fd
md"""
As we just saw, a Gaussian prior, combined with a Gaussian likelihood, make Bayesian inference analytically solvable (!):
```math
\begin{equation*}
\underbrace{\text{Gaussian}}_{\text{posterior}}
\propto \underbrace{\text{Gaussian}}_{\text{likelihood}} \times \underbrace{\text{Gaussian}}_{\text{prior}}
\end{equation*}
```
"""
# ╔═╡ 0dd90644-b7c6-11ef-2fcf-2948d45f43bb
md"""
<a id="Gaussian-multiplication"></a>In general, the multiplication of two multi-variate Gaussians over ``x`` yields an (unnormalized) Gaussian over ``x``:
```math
\begin{equation*}
\boxed{\mathcal{N}(x|\mu_a,\Sigma_a) \cdot \mathcal{N}(x|\mu_b,\Sigma_b) = \underbrace{\mathcal{N}(\mu_a|\, \mu_b, \Sigma_a + \Sigma_b)}_{\text{normalization constant}} \cdot \mathcal{N}(x|\mu_c,\Sigma_c)} \tag{SRG-6}
\end{equation*}
```
where
```math
\begin{align*}
\Sigma_c^{-1} &= \Sigma_a^{-1} + \Sigma_b^{-1} \\
\Sigma_c^{-1} \mu_c &= \Sigma_a^{-1}\mu_a + \Sigma_b^{-1}\mu_b
\end{align*}
```
"""
# ╔═╡ 0dd91b7a-b7c6-11ef-1326-7bbfe5ac16bf
md"""
Check out that normalization constant ``\mathcal{N}(\mu_a|\, \mu_b, \Sigma_a + \Sigma_b)``. Amazingly, this constant can also be expressed by a Gaussian!
"""
# ╔═╡ 0dd9264e-b7c6-11ef-0fa9-d3e4e5053654
md"""
```math
\Rightarrow
```
Note that Bayesian inference is trivial in the [*canonical* parameterization of the Gaussian](#natural-parameterization), where we would get
```math
\begin{align*}
\Lambda_c &= \Lambda_a + \Lambda_b \quad &&\text{(precisions add)}\\
\eta_c &= \eta_a + \eta_b \quad &&\text{(precision-weighted means add)}
\end{align*}
```
This property is an important reason why the canonical parameterization of the Gaussian distribution is useful in Bayesian data processing.
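A sketch of this update in the scalar case (with made-up numbers), converting back to moment parameters at the end:

```julia
ηa, Λa = 0.0, 1.0               # first Gaussian, natural parameters
ηb, Λb = 0.75, 0.25             # second Gaussian, natural parameters
Λc = Λa + Λb                    # precisions add
ηc = ηa + ηb                    # precision-weighted means add
Normal(ηc / Λc, sqrt(1 / Λc))   # the normalized product, in moment parameters
```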
"""
# ╔═╡ 0dd93204-b7c6-11ef-143e-2b7b182f8be1
md"""
### Code Example: Product of Two Gaussian PDFs
Let's plot the exact product of two Gaussian PDFs as well as the normalized product according to the above derivation.
"""
# ╔═╡ 0dd93236-b7c6-11ef-2656-b914f13c4ecd
let
d1 = Normal(0, 1) # μ=0, σ^2=1
d2 = Normal(3, 2) # μ=3, σ^2=4
# Calculate the parameters of the product d1*d2
s2_prod = (d1.σ^-2 + d2.σ^-2)^-1
m_prod = s2_prod * ((d1.σ^-2)*d1.μ + (d2.σ^-2)*d2.μ)
d_prod = Normal(m_prod, sqrt(s2_prod)) # Note that we neglect the normalization constant.
# Plot stuff
x = range(-4, stop=8, length=100)
plot(x, pdf.(d1,x), label=L"\mathcal{N}(0,1)", fill=(0, 0.1)) # Plot the first Gaussian
plot!(x, pdf.(d2,x), label=L"\mathcal{N}(3,4)", fill=(0, 0.1)) # Plot the second Gaussian
plot!(x, pdf.(d1,x) .* pdf.(d2,x), label=L"\mathcal{N}(0,1) \mathcal{N}(3,4)", fill=(0, 0.1)) # Plot the exact product
plot!(x, pdf.(d_prod,x), label=L"Z^{-1} \mathcal{N}(0,1) \mathcal{N}(3,4)", fill=(0, 0.1)) # Plot the normalized Gaussian product
end
# ╔═╡ 0dd93f08-b7c6-11ef-3ad5-97d01baafa7c
md"""
### Bayesian Inference with Multiple Observations
Now consider that we measure a data set ``D = \{x_1, x_2, \ldots, x_N\}``, with measurements
```math
\begin{aligned}
x_n &= \theta + \epsilon_n \\
\epsilon_n &\sim \mathcal{N}(0,\sigma^2)
\end{aligned}
```
and the same prior for ``\theta``:
```math
\theta \sim \mathcal{N}(\mu_0,\sigma_0^2)
```
Let's derive the distribution ``p(x_{N+1}|D)`` for the next sample.
"""
# ╔═╡ 0dd94cb4-b7c6-11ef-0d42-5f5f3b071afa
md"""
##### Inference
First, we derive the posterior for ``\theta``:
```math
\begin{align*}
p(\theta|D) \propto \underbrace{\mathcal{N}(\theta|\mu_0,\sigma_0^2)}_{\text{prior}} \cdot \underbrace{\prod_{n=1}^N \mathcal{N}(x_n|\theta,\sigma^2)}_{\text{likelihood}}
\end{align*}
```
which is a multiplication of ``N+1`` Gaussians and is therefore also Gaussian-distributed.
"""
# ╔═╡ 0dd96092-b7c6-11ef-08b6-99348eca8529
md"""
Using the property that precisions and precision-weighted means add when Gaussians are multiplied, we can immediately write the posterior
```math
p(\theta|D) = \mathcal{N} (\theta |\, \mu_N, \sigma_N^2)
```
as
```math
\begin{align*}
\frac{1}{\sigma_N^2} &= \frac{1}{\sigma_0^2} + \sum_n \frac{1}{\sigma^2} \qquad &\text{(B-2.142)} \\
\mu_N &= \sigma_N^2 \, \left( \frac{1}{\sigma_0^2} \mu_0 + \sum_n \frac{1}{\sigma^2} x_n \right) \qquad &\text{(B-2.141)}
\end{align*}
```
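In code (a sketch; the prior, the known noise level ``\sigma``, and the data below are made up):

```julia
σ, μ0, σ0 = 1.5, 0.0, 10.0                 # known noise std and prior N(μ0, σ0^2)
x_data = rand(Normal(2.0, σ), 50)          # N observations
precN = 1/σ0^2 + length(x_data)/σ^2        # (B-2.142)
μN = (μ0/σ0^2 + sum(x_data)/σ^2) / precN   # (B-2.141)
posterior_θ = Normal(μN, sqrt(1/precN))
```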
"""
# ╔═╡ 0dd992ee-b7c6-11ef-3add-cdf7452bc514
md"""
##### Application: prediction of a future sample
We now have a posterior for the model parameters. Let's write down what we know about the next sample ``x_{N+1}``.
```math
\begin{align*}
p(x_{N+1}|D) &= \int p(x_{N+1}|\theta) p(\theta|D)\mathrm{d}\theta \\
&= \int \mathcal{N}(x_{N+1}|\theta,\sigma^2) \mathcal{N}(\theta|\mu_N,\sigma^2_N) \mathrm{d}\theta \\
&= \int \mathcal{N}(\theta|x_{N+1},\sigma^2) \mathcal{N}(\theta|\mu_N,\sigma^2_N) \mathrm{d}\theta \\
&= \int \mathcal{N}(x_{N+1}|\mu_N, \sigma^2_N +\sigma^2 ) \mathcal{N}(\theta|\cdot,\cdot)\mathrm{d}\theta \tag{use SRG-6} \\
&= \mathcal{N}(x_{N+1}|\mu_N, \sigma^2_N +\sigma^2 ) \underbrace{\int \mathcal{N}(\theta|\cdot,\cdot)\mathrm{d}\theta}_{=1} \\
&=\mathcal{N}(x_{N+1}|\mu_N, \sigma^2_N +\sigma^2 )
\end{align*}
```
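Continuing the sketch above, the predictive distribution for the next sample adds the observation noise variance to the posterior variance:

```julia
predictive = Normal(μN, sqrt(1/precN + σ^2))   # N(x_{N+1} | μ_N, σ_N^2 + σ^2)
```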
"""
# ╔═╡ 0dd9a40a-b7c6-11ef-2864-8318d8f3d827
md"""
Uncertainty about ``x_{N+1}`` involves both the uncertainty about the parameter (``\sigma_N^2``) and the observation noise (``\sigma^2``).
"""
# ╔═╡ 14169498-8a04-446a-bf4d-41f5026a2d4c
TODO(md"``\mu_N`` is not the same as ``\mu_N``")
# ╔═╡ 0dd9b71a-b7c6-11ef-2c4a-a3f9e7f2bc87
md"""
### Maximum Likelihood Estimation for the Gaussian
In order to determine the *maximum likelihood* estimate of ``\theta``, we let ``\sigma_0^2 \rightarrow \infty`` (which leads to a uniform prior for ``\theta``), yielding ``\frac{1}{\sigma_N^2} = \frac{N}{\sigma^2}`` and consequently
```math
\begin{align*}
\mu_{\text{ML}} = \left.\mu_N\right\vert_{\sigma_0^2 \rightarrow \infty} = \sigma_N^2 \, \left( \frac{1}{\sigma^2}\sum_n x_n \right) = \frac{1}{N} \sum_{n=1}^N x_n
\end{align*}