
How is the KL-divergence term for the network weights KL(qη(W) || p(W)) calculated practically? #3

Open
pingguokiller opened this issue Jan 3, 2023 · 2 comments


@pingguokiller

I've read your paper "Bayesian learning of neural network architectures". Thank you for sharing the code. I want to follow your paper.

I'm confused by the sentence "Also, given that the prior distribution p(W) is a Gaussian, the KL-divergence term for the network weights will be computed analytically and thus will reduce the variance in the gradient estimates." in the last paragraph of Section 2.1.

What is the meaning of "be computed analytically"?
I don't know how the KL-divergence term for the network weights KL(qη(W) || p(W)) is calculated in practice.
I also cannot find the corresponding code on GitHub. It seems to be omitted. Why is that?

Can you help me?

@gdikov (Owner) commented Jan 4, 2023

Hi @pingguokiller,

What is the meaning of "be computed analytically"?

The meaning of "to be computed analytically" is that there is a closed-form solution for the integral, so we don't need to approximate it with MC sampling. Alternatively, one can sample from the approximate posterior and compute the average log-ratio between it and the prior under those samples. A quick Google search leads to this step-by-step guide on how to do both analytic and sampling-based KL-divergence estimation for Gaussians.
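For concreteness, here is a minimal sketch (not taken from this repository) of both options for a fully factorised Gaussian posterior q(W) and a standard-normal prior p(W); the parameter values are made up purely for illustration:

```python
# Minimal sketch (illustrative, not the repository's code): compare the
# closed-form KL(q || p) between factorised Gaussians with a Monte Carlo
# estimate. mu_q/sigma_q are made-up variational parameters of q(W);
# the prior p(W) is a standard normal.
import numpy as np

rng = np.random.default_rng(0)

# Per-weight variational parameters of q(W) and the prior p(W).
mu_q, sigma_q = np.array([0.3, -1.2, 0.8]), np.array([0.5, 0.9, 1.1])
mu_p, sigma_p = np.zeros(3), np.ones(3)

# Closed form, summed over independent weights:
# KL = log(sigma_p / sigma_q) + (sigma_q^2 + (mu_q - mu_p)^2) / (2 sigma_p^2) - 1/2
kl_analytic = np.sum(
    np.log(sigma_p / sigma_q)
    + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2)
    - 0.5
)

# Monte Carlo estimate: draw samples from q and average log q(w) - log p(w).
def log_normal(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

w = mu_q + sigma_q * rng.standard_normal((10000, 3))  # samples from q(W)
kl_mc = np.mean(
    np.sum(log_normal(w, mu_q, sigma_q) - log_normal(w, mu_p, sigma_p), axis=1)
)

print(f"analytic KL: {kl_analytic:.4f}  MC estimate: {kl_mc:.4f}")
```

With enough samples the MC estimate converges to the closed-form value, but any finite-sample estimate is noisy, which is the extra gradient variance the paper's sentence refers to avoiding.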

I also cannot find the corresponding code on GitHub. It seems to be omitted. Why is that?

Since this work was done during an internship at a company that didn't permit me to open-source it, I just created those notebooks to show the gist of it. The code does not reproduce all experiments from the paper but rather shows the mechanism of learning layer size and network depth in an MLP. Using this, extending it to convolutional layers or networks with Bayesian weights should be straightforward.

Cheers,
Georgi

@pingguokiller (Author)

Thanks for your kind explanation. It helped me a lot.
