
How is the KL-divergence term for the network weights KL(qη(W) || p(W)) calculated practically? #3

Open
pingguokiller opened this issue Jan 3, 2023 · 2 comments


@pingguokiller

I've read your paper "Bayesian learning of neural network architectures". Thank you for sharing the code. I want to follow your paper.

I'm confused by the sentence "Also, given that the prior distribution p(W) is a Gaussian, the KL-divergence term for the network weights will be computed analytically and thus will reduce the variance in the gradient estimates." in the last paragraph of Section 2.1.

What is the meaning of "be computed analytically"?
I don't know how the KL-divergence term for the network weights KL(qη(W) || p(W)) is calculated in practice.
I also cannot find the corresponding code on GitHub. It seems to be omitted. Why is that?

Can you help me?

@gdikov (Owner) commented Jan 4, 2023

Hi @pingguokiller,

What is the meaning of "be computed analytically"?

The meaning of "to be computed analytically" is that there is a closed-form solution for the integral, so we don't need to approximate it with MC sampling. Alternatively, one can sample from the approximate posterior and compute the average log-ratio between it and the prior under those samples. A quick Google search leads to this step-by-step guide on how to do both analytic and sampling-based KL-divergence estimation for Gaussians.
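For concreteness, here is a minimal sketch (not taken from this repository) of both options for a fully factorised Gaussian posterior q(W) and a standard-normal prior p(W); the parameter values are made up purely for illustration:

```python
# Minimal sketch (illustrative, not the repository's code): compare the
# closed-form KL(q || p) between factorised Gaussians with a Monte Carlo
# estimate. mu_q/sigma_q are made-up variational parameters of q(W);
# the prior p(W) is a standard normal.
import numpy as np

rng = np.random.default_rng(0)

# Per-weight variational parameters of q(W) and the prior p(W).
mu_q, sigma_q = np.array([0.3, -1.2, 0.8]), np.array([0.5, 0.9, 1.1])
mu_p, sigma_p = np.zeros(3), np.ones(3)

# Closed form, summed over independent weights:
# KL = log(sigma_p / sigma_q) + (sigma_q^2 + (mu_q - mu_p)^2) / (2 sigma_p^2) - 1/2
kl_analytic = np.sum(
    np.log(sigma_p / sigma_q)
    + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2)
    - 0.5
)

# Monte Carlo estimate: draw samples from q and average log q(w) - log p(w).
def log_normal(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

w = mu_q + sigma_q * rng.standard_normal((10000, 3))  # samples from q(W)
kl_mc = np.mean(
    np.sum(log_normal(w, mu_q, sigma_q) - log_normal(w, mu_p, sigma_p), axis=1)
)

print(f"analytic KL: {kl_analytic:.4f}  MC estimate: {kl_mc:.4f}")
```

With enough samples the MC estimate converges to the closed-form value, but any finite-sample estimate is noisy, which is the extra gradient variance the paper's sentence refers to avoiding.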

I also cannot find the corresponding code on GitHub. It seems to be omitted. Why is that?

Since this work was done during an internship at a company that didn't permit me to open-source it, I just created those notebooks to show the gist of it. The code does not reproduce all experiments from the paper but rather shows the mechanism of learning layer size and network depth in an MLP. Using this, extending it to convolutional layers or networks with Bayesian weights should be straightforward.

Cheers,
Georgi

@pingguokiller (Author)

Thanks for your kind explanation. It helped me a lot.
