Model training terminated #8
Comments
As I say in the paper, the best thing to do when the model diverges is to increase the number of MCMC steps or decrease the learning rate. EBMs are very finicky creatures! Thankfully, there's been lots of work on improving and stabilizing their training. One thing I read recently found that smooth nonlinearities make training considerably more stable, so you could try a Swish and see if that helps out.
Cheers
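A rough sketch of the SGLD sampler that these "MCMC steps" refer to (in JEM the MCMC sampler is SGLD). The function and argument names below, and the default values, are illustrative assumptions rather than the repo's actual code or flags:

```python
# Illustrative SGLD sampler sketch; names and defaults are assumptions, not the repo's code.
import torch

def sgld_sample(energy_model, x_init, n_steps=40, sgld_lr=1.0, sgld_std=0.01):
    """Run n_steps of SGLD on the energy E(x) defined by energy_model.

    More steps (the "MCMC steps") give the chain more time to reach
    low-energy regions, which is the main stabilization knob mentioned above.
    """
    x = x_init.clone().detach()
    for _ in range(n_steps):
        x.requires_grad_(True)
        energy = energy_model(x).sum()           # scalar E(x) summed over the batch
        grad = torch.autograd.grad(energy, x)[0]
        # Gradient step toward lower energy plus Gaussian noise (Langevin dynamics).
        x = (x - sgld_lr * grad + sgld_std * torch.randn_like(x)).detach()
    return x
```

The Swish suggestion amounts to replacing the ReLU activations in the energy network with a smooth nonlinearity, e.g. `torch.nn.SiLU()` in recent PyTorch versions.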
On Mon, Nov 23, 2020 at 1:42 PM, Divyanshu Murli wrote:
Hi, one other issue I wanted to point out was that the training process seemed to terminate about 27 epochs in, due to a diverging loss. Thanks!
[image: Screenshot 2020-11-23 at 08 46 08] https://user-images.githubusercontent.com/38363539/100001813-ea9aa800-2d80-11eb-8376-6f7ac75a9970.png
--
Will Grathwohl
Graduate Student Researcher
Machine Learning Group
University of Toronto / Vector Institute
Ah, and by MCMC steps do you mean SGLD? (Sorry, I'm not super familiar with MCMC.)
Related question: so just to be clear, the code in the repo isn't the code used to create the results in the paper?
It is, but as we write in Appendix H.3:
"We find that when using PCD, occasionally throughout training a sample will be drawn from the replay buffer that has a considerably higher-than-average energy (higher than the energy of a random initialization). This causes the gradients w.r.t. this example to be orders of magnitude larger than the gradients w.r.t. the rest of the examples and causes the model to diverge. We tried a number of heuristic approaches such as gradient clipping, energy clipping, ignoring examples with atypical energy values, and many others, but could not find an approach that stabilized training and did not hurt generative and discriminative performance."
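For context, the PCD replay buffer that passage refers to works roughly as in the sketch below. The class name, method names, and the reinitialization probability are assumptions loosely following the paper's description, not the repo's implementation:

```python
# Rough sketch of a PCD replay buffer for an image EBM; all names and values are illustrative.
import torch

class ReplayBuffer:
    def __init__(self, size, data_shape, reinit_prob=0.05):
        # Buffer initialized with uniform noise in [-1, 1] (typical for images scaled to that range).
        self.buffer = torch.rand(size, *data_shape) * 2 - 1
        self.reinit_prob = reinit_prob

    def sample_inits(self, batch_size):
        """Start most SGLD chains from stored samples, a few from fresh noise."""
        idx = torch.randint(0, self.buffer.shape[0], (batch_size,))
        x = self.buffer[idx].clone()
        reinit = torch.rand(batch_size) < self.reinit_prob
        n_reinit = int(reinit.sum().item())
        x[reinit] = torch.rand(n_reinit, *x.shape[1:]) * 2 - 1
        return x, idx

    def update(self, idx, x_final):
        """Write the final SGLD samples back. A stale, unusually high-energy entry
        here is exactly the failure mode described in Appendix H.3."""
        self.buffer[idx] = x_final.detach().cpu()
```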
I will be the first to admit that EBM training in this way is a nightmare and requires pretty consistent babysitting. At the moment these models are basically where GANs were in 2014: not easy to train, and requiring a lot of hand-tuning. The main point of this paper was to demonstrate the utility of these models if they can be trained. There have been a number of improvements since which can stabilize EBM training.
You should be able to train these models with some combination of restarts, learning-rate decreases, and MCMC step increases. I hope that helps.
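As a concrete (and entirely hypothetical) illustration of the "restarts, learning-rate decrease, MCMC step increase" recipe, an outer loop like the following checkpoints regularly and, when the loss stops being finite, rolls back with a gentler learning rate and more SGLD steps. `optimizer_fn` and `train_one_epoch` are assumed helpers, not functions from this repo:

```python
# Hypothetical restart-on-divergence wrapper; not the repo's training script.
import copy
import math

def train_with_restarts(model, optimizer_fn, train_one_epoch, n_epochs,
                        lr=1e-4, n_steps=20, lr_decay=0.5, extra_steps=20):
    checkpoint = copy.deepcopy(model.state_dict())  # last known-good weights
    epoch = 0
    while epoch < n_epochs:
        # Rebuilding the optimizer each epoch keeps the sketch simple; it discards optimizer state.
        optimizer = optimizer_fn(model.parameters(), lr=lr)
        loss = train_one_epoch(model, optimizer, n_steps=n_steps)
        if not math.isfinite(loss):
            # Diverged: restart from the last good checkpoint with a gentler setup.
            model.load_state_dict(checkpoint)
            lr *= lr_decay          # decrease the learning rate
            n_steps += extra_steps  # increase the number of SGLD/MCMC steps
            continue
        checkpoint = copy.deepcopy(model.state_dict())
        epoch += 1
    return model
```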
Definitely helpful, and much appreciated. I'm just curious whether the line of code in the README worked for you but isn't working for @divymurli. That would be surprising, since random draws from the buffer should be deterministic under the random seeds you set in the training scripts; I can't see what the source of randomness would be.
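For what it's worth, full determinism in a PyTorch training script usually requires seeding more than one RNG; the sketch below shows the typical set of switches. Whether the training scripts here set all of them (the cuDNN flags in particular) is an assumption to verify, and any unseeded source, such as nondeterministic CUDA kernels or data-loader workers, could explain different buffer draws across runs:

```python
# Illustrative reproducibility setup (assumes a PyTorch + NumPy training script).
import random
import numpy as np
import torch

def set_seed(seed=1234):
    random.seed(seed)                     # Python RNG (e.g. any buffer index sampling done in Python)
    np.random.seed(seed)                  # NumPy RNG
    torch.manual_seed(seed)               # seeds CPU and CUDA RNGs in recent PyTorch
    torch.cuda.manual_seed_all(seed)      # explicit seeding of all GPUs for older versions
    torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable autotuning, which is nondeterministic
```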
Thanks, increasing MCMC steps helps a lot.