
Dealing with divergence #4

Open · AntixK opened this issue May 19, 2020 · 5 comments

AntixK commented May 19, 2020

Hello,

Your work is inspiring!
I have the following problem when I try to run your code.
During training, the loss often blows up and diverges. Could you advise on how to deal with such divergence? It diverges even after turning off BatchNorm and adding warmup iterations, often after about 2 epochs.

Any help is appreciated. Thank you.

@m-wiesner

If you read the appendix in the paper, they mention that most models had to be restarted from the last checkpoint with a different random seed after crashing. I tried running this code and experienced the same thing, but as long as I restarted from the last checkpoint, training continued. It just took some manual intervention. I also recommend using a low learning rate and bumping up the number of inner SGLD iterations per outer minibatch iteration. That should help some with stability.
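
A minimal sketch of that restart-and-reseed strategy, assuming a standard PyTorch checkpoint dict; `ckpt_path`, `model`, `opt`, and the key names are placeholders, not the repo's actual variables:

```python
import torch

def restart_from_checkpoint(ckpt_path, model, opt, new_seed):
    # Reload the last good checkpoint after a crash.
    ckpt = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(ckpt["model_state_dict"])      # assumed key names
    opt.load_state_dict(ckpt["optimizer_state_dict"])
    # Re-seeding still matters even though the weights come from the file:
    # the seed changes the SGLD noise sequence and data order, not the weights.
    torch.manual_seed(new_seed)
    torch.cuda.manual_seed_all(new_seed)
    return ckpt.get("epoch", 0)
```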

@wgrathwohl (Owner)

Hello, thanks for your kind words. Yes, as m-wiesner commented, this is the exact strategy I used and it should work. I know it is less than ideal, but EBMs are still somewhat brittle these days. Another alternative I've found to work is to place a small l2 penalty on the energy (this has been done in prior EBM work) with strength around 0.1. This should keep the energy values near zero and make training more stable.
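
A hedged sketch of what such a penalty could look like; `e_real`, `e_fake`, and the wiring into the loss are assumptions, not the repo's actual code (in JEM the energy is the negative logsumexp of the classifier logits):

```python
import torch

def energy_l2_penalty(e_real: torch.Tensor, e_fake: torch.Tensor,
                      coeff: float = 0.1) -> torch.Tensor:
    # Pull the energies on real data and on SGLD samples toward zero;
    # coeff ~ 0.1 matches the "strength around 0.1" suggested above.
    return coeff * (e_real.pow(2).mean() + e_fake.pow(2).mean())

# Possible usage (names assumed, not the repo's):
#   e_real = -logits_real.logsumexp(dim=1)   # energy on a real batch
#   e_fake = -logits_fake.logsumexp(dim=1)   # energy on an SGLD batch
#   loss = loss + energy_l2_penalty(e_real, e_fake)
```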

@USTC-yzy1996

Hello, I'm new to energy-based models and I ran into the same divergence problem while running the code.
I see two solutions: one is to restart with a different random seed, and the other is to place a small l2 penalty on the energy. Could you two (m-wiesner & wgrathwohl) please tell me how to implement them in the code? For the seed, I loaded the checkpoint, but the new random seed did not seem to work because the parameters were all from the checkpoint file. For the l2 penalty, I can only find the three loss terms [l_p_x], [l_p_y_given_x], and [l_p_x_y]. Where is the l2 penalty on the energy in the code?
Thanks a lot! :)


m-wiesner commented Mar 12, 2021 via email


wgrathwohl commented Mar 12, 2021 via email
