Why is the noise scaled by Ntrain in RMSProp? #2
Comments
The scaling for the noise is for faster convergence in practice; otherwise, we would need to train the model for a long time according to the theory.
Is the choice of Ntrain essential here? What is Ntrain exactly?
Ntrain is the number of data points in the training dataset.
Yes, but you could use other numbers for the scaling, like a fixed constant (e.g. 100) or the batch size. So my question is whether you expect Ntrain to be a good choice in practice that will work well for almost any dataset, or whether we should try several values for the scaling and choose the best one.
I expect that Ntrain is a good choice in practice. The "grad" is the mean of the gradients computed over the mini-batch. We should use opts.N*grad to approximate the true gradient of the full dataset. Instead, we account for this scaling in the stepsize "lr", which leads to the following update:
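As a reference point (my own reconstruction, not necessarily the exact form the author wrote), the standard SGLD step from the theory, with mini-batch mean gradient $\bar{g}_t$ and $N$ = opts.N training points, is

$$
\Delta\theta_t = \frac{\epsilon_t}{2}\left(\nabla_\theta \log p(\theta_t) + N\,\bar{g}_t\right) + \eta_t,
\qquad \eta_t \sim \mathcal{N}(0,\ \epsilon_t I),
$$

so $N\,\bar{g}_t$ plays the role of the full-dataset gradient (the opts.N*grad mentioned above). If the factor $N$ is instead absorbed into the stepsize, $\mathrm{lr}_t = N\,\epsilon_t$, the identical step reads

$$
\Delta\theta_t = \frac{\mathrm{lr}_t}{2}\left(\frac{1}{N}\nabla_\theta \log p(\theta_t) + \bar{g}_t\right) + \eta_t,
\qquad \eta_t \sim \mathcal{N}\!\left(0,\ \frac{\mathrm{lr}_t}{N}\, I\right),
$$

which shows one way the dataset size can end up inside the noise term.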
However, this would take a long time to converge. In practice, I recommend the noise-scaled version that the code implements.
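To make the mechanics concrete, here is an illustrative Python sketch of one SGLD step with an RMSProp-style preconditioner, written in the lr-parametrization above. This is my own sketch, not the repository's MATLAB code: the function and argument names are invented, the Γ(θ) correction term from the paper is omitted, and the exact constants in SGLD_RMSprop.m may differ.

```python
import numpy as np

def sgld_rmsprop_step(theta, grad, state, lr=1e-3, alpha=0.99,
                      eps=1e-5, N=50000, rng=np.random):
    """One illustrative SGLD step with an RMSProp-style preconditioner.

    theta : parameter vector
    grad  : mean gradient of the log-posterior over the current mini-batch
    state : running average of squared gradients (same shape as theta)
    N     : number of training points (opts.N / Ntrain in this thread)
    """
    # RMSProp-style accumulator and diagonal preconditioner
    state = alpha * state + (1.0 - alpha) * grad ** 2
    precond = 1.0 / (np.sqrt(state) + eps)

    # Drift term, using the mini-batch mean gradient with stepsize lr = N * epsilon
    drift = 0.5 * lr * precond * grad

    # Injected Gaussian noise; in this parametrization its variance is
    # lr * precond / N, which is where the dataset size enters the noise
    noise = np.sqrt(lr * precond / N) * rng.standard_normal(theta.shape)

    return theta + drift + noise, state
```

In a training loop this would be called once per mini-batch, carrying `state` (and the updated `theta`) across iterations.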
Thank you for the explanation. I saw that the same reasoning should also apply to plain SGLD — is that correct?
Yes, the convergence also holds for SGLD. |
In SGLD_RMSprop.m the noise is scaled by opts.N, which is set to Ntrain in the DNN experiments: https://github.com/ChunyuanLI/pSGLD/blob/master/pSGLD_DNN/algorithms/SGLD_RMSprop.m#L51
Why is this the case? In the paper (https://arxiv.org/pdf/1512.07666v1.pdf) there is no such scaling.
I also checked SGLD_Adagrad.m and there is no scaling by Ntrain for the noise there.
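For completeness, a small numerical check of the reparametrization discussed above: applying stepsize ε to N·grad with noise variance ε gives the same step as applying lr = N·ε to the mean gradient with noise variance lr/N. This is an illustrative Python check of the algebra only (prior term omitted for brevity); it does not verify what SGLD_RMSprop.m actually implements.

```python
import numpy as np

# Same step, two parametrizations: (eps, N * mean_grad) vs. (lr = N * eps, mean_grad).
rng = np.random.default_rng(0)
N, eps = 50000, 1e-7
mean_grad = rng.normal(size=5)      # stand-in for a mini-batch mean gradient
xi = rng.standard_normal(5)         # shared standard-normal draw

lr = N * eps
step_theory = 0.5 * eps * N * mean_grad + np.sqrt(eps) * xi
step_lr     = 0.5 * lr * mean_grad + np.sqrt(lr / N) * xi

print(np.allclose(step_theory, step_lr))   # True: identical update, different bookkeeping
```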