GH-15958: GLM fix Tweedie ML dispersion estimation #15959
Conversation
The Torczon multi-directional search method is meant to avoid simplex degeneration across iterations. If you do not run into the degeneration problem, then you won't need it.
Since this is just a one-dimensional optimization, I don't think it's a problem. If the 1-simplex (a line segment) degenerates, it becomes a point; that could happen, but it should only be due to finite precision, and that 0-simplex should be the local optimum. In practice it shouldn't happen, since we would converge well before running into finite-precision issues, unless the user sets the dispersion epsilon to the machine epsilon or smaller (zero or a negative number). So I think it should be OK unless I'm missing something. Does that sound reasonable @wendycwong?
Yes, your reasoning sounds good. I obviously did not read your message carefully, and it is the very first one. I really have no excuse here.
  /**
   * This method estimates the Tweedie dispersion parameter. It will use Newton's update if the new update will
   * increase the loglikelihood. Otherwise, the dispersion will be updated as
   * dispersionNew = dispersionCurr + learningRate * update.
   * In addition, line search is used to increase the magnitude of the update when the update magnitude is too small
   * (< 1e-3).
   *
-  * For details, please see seciton IV.I, IV.II, and IV.III in document here:
+  * For details, please see section IV.I, IV.II, and IV.III in document here:
   */
replace "section" with "sections"
Will do. Looking at the nice documentation you wrote in the comments made me realize that I didn't update it. I'll do it so it's up-to-date with the code. Thank you for directing my attention here!
Don't worry about it @wendycwong. No excuse needed. It happens to me all the time :)
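For readers following along, the update rule described in the doc comment quoted above might look roughly like this. This is only a sketch, not the actual h2o-3 code: the logLik oracle, the positivity guard, and the doubling line search are my own stand-ins for whatever the real implementation does.

import java.util.function.DoubleUnaryOperator;

public final class DampedNewtonStep {
    /**
     * One dispersion update following the scheme in the doc comment:
     * take the full Newton step when it improves the log likelihood,
     * otherwise damp it with the learning rate, and grow tiny steps
     * via line search. logLik stands in for the Tweedie log-likelihood
     * evaluation at a given dispersion.
     */
    static double step(DoubleUnaryOperator logLik, double dispersionCurr,
                       double update, double learningRate) {
        double llCurr = logLik.applyAsDouble(dispersionCurr);
        double newton = dispersionCurr + update;            // full Newton step
        if (newton > 0 && logLik.applyAsDouble(newton) > llCurr)
            return newton;                                  // Newton improved the LL
        double delta = learningRate * update;               // damped update
        if (Math.abs(delta) < 1e-3) {                       // magnitude too small (my reading of "< 1e-3"):
            // line search: keep doubling the step while the LL improves
            double best = dispersionCurr, bestLL = llCurr;
            for (int i = 0; i < 10; i++) {
                double cand = dispersionCurr + delta;
                if (cand <= 0) break;                       // dispersion must stay positive
                double ll = logLik.applyAsDouble(cand);
                if (ll <= bestLL) break;                    // stopped improving
                best = cand; bestLL = ll;
                delta *= 2;
            }
            return best;
        }
        return dispersionCurr + delta;
    }
}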
#15958
I looked at Torczon's multi-directional method, but I don't think it's suitable for this task, since this is just a one-dimensional optimization problem. Furthermore, the expansion (2) and contraction (0.5) constants proposed in the thesis (https://repository.rice.edu/server/api/core/bitstreams/6bfc12f5-a69e-44bf-9e67-1d2e738bec5a/content) would be suboptimal for a one-dimensional problem (the golden section search derivation explains why: https://homepages.math.uic.edu/~jan/mcs471f05/Lec9/gss.pdf). So I used golden section search.
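For reference, here is a minimal sketch of a golden section maximizer for a one-dimensional function, assuming a log-likelihood oracle passed in as a DoubleUnaryOperator; the actual h2o-3 implementation will differ in details such as how the initial bracket is chosen.

import java.util.function.DoubleUnaryOperator;

public final class GoldenSectionSearch {
    private static final double INV_PHI = (Math.sqrt(5) - 1) / 2; // ~0.618

    /**
     * Maximizes f on [lo, hi] by shrinking the bracket by a constant
     * factor of INV_PHI per iteration, reusing one interior evaluation
     * each time; returns the midpoint of the final bracket.
     */
    public static double maximize(DoubleUnaryOperator f, double lo, double hi, double eps) {
        double a = lo, b = hi;
        double x1 = b - INV_PHI * (b - a);
        double x2 = a + INV_PHI * (b - a);
        double f1 = f.applyAsDouble(x1);
        double f2 = f.applyAsDouble(x2);
        while (b - a > eps) {
            if (f1 < f2) {              // maximum lies in [x1, b]
                a = x1;
                x1 = x2; f1 = f2;       // old x2 becomes the new x1 (reused)
                x2 = a + INV_PHI * (b - a);
                f2 = f.applyAsDouble(x2);
            } else {                    // maximum lies in [a, x2]
                b = x2;
                x2 = x1; f2 = f1;       // old x1 becomes the new x2 (reused)
                x1 = b - INV_PHI * (b - a);
                f1 = f.applyAsDouble(x1);
            }
        }
        return (a + b) / 2;
    }
}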
Description
I found out that the problem is not only with the gradient and Hessian but also with the Tweedie likelihood calculated using the series method. So the fix is to calculate, once in a while (every 10th iteration), the log likelihood using the Tweedie estimator (which I implemented for variance power and dispersion estimation, since it combines the series and Fourier inversion methods to achieve a more precise log likelihood estimate) and compare it with the best value so far. I call this a sanity check: basically, check whether we are improving, and if we got worse, switch to golden section search. The last sanity check happens when the algorithm thinks it converged; sometimes the value explodes too fast, so it wouldn't otherwise reach the periodic sanity check that would detect we actually got worse.
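Putting the pieces together, the scheme could look roughly like the sketch below. This is again under my own naming, not the actual h2o-3 API: newtonStep and preciseLogLik are placeholders for the series-based update and the series + Fourier inversion likelihood, and it reuses the GoldenSectionSearch sketch above.

import java.util.function.DoubleUnaryOperator;

public final class SanityCheckedDispersion {
    /**
     * Iterate Newton updates driven by the series log likelihood, but every
     * 10th iteration (and once more on convergence) recompute the log
     * likelihood with the more precise estimator and compare it with the
     * best value seen so far; if we got worse, fall back to golden section
     * search over the precise likelihood.
     */
    static double estimate(DoubleUnaryOperator newtonStep,    // d -> next d (series-based)
                           DoubleUnaryOperator preciseLogLik, // d -> precise log likelihood
                           double d, double lo, double hi,
                           int maxIter, double eps) {
        double bestLL = preciseLogLik.applyAsDouble(d);
        for (int iter = 1; iter <= maxIter; iter++) {
            double next = newtonStep.applyAsDouble(d);
            boolean converged = Math.abs(next - d) < eps;
            d = next;
            if (iter % 10 == 0 || converged) {    // periodic + final sanity check
                double ll = preciseLogLik.applyAsDouble(d);
                if (ll < bestLL)                  // we actually got worse:
                    return GoldenSectionSearch.maximize(preciseLogLik, lo, hi, eps);
                bestLL = ll;
            }
            if (converged) break;
        }
        return d;
    }
}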
Speed concerns
Basically the same as for the Tweedie variance power and dispersion estimation: for some values (e.g., Tweedie variance power close to 1, p < 1.2), it takes longer to estimate the likelihood.
Golden section search has linear convergence, so it's asymptotically slower than Newton's method, but it appears more robust to noise, and since it doesn't require calculating the gradient and Hessian, it doesn't seem that much slower for practical purposes.
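To make the linear-vs-quadratic comparison concrete (standard textbook facts, nothing specific to this PR): golden section search shrinks the bracket by a constant factor of (√5 − 1)/2 ≈ 0.618 per iteration, so going from an initial bracket of width w down to a tolerance eps takes about log(w/eps) / log(1/0.618) ≈ 4.8 · log10(w/eps) likelihood evaluations (each iteration reuses one interior point, so it costs only one new evaluation), while Newton's method roughly doubles the number of correct digits per iteration once it's near the optimum. For the bracket widths and tolerances involved here, that's a modest constant-factor difference, with no gradient or Hessian needed.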
Results
I managed to get results close to R's MLE dispersion estimation (often a little better than R; see the test). Note that the value we get from summary(glm)$dispersion is not the MLE, but it seems close.

Estimation from summary:

MLE dispersion in R:
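As an aside (this is my reading of R, not part of this PR): the dispersion reported by summary.glm is, as far as I know, the Pearson estimator, φ̂ = (1/(n − p)) Σᵢ (yᵢ − μ̂ᵢ)² / V(μ̂ᵢ), which is why it differs from the MLE while usually staying close to it.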
A problem with R's MLE dispersion estimation is that it sometimes takes a very long time (and I don't know whether it finishes). So I used the estimate from summary for the following plots.
These plots show how it used to behave for Tweedie variance powers in [1.2, 1.8]:
and how it behaves after the change:
And these two show the same thing but with log scale:
before
after
The previous plots were generated using the dispersion estimation from summary(glm), so I recalculated the same thing with the true MLE. It matches up until Tweedie variance power = 1.7, where R gets stuck (shown as the MLE threshold in the plot). For the rest of the values ([1.7, 1.85]) I used the summary type of calculation.