Implementation of the incremental Kolmogorov-Smirnov Statistics #1354
@MaxHalford Would you mind having a look through the PR? I think the only remaining problem would be the fact that
…stribution with the result from the KS test.
Great job @hoanganhngo610! It's great that you added a unit test; it makes me trust that the code is correct.
Can you add an entry to UNRELEASED.md?
Co-authored-by: Max Halford <[email protected]>
@MaxHalford I have successfully incorporated all of your proposed changes to the PR, and also provided a description within the UNRELEASED.md file. Would you mind if I merge the PR into the repository after all tests have passed?
Merged @hoanganhngo610, good job! It's really nice to have this :)
Thank you so much @MaxHalford! That's really encouraging to hear!
This implementation features the incremental KS statistic, first proposed in the KDD 2016 paper.
It is based on a treap (also known as a Cartesian tree) with bulk operations and lazy propagation. This allows insertions and removals in $O(\log N)$ time with high probability, and the KS statistic to be computed in $O(1)$, a significant improvement over the $O(N \log N)$ cost of the non-incremental implementation.
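A minimal Python sketch of the idea, under stated assumptions: both samples grow together (one observation each per `update` call), so they always have the same size $n$, as with two sliding windows of equal length. The class `IncrementalKS`, its helpers, and the `statistic="ks"`/`"kuiper"` switch are hypothetical; they only mirror the names mentioned in this PR and are not River's actual API. Each distinct value is a treap node holding the net count (sample A minus sample B) at that value, and every node keeps the sum and the min/max prefix sum of its subtree, so the root exposes $n \cdot \sup_x (F_A(x) - F_B(x))$ and its opposite directly. Note that this swaps the lazy-propagation scheme described above for subtree prefix-sum aggregates, a standard alternative with the same expected $O(\log N)$ updates and $O(1)$ queries.

```python
import random


class _Node:
    """Treap node: one distinct value and the net count (A minus B) observed at it."""

    __slots__ = ("key", "prio", "count", "delta", "left", "right", "sum", "maxp", "minp")

    def __init__(self, key):
        self.key = key
        self.prio = random.random()  # random heap priority keeps the tree balanced w.h.p.
        self.count = 0  # observations (from either sample) sharing this value
        self.delta = 0  # (#A at key) - (#B at key)
        self.left = self.right = None
        self.sum = self.maxp = self.minp = 0


def _pull(t):
    """Recompute t's subtree sum and min/max prefix sums (in key order) from its children."""
    here = (t.left.sum if t.left else 0) + t.delta  # prefix sum ending at this node
    t.sum = here + (t.right.sum if t.right else 0)
    t.maxp = t.minp = here
    if t.left:
        t.maxp, t.minp = max(t.maxp, t.left.maxp), min(t.minp, t.left.minp)
    if t.right:
        t.maxp = max(t.maxp, here + t.right.maxp)
        t.minp = min(t.minp, here + t.right.minp)
    return t


def _split(t, key, strict):
    """Split t into (keys < key, rest) if strict, else (keys <= key, rest)."""
    if t is None:
        return None, None
    goes_left = t.key < key if strict else t.key <= key
    if goes_left:
        t.right, right = _split(t.right, key, strict)
        return _pull(t), right
    left, t.left = _split(t.left, key, strict)
    return left, _pull(t)


def _merge(a, b):
    """Merge two treaps where every key in a is smaller than every key in b."""
    if a is None or b is None:
        return a or b
    if a.prio > b.prio:
        a.right = _merge(a.right, b)
        return _pull(a)
    b.left = _merge(a, b.left)
    return _pull(b)


class IncrementalKS:
    """Hypothetical sketch (not River's API): KS or Kuiper statistic of two equal-size samples."""

    def __init__(self, statistic="ks"):
        self.statistic = statistic  # "ks" or "kuiper" (assumed option names)
        self.n = 0  # current size of each sample
        self.root = None

    def _add(self, key, delta, count):
        left, rest = _split(self.root, key, strict=True)
        mid, right = _split(rest, key, strict=False)  # mid is the single node for `key`, if any
        if mid is None:
            mid = _Node(key)
        mid.delta += delta
        mid.count += count
        mid = _pull(mid) if mid.count > 0 else None  # drop values with no observations left
        self.root = _merge(_merge(left, mid), right)

    def update(self, x, y):
        """Insert x into sample A and y into sample B."""
        self._add(x, +1, +1)
        self._add(y, -1, +1)
        self.n += 1

    def revert(self, x, y):
        """Remove a pair previously passed to update (assumed, not checked)."""
        self._add(x, -1, -1)
        self._add(y, +1, -1)
        self.n -= 1

    def get(self):
        """O(1): read the aggregates stored at the root."""
        if self.n == 0:
            return 0.0
        d_plus = max(self.root.maxp, 0)  # n * sup(F_A - F_B)
        d_minus = max(-self.root.minp, 0)  # n * sup(F_B - F_A)
        if self.statistic == "kuiper":
            return (d_plus + d_minus) / self.n
        return max(d_plus, d_minus) / self.n
```

Keeping one node per distinct value is what makes ties come out right: a prefix sum is only read after every observation sharing that value has been counted, which is exactly where the empirical CDFs change.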
As mentioned above, this implementation supports both insertion (via `update`) and deletion (via `revert`). Moreover, a closely related statistic, the Kuiper test statistic, can also be calculated by setting `statistic` to `kuiper`.
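To make the difference between the two statistics concrete, here is a minimal non-incremental sketch, i.e. the kind of $O(N \log N)$ sort-and-sweep recomputation the incremental version avoids. The helper name `ks_and_kuiper` is hypothetical and not part of this PR. It sweeps the pooled, sorted samples once, evaluates both empirical CDFs after each group of tied values, and returns the KS statistic $\max(D^+, D^-)$ together with the Kuiper statistic $D^+ + D^-$, where $D^+ = \sup_x (F_A(x) - F_B(x))$ and $D^- = \sup_x (F_B(x) - F_A(x))$.

```python
def ks_and_kuiper(a, b):
    """Hypothetical batch helper: (KS, Kuiper) for two non-empty samples, via one sorted sweep."""
    events = sorted([(v, 0) for v in a] + [(v, 1) for v in b])  # 0 = sample A, 1 = sample B
    n_a, n_b = len(a), len(b)
    c_a = c_b = 0  # running counts of A and B observations <= current value
    d_plus = d_minus = 0.0
    i = 0
    while i < len(events):
        v = events[i][0]
        # Consume every observation tied at value v before reading the ECDFs.
        while i < len(events) and events[i][0] == v:
            if events[i][1] == 0:
                c_a += 1
            else:
                c_b += 1
            i += 1
        diff = c_a / n_a - c_b / n_b  # F_A(v) - F_B(v)
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    return max(d_plus, d_minus), d_plus + d_minus
```

For example, with samples `[1, 4]` and `[2, 3]` the ECDF difference swings to $+0.5$ and then to $-0.5$, so the KS statistic is 0.5 while the Kuiper statistic, which adds both excursions, is 1.0. This sensitivity to deviations in both directions is what distinguishes Kuiper's variant.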