Performance improvement #14
Comments
I have read a few positive reviews of mini-batch clustering.
Thanks @Ninja-007, I'll give it a look. If you have suggestions for the implementation, feel free to start a PR.
Hi! You can store a diagonal matrix in a one-dimensional array. It could be two times faster. https://gist.github.com/halfhope/8589f5f97f76e066480dcfc7c0ac88da
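For readers following along, here is a minimal sketch of the flat-array idea in Python. It has not been checked against the linked gist and it assumes the matrix in question is a symmetric pairwise distance matrix, so only the entries above the diagonal need to be stored. The names `condensed_index` and `build_condensed` are hypothetical and are not part of this project.

```python
def condensed_index(i, j, n):
    """Map an unordered pair (i, j), i != j, of n points to its slot
    in a flat array that holds only the strict upper triangle."""
    if i == j:
        raise ValueError("the diagonal is not stored; that distance is always 0")
    if i > j:
        i, j = j, i  # symmetry: d(i, j) == d(j, i)
    # Rows 0..i-1 contribute (n-1) + (n-2) + ... entries before row i starts.
    return i * n - i * (i + 1) // 2 + (j - i - 1)


def build_condensed(points, dist):
    """Compute each pairwise distance once and store it in a flat list
    of length n * (n - 1) / 2."""
    n = len(points)
    flat = [0.0] * (n * (n - 1) // 2)
    for i in range(n):
        for j in range(i + 1, n):
            flat[condensed_index(i, j, n)] = dist(points[i], points[j])
    return flat


# Usage with a toy one-dimensional distance function (illustration only).
points = [0.0, 1.0, 3.0, 6.0]
flat = build_condensed(points, lambda a, b: abs(a - b))
d_3_1 = flat[condensed_index(3, 1, len(points))]  # same slot as (1, 3)
```

This layout stores roughly half as many values as a full n × n matrix, at the cost of an index computation on every lookup.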
Hi @halfhope, thanks for your comment. Can you propose an implementation and open a pull request?
I am clustering about 50K locations. Each cluster should contain about 20 locations or fewer. Unfortunately, the algorithm takes about 1 hour to finish. My initial guess is that the repeated distance calculations make it slow; if I switch to a correct distance formula based on latitude and longitude, it will be even slower.
If you agree, adding a distance matrix would help optimize it. Here is a similar example in DBSCAN:
https://github.com/bhavikm/DBSCAN-clustering/blob/master/index.php
The matrix can be computed once, when the user calls solve.
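To make the proposal concrete, here is a rough sketch in Python of precomputing the pairwise distances once, using the haversine great-circle formula for latitude/longitude pairs. It is not this library's API: `haversine_km` and `build_distance_matrix` are hypothetical names, and the Earth radius constant is the commonly used mean value. Memory grows quadratically, so a full matrix for 50K locations would hold 2.5 billion entries; the sketch is only practical as written for much smaller inputs.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def build_distance_matrix(points, dist=haversine_km):
    """Compute every pairwise distance once; later lookups are plain
    list indexing instead of repeated trigonometry."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(points[i], points[j])
            matrix[i][j] = d
            matrix[j][i] = d  # the matrix is symmetric
    return matrix


# Usage: build once (for example at the start of a solve step), then look up.
locations = [(0.0, 0.0), (0.0, 90.0), (0.0, 180.0)]
matrix = build_distance_matrix(locations)
quarter_equator_km = matrix[0][1]
```

A cache like this only pays off when the same point-to-point distances are needed repeatedly, as in the DBSCAN example linked above; distances to cluster centers that move between iterations would still have to be recomputed.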