Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Providing callback function to handle different distance formula #12

Closed
DonaldTrump88 opened this issue May 10, 2021 · 6 comments
Closed

Comments

@DonaldTrump88
Copy link

Thanks for providing the library.
The distance formula in

public function getDistanceWith(self $point, bool $precise = true): float
is euclidean formula.
It does not work properly when GPS lattitude and longitude are involved.

Would you provide generic callback function if library user wants to use different distance formula?

Regards.

@bdelespierre
Copy link
Owner

Sure thing @Ninja-007

@bdelespierre
Copy link
Owner

bdelespierre commented May 19, 2021

In the meantime you can provide your own implementation by subtyping the Point class as follow:

<?php

require "vendor/autoload.php";

// prepare 50 points of 2D space to be clustered
$points = [
    [80,55],[86,59],[19,85],[41,47],[57,58],
    [76,22],[94,60],[13,93],[90,48],[52,54],
    [62,46],[88,44],[85,24],[63,14],[51,40],
    [75,31],[86,62],[81,95],[47,22],[43,95],
    [71,19],[17,65],[69,21],[59,60],[59,12],
    [15,22],[49,93],[56,35],[18,20],[39,59],
    [50,15],[81,36],[67,62],[32,15],[75,65],
    [10,47],[75,18],[13,45],[30,62],[95,79],
    [64,11],[92,14],[94,49],[39,13],[60,68],
    [62,10],[74,44],[37,42],[97,60],[47,73],
];

class MyPoint extends KMeans\Point
{
    public function getDistanceWith(KMeans\Point $point, bool $precise = true): float
    {
        // your implementation goes here
        return parent::getDistanceWith($point, $precise);
    }
}

class MySpace extends KMeans\Space
{
    public function newPoint(array $coordinates): KMeans\Point
    {
        if (count($coordinates) != $this->dimention) {
            throw new \LogicException("(" . implode(',', $coordinates) . ") is not a point of this space");
        }

        return new MyPoint($this, $coordinates);
    }
}

// create a 2-dimentions space
$space = new MySpace(2);

// add points to space
foreach ($points as $i => $coordinates) {
    $space->addPoint($coordinates);
}

// cluster these 50 points in 3 clusters
$clusters = $space->solve(3);

// display the cluster centers and attached points
foreach ($clusters as $num => $cluster) {
    $coordinates = $cluster->getCoordinates();
    printf(
        "Cluster %s [%d,%d]: %d points\n",
        $num,
        $coordinates[0],
        $coordinates[1],
        count($cluster)
    );
}

@bdelespierre
Copy link
Owner

@Ninja-007 I made a PR #13 to implement your feature, can you review it?

@DonaldTrump88
Copy link
Author

DonaldTrump88 commented May 19, 2021

@bdelespierre Thanks for your efforts.
Actually I used the clustering with euclidean fomula. Your approach seems good, unfortunately I cannot use it due to performance reasons. But it will be helpful for other people.

I am doing clustering of about 50K locations. Each cluster should have about 20 or less locations. Unfortunately it takes about 1 hour to finish the algorithm. My initial guess says that repeated distance calculation makes it slow, if I add the correct distance formula based on LatLong it will be slower.
If you also think so then adding distance matrix will be help to optimize it. Here is similar example in DBScan.
https://github.com/bhavikm/DBSCAN-clustering/blob/master/index.php
The matrix calculation can be done when user calls solve.
I can create another issue if you want.

@bdelespierre
Copy link
Owner

Thanks for your feedback. I really appreciate it.

Feel free to open another issue for the performance issue 👍 I'll always do my best to improve this library 😃

@DonaldTrump88
Copy link
Author

@bdelespierre
You are welcome.
#14

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants