Multidimensional arrays and diversity clustering #35

Open
LarryBarker opened this issue Oct 7, 2021 · 3 comments

@LarryBarker

Hello, thank you for sharing this package. I'm hoping to use it to help group users into diverse groups based on socioeconomic factors like race, gender, age, etc. Our dataset contains 20 factors that need to be taken into consideration. Have you used this to solve such a problem?

I've started some preliminary testing, and seem to be getting results but I can't tell what is happening behind the scenes. Furthermore, I would like to be able to weight each factor. For example, race may be the most important factor in some cases, while gender may be in others.

Here is what the data looks like:

 user_id => [
    race,
    gender,
    age
 ]

The numerical representation for each possible value is what we store:

array:10 [
  1 => array:3 [
    0 => -10
    1 => 6
    2 => 1
  ]
  2 => array:3 [
    0 => 3
    1 => 2
    2 => 1
  ]
  3 => array:3 [
    0 => 2
    1 => 1
    2 => 5
  ]
  4 => array:3 [
    0 => 9
    1 => 3
    2 => 4
  ]
  5 => array:3 [
    0 => -12
    1 => 6
    2 => 0
  ]
  6 => array:3 [
    0 => -6
    1 => 7
    2 => 3
  ]
  7 => array:3 [
    0 => 7
    1 => 7
    2 => 5
  ]
  8 => array:3 [
    0 => 4
    1 => 4
    2 => 0
  ]
  9 => array:3 [
    0 => 5
    1 => 7
    2 => 1
  ]
  10 => array:3 [
    0 => -11
    1 => 3
    2 => 2
  ]
]

I'm also curious: after the clustering is performed, is there any way to retrieve the original key for the data? I need this to know which users are in each cluster.

If this is not the appropriate channel for this type of question, or beyond the scope of the repo, please let me know. I certainly appreciate any feedback you may have. Thank you :)

@bdelespierre
Owner

bdelespierre commented Oct 8, 2021

Hey @LarryBarker, thanks for submitting an issue.

> I can't tell what is happening behind the scenes.

You may use a callback function to tap into the algorithm execution 👍
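As a rough illustration of what "tapping into" the algorithm with a callback can look like, here is a minimal, self-contained k-means sketch in Python. It is not this package's API: the function name `kmeans` and the `on_iteration` parameter are made up for the example, and the points are a few rows from the dump above. The callback simply receives the iteration number, the current centroids, and the current assignment on every pass.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, on_iteration=None, seed=0, max_iter=100):
    """Plain k-means; `on_iteration` (hypothetical hook) is called once per pass."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    assignment = None
    for iteration in range(max_iter):
        # Assign every point to the index of its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if on_iteration is not None:
            on_iteration(iteration, centroids, new_assignment)
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has converged
        assignment = new_assignment
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def log_iteration(iteration, centroids, assignment):
    print(f"iteration {iteration}: assignment={assignment}")


sample_points = [[-10, 6, 1], [3, 2, 1], [2, 1, 5], [9, 3, 4], [-12, 6, 0], [-6, 7, 3]]
final_assignment = kmeans(sample_points, 2, on_iteration=log_iteration)
```

Logging the assignment on each pass makes it easy to see when points stop changing clusters, which is usually the first thing to check when the results look surprising.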

> Furthermore, I would like to be able to weight each factor. For example, race may be the most important factor in some cases, while gender may be in others.

I haven't implemented weighted k-means yet. Development effort is now focused on a new version, v3, designed to be easier to extend or override with your own custom algorithms. Have a look here: V3
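Until weighting is supported natively, one common workaround for per-factor weights is to rescale the data before clustering: multiplying each coordinate by the square root of its weight makes the ordinary squared Euclidean distance equal to the weighted one, so an unmodified k-means then behaves as if the factors were weighted. The sketch below (Python, not this package's API; the function names and the weights are illustrative assumptions) checks that equivalence on two rows from the dump above.

```python
import math


def scale_features(point, weights):
    """Multiply each coordinate by sqrt(weight) for its factor."""
    return [math.sqrt(w) * x for w, x in zip(weights, point)]


def squared_distance(a, b):
    """Ordinary squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_squared_distance(a, b, weights):
    """Squared Euclidean distance with one weight per factor."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


# Hypothetical weights: race counts 4x, gender 1x, age 0.25x.
weights = [4.0, 1.0, 0.25]
user_1 = [-10, 6, 1]
user_2 = [3, 2, 1]

direct = weighted_squared_distance(user_1, user_2, weights)
via_scaling = squared_distance(
    scale_features(user_1, weights), scale_features(user_2, weights)
)
print(direct, via_scaling)
```

Because centroids are plain means, clustering the scaled points and then dividing the resulting centroids by the same square roots gives centroids in the original units.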

> I'm curious as well, after the clustering is performed, is there any way to retrieve the original key for the data? This is needed because I need to know which users are in each cluster.

You can achieve this by assigning arbitrary data to points 👍
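The general idea of carrying the key along with each point can be sketched independently of the package. The Python example below is only an illustration: it does not use this library's API, the helper names are invented, and the two centroids are hypothetical values chosen by hand rather than the output of a real run. It keeps each `user_id` next to its coordinates and groups the IDs by nearest centroid.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))


def group_keys_by_cluster(keyed_points, centroids):
    """Map cluster index -> list of original keys (here: user IDs)."""
    clusters = defaultdict(list)
    for key, coords in keyed_points.items():
        clusters[nearest_centroid(coords, centroids)].append(key)
    return dict(clusters)


# First five users from the dump above: user_id => [race, gender, age].
users = {
    1: [-10, 6, 1],
    2: [3, 2, 1],
    3: [2, 1, 5],
    4: [9, 3, 4],
    5: [-12, 6, 0],
}

# Hypothetical centroids, standing in for the result of a clustering run.
centroids = [[-10, 5, 1], [5, 3, 3]]

members = group_keys_by_cluster(users, centroids)
print(members)
```

With the package itself, the attached data plays the role of the dictionary key here: store the user ID on the point when adding it, then read it back from each point in a cluster.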

Thank you for using PHP Kmeans and don't hesitate to let us know if you have any feature requests or if you encounter any bugs.

Have a nice day

@bdelespierre bdelespierre changed the title [QUESTION] Multidimensional arrays and diversity clustering Multidimensional arrays and diversity clustering Oct 8, 2021
@LarryBarker
Author

@bdelespierre Thanks for the quick reply! I realized after I posted I could attach data to points, so thank you for confirming that.

It's good to know that weighted k-means is something you have thought about. I assume it is doable? Do you have any resources that might help me implement my own?

@bdelespierre
Owner

There is unfortunately very little literature on the topic, so I just assumed this was not what users wanted. That being said, finding the centroid of a group of weighted points is a piece of cake. But I'm not quite sure how to interpret the results...
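For reference, the centroid of weighted points is just the weighted mean of each coordinate. A minimal Python sketch, assuming one non-negative weight per point (the function name is made up for the example and is not part of this package):

```python
def weighted_centroid(points, weights):
    """Weighted mean of each coordinate; `weights` holds one value per point."""
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dimensions = len(points[0])
    return [
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dimensions)
    ]


# A point with weight 3 pulls the centroid three times as hard as one with weight 1.
centroid = weighted_centroid([[0, 0], [4, 2]], [1, 3])
print(centroid)  # [3.0, 1.5]
```

Note that this weights whole points (some users count more than others), which is a different thing from weighting factors such as race or gender against each other.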
