Using Cora as the dataset and logistic regression with the given paper labels:

- Baseline: varying `n_components` over (8, 16, ..., 2048, `graph.number_of_nodes() - 1`)
- Elbow sweep: varying `n_elbows` over [1, 2, ..., 5]

In the elbow sweep, the dimensions are massively reduced due to the `log2(g.number_of_nodes())` cap applied when calculating the initial SVD that the elbow cuts are done over. It should be noted that, due to the log2 and the size of this graph, the max dimension we can embed into with the elbow finder is 12. The dotted line marks the dimension where the algorithm determined there was an elbow:

If we do elbow cuts on the singular values of the full SVD (this is what topologic does), this is what I see:

@bryantower mentioned that perhaps we should be doing a full SVD and then taking the log2 of the resulting singular values to use for elbow finding:

Using the very first image, you can get an idea of the accuracy we'd see if those elbows had been chosen. This raises a couple of interesting questions.
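For concreteness, here's a minimal sketch of that sweep plus @bryantower's log2 idea. `load_cora()` is a hypothetical loader (assume it returns a symmetrized adjacency matrix and the paper labels), and I'm assuming `select_dimension` accepts a 1-d array of singular values:

```python
import numpy as np
from graspologic.embed import AdjacencySpectralEmbed, select_dimension
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

adj, labels = load_cora()  # hypothetical loader: symmetrized adjacency + paper labels
n = adj.shape[0]

# Baseline: fix the embedding dimension directly, no elbow selection.
for d in [8, 16, 32, 64, 128, 256, 512, 1024, 2048, n - 1]:
    X = AdjacencySpectralEmbed(n_components=d).fit_transform(adj)
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels).mean()
    print(f"n_components={d}: accuracy={acc:.3f}")

# Elbow sweep: dimension chosen by Zhu-Ghodsi, capped by the log2 "svd lite".
for k in range(1, 6):
    X = AdjacencySpectralEmbed(n_elbows=k).fit_transform(adj)
    print(f"n_elbows={k}: embedded into {X.shape[1]} dims")

# @bryantower's suggestion: full SVD, then elbow-find on log2 of the spectrum.
sing_vals = np.linalg.svd(adj, compute_uv=False)
sing_vals = sing_vals[sing_vals >= 1]  # keep log2 nonnegative; drops tiny/zero values
elbows, _ = select_dimension(np.log2(sing_vals), n_elbows=2)
print("elbows on the log2 spectrum:", elbows)
```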
Some other things that may or may not be interesting:
---
https://github.com/microsoft/graspologic/blob/dev/graspologic/embed/svd.py#L131
Are we sure we want to take the log2 of the size for our initial "SVD lite" before we actually do the Zhu-Ghodsi elbow finding?
Obviously the actual size is crazy pants, but log2 is ... I mean, it's small. On a 50-node graph we only have 6 dimensions to try to find the 2nd elbow in. On a 10k-node graph, we only have 14 dimensions to find the 2nd elbow.
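To put numbers on that cap (assuming it's `ceil(log2(n))`, which matches the 6-for-50 here and the 12 on Cora):

```python
import numpy as np

# Dimensions available to the elbow finder when n_components is None,
# assuming the cap at the linked line is ceil(log2(n)).
for n in [50, 2_708, 10_000, 1_000_000]:
    print(f"{n} nodes -> {int(np.ceil(np.log2(n)))} dims for the initial SVD")
# 50 -> 6, 2708 (Cora) -> 12, 10000 -> 14, 1000000 -> 20
```

Even a million-node graph would only give the elbow finder 20 dimensions to search over.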
@Nyecarr ran a bunch of parameter sweeps with ASE, in conjunction with sklearn's logistic regression, to try to assess how much dimensionality matters for the accuracy of the predicted labels.
Using the elbow finder, the dimensionality is always so low (even if you pick an elbow cut like 10) that at most we were getting ~12 dimensions, and the best accuracy was around 40%. If we manually set the dimensionality to something like 100 (`elbow_cut=None`), we were getting accuracy around 80%, and something like ~92% if we used `n_components=matrix.shape[0]`.
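In code, the two configurations look roughly like this (reusing the hypothetical `adj` and `labels` from the sweep sketch above):

```python
from graspologic.embed import AdjacencySpectralEmbed
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Elbow-based: dimension capped by the log2 "svd lite" (~12 on Cora).
X_elbow = AdjacencySpectralEmbed().fit_transform(adj)

# Manual: bypass elbow selection and embed straight into 100 dimensions.
X_100 = AdjacencySpectralEmbed(n_components=100).fit_transform(adj)

for name, X in [("elbow", X_elbow), ("n_components=100", X_100)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels).mean()
    print(f"{name}: accuracy={acc:.3f}")
```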
Nick is going to post some graphs showing his results in a response to this discussion, but I'm curious to hear everyone else's thoughts.
@j1c @asaadeldin11 @bdpedigo @bryantower