scipy.stats.mode in genes2genes.ClusterUtils.run_agglomerative_clustering #4

leoforster · 2024-09-24T13:33:34Z

Hi, thanks for this interesting new approach for studying single-cell trajectories. I was following the tutorial notebook at https://github.com/Teichlab/Genes2Genes/blob/main/notebooks/Tutorial.ipynb and ran into errors during the Clustering alignments step:

df = ClusterUtils.run_clustering(aligner, metric='levenshtein', experiment_mode=True)

fails with:

IndexError                                Traceback (most recent call last)
<ipython-input-141-2242a2d1f27f> in <module>
----> 1 df = ClusterUtils.run_clustering(aligner, metric='levenshtein', experiment_mode=True)

/mnt/volume/resources/miniconda3/envs/jupyter/lib/python3.9/site-packages/genes2genes/ClusterUtils.py in run_clustering(aligner, metric, DIST_THRESHOLD, experiment_mode)
    115         eval_dists = []
    116         for D_THRESH in tqdm(dist_thresholds):
--> 117             gene_clusters, cluster_ids, silhouette_score, silhouette_score_mode, n_small_cluster = run_agglomerative_clustering(E, aligner.gene_list, D_THRESH)
    118 
    119             if(len(gene_clusters.keys())==1):

/mnt/volume/resources/miniconda3/envs/jupyter/lib/python3.9/site-packages/genes2genes/ClusterUtils.py in run_agglomerative_clustering(E, gene_list, DIST_THRESHOLD, linkage)
     53     silhouette_score = sklearn.metrics.silhouette_score(X=E , labels = model.labels_, metric='precomputed')
     54     silhouette_score_samples = sklearn.metrics.silhouette_samples(X=E , labels = model.labels_, metric='precomputed')
---> 55     silhouette_score_mode = scipy.stats.mode(silhouette_score_samples)[0][0]
     56 
     57     n_clusters_less_members = []

IndexError: invalid index to scalar variable.

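This error is most likely caused by a change in the output format of scipy.stats.mode. SciPy 1.9 added a keepdims argument and announced that its default would change from True to False in SciPy 1.11. With keepdims=False, calling mode on a 1-D array returns scalars instead of 1-element arrays, so the second [0] on line 55 tries to index a scalar. The documentation (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mode.html) also describes related output changes made in SciPy 1.9: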

Beginning in SciPy 1.9, np.matrix inputs (not recommended for new code) are converted to np.ndarray before the calculation is performed. In this case, the output will be a scalar or np.ndarray of appropriate shape rather than a 2D np.matrix. Similarly, while masked elements of masked arrays are ignored, the output will be a scalar or np.ndarray rather than a masked array with mask=False.
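To see the shape difference behind the IndexError, here is a minimal sketch that calls scipy.stats.mode with both keepdims settings. It assumes SciPy >= 1.9 (older releases do not accept keepdims), and the sample values are made up.

```python
import numpy as np
import scipy.stats

# Made-up silhouette-like values; 0.5 occurs most often.
samples = np.array([0.2, 0.5, 0.5, 0.9])

# Old default (keepdims=True): mode is a 1-element array, so [0][0] works.
old_style = scipy.stats.mode(samples, keepdims=True)
print(old_style.mode.shape)  # (1,)

# New default from SciPy 1.11 (keepdims=False): mode is a scalar,
# so indexing it a second time with [0] raises an IndexError.
new_style = scipy.stats.mode(samples, keepdims=False)
print(np.ndim(new_style.mode))  # 0
```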

On SciPy 1.11 and later, this is fixed by replacing line 55 in ClusterUtils.py:

silhouette_score_mode = scipy.stats.mode(silhouette_score_samples)[0][0]

with

silhouette_score_mode = scipy.stats.mode(silhouette_score_samples)[0]

or, to support both older and newer SciPy versions, by checking the dimensionality of the result:

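import numpy as np
import scipy.stats

mode_result = scipy.stats.mode(silhouette_score_samples)
if np.ndim(mode_result.mode) == 0:
    silhouette_score_mode = mode_result.mode  # SciPy >= 1.11: mode is already a scalar
else:
    silhouette_score_mode = mode_result.mode[0]  # older SciPy: 1-element array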
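A more compact, version-agnostic alternative is to normalize the result with np.atleast_1d before indexing: it turns a scalar mode into a 1-element array and leaves a 1-element array unchanged. This is a minimal sketch; scalar_mode is a hypothetical helper name (not part of Genes2Genes or SciPy) and the sample values are made up. It only uses the default scipy.stats.mode call, so it should behave the same before and after SciPy 1.11.

```python
import numpy as np
import scipy.stats


def scalar_mode(values):
    """Return the most frequent value of a 1-D array as a plain Python scalar.

    Hypothetical helper: np.atleast_1d wraps a scalar mode (SciPy >= 1.11
    default) into a 1-element array and leaves a 1-element array (older
    keepdims=True behaviour) unchanged, so [0] is valid in both cases.
    """
    result = scipy.stats.mode(np.asarray(values))
    return np.atleast_1d(result.mode)[0].item()


silhouette_score_samples = np.array([0.1, 0.3, 0.3, 0.7])
print(scalar_mode(silhouette_score_samples))  # 0.3
```

Inside run_agglomerative_clustering, line 55 would then become silhouette_score_mode = scalar_mode(silhouette_score_samples).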


dinithins · 2024-09-30T18:59:49Z

Hi @leoforster, many thanks for your interest in Genes2Genes and for bringing this version-related format change in the scipy.stats.mode() output to our attention. Thank you for suggesting a fix as well. We will update the source soon to handle it.
