I am working on generating recurrent neighborhoods using `spatial_lda` for my dataset, which contains ~2 million cells and 19 unique cell types.
My strategy was to (1) run `spatial_lda` on the `anndata` object to extract 20 motifs, (2) run K-means clustering with a large number of clusters (`k=30`) on the latent weights (`anndata.uns['spatial_lda']`), and (3) apply agglomerative clustering to the K-means cluster centers to group cells into recurrent neighborhoods (RCNs).
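Steps (2) and (3) can be sketched roughly as below. This is a minimal illustration, not my exact code: the `assign_rcn` helper is a hypothetical name, and the random Dirichlet matrix only stands in for the real latent weights in `anndata.uns['spatial_lda']` (here assumed to be a cells x motifs array).

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering


def assign_rcn(weights, n_kmeans=30, n_rcn=4, seed=0):
    """Hypothetical helper: K-means on latent weights, then agglomerative
    clustering of the K-means centers, mapped back to one label per cell."""
    km = KMeans(n_clusters=n_kmeans, n_init=10, random_state=seed).fit(weights)
    agg = AgglomerativeClustering(n_clusters=n_rcn).fit(km.cluster_centers_)
    # agg.labels_[i] is the RCN of K-means cluster i; index it by each cell's cluster
    return agg.labels_[km.labels_]


# Stand-in data (assumption): 5000 cells x 20 motifs, rows sum to 1
rng = np.random.default_rng(0)
weights = rng.dirichlet(np.ones(20), size=5000)

rcn = assign_rcn(weights)
```

Note that `seed` here only fixes the K-means initialization; the seed used by `spatial_lda` itself is set separately.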
My expectation is that the clusters obtained from agglomerative clustering will have similar cell-type composition and cell counts across different `spatial_lda` runs (same parameters, different random seeds) on the same dataset.
My observation is that the above procedure does not give consistent results when `spatial_lda` is run with a different random seed. That is, the cell-type content and cell counts of the final RCN assignments fluctuate wildly between `spatial_lda` runs. Is this expected, or what might I be doing wrong? Could this be a sign of overfitting?
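One way to put a number on this fluctuation, as a sketch only, is the adjusted Rand index (ARI) between the per-cell RCN labels of two runs. ARI ignores how the clusters are numbered, so a pure relabeling of the same partition scores 1, while unrelated partitions score close to 0. The label arrays below are synthetic stand-ins (an assumption), not my actual results.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the RCN labels of one run (4 RCNs, 1000 cells)
rcn_run1 = rng.integers(0, 4, size=1000)

# Same partition, but with the RCN ids shuffled (0->2, 1->0, 2->3, 3->1)
perm = np.array([2, 0, 3, 1])
rcn_relabeled = perm[rcn_run1]

# An unrelated partition of the same cells
rcn_unrelated = rng.integers(0, 4, size=1000)

ari_relabeled = adjusted_rand_score(rcn_run1, rcn_relabeled)
ari_unrelated = adjusted_rand_score(rcn_run1, rcn_unrelated)
```

With real data, `rcn_run1` and the second array would be the RCN labels of the same cells, in the same order, from two `spatial_lda` runs.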
I have also attached some output/screenshots from my analysis (I applied `spatial_lda` to a subset of randomly sampled cells, using the same `anndata` object but different random seeds).
Agglomerative clustering on the K-means cluster centers, number of final clusters = 4 (columns are the RCN ids, rows are the cell types, and values are the number of cells of each type in a given RCN):
Run_1
Run_2
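For reference, a table with this layout (cell types as rows, RCN ids as columns, cell counts as values) can be built roughly as follows. The cell-type names and labels are synthetic placeholders (an assumption), not my data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Placeholder annotations: 2000 cells, 19 cell types, 4 RCNs
type_names = [f"type_{i}" for i in range(19)]
cell_types = pd.Series(rng.choice(type_names, size=2000), name="cell_type")
rcn = pd.Series(rng.integers(0, 4, size=2000), name="RCN")

# Rows: cell types, columns: RCN ids, values: number of cells
counts = pd.crosstab(cell_types, rcn)

# Column-normalized version: fraction of each cell type within an RCN,
# which is easier to compare between runs than raw counts
composition = counts / counts.sum(axis=0)
```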