spatial_lda reproducibility #102

Open
batukav opened this issue Apr 29, 2024 · 0 comments
batukav commented Apr 29, 2024

Dear All,

I am working on generating recurrent neighborhoods using spatial_lda for my dataset that contains ~2 million cells and 19 unique cell types.

My strategy was to (1) run spatial_lda on the anndata object to extract 20 motifs, (2) run k-means clustering with a large number of clusters (k=30) on the latent weights (anndata.uns['spatial_lda']), and (3) apply agglomerative clustering to the k-means cluster centers to group cells into recurrent neighborhoods (RCNs).
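For readers who want to follow along, steps 2 and 3 can be sketched roughly as below. This is a minimal illustration under assumptions, not my exact code: the helper name `assign_rcn` is hypothetical, and the random Dirichlet matrix only stands in for the real latent weights stored in anndata.uns['spatial_lda'].

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans


def assign_rcn(weights, n_kmeans=30, n_rcn=4, seed=0):
    """Hypothetical helper: k-means on motif weights, then agglomerative
    clustering on the k-means centers, mapped back to one label per cell."""
    km = KMeans(n_clusters=n_kmeans, n_init=10, random_state=seed).fit(weights)
    agg = AgglomerativeClustering(n_clusters=n_rcn).fit(km.cluster_centers_)
    # agg.labels_[i] is the RCN of k-means cluster i; index it by each
    # cell's k-means label to get a per-cell RCN assignment.
    return agg.labels_[km.labels_]


# Stand-in for the real latent weights (assumption): 2000 cells x 20 motifs.
rng = np.random.default_rng(0)
weights = rng.dirichlet(np.ones(20), size=2000)

rcn = assign_rcn(weights)
```

Note that k-means has its own random seed here, so it is a second source of run-to-run variation in addition to the spatial_lda seed.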

My expectation is that the agglomerative clustering yields clusters with similar cell type composition and cell counts across different spatial_lda runs (same parameters, different random seeds) on the same dataset.

What I observe is that this procedure does not give consistent results when spatial_lda is run with a different random seed: the cell type composition and cell counts of the final RCN assignments fluctuate wildly between runs. Is this expected, or what might I be doing wrong? Could this be a sign of overfitting?
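One way to put a number on this inconsistency is sketched below, assuming the per-cell RCN labels of each run are available as arrays in the same cell order. Cluster ids are arbitrary, so RCN 0 in one run need not correspond to RCN 0 in another; the adjusted Rand index from scikit-learn compares the partitions themselves and ignores how the labels are named. The helper `pairwise_ari` and the toy label arrays are hypothetical.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score


def pairwise_ari(label_runs):
    """Hypothetical helper: adjusted Rand index between every pair of runs."""
    n = len(label_runs)
    out = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            score = adjusted_rand_score(label_runs[i], label_runs[j])
            out[i, j] = out[j, i] = score
    return out


# Toy labels for six cells (assumption, not real data).
run_1 = np.array([0, 0, 1, 1, 2, 2])
run_2 = np.array([2, 2, 0, 0, 1, 1])  # same grouping as run_1, ids renamed
run_3 = np.array([0, 1, 0, 1, 0, 1])  # a genuinely different grouping

ari = pairwise_ari([run_1, run_2, run_3])
```

A value near 1 means two runs group the cells the same way even if the ids differ, while values near 0 or below indicate agreement no better than chance.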

I have also added some output and screenshots from my analysis (I applied spatial_lda to a randomly sampled subset of cells: the same anndata object, but different random seeds).

AnnData object with n_obs × n_vars = 101363 × 1
    obs: 'X_centroid', 'Y_centroid', 'phenotype', 'imageid', 'cell_id', 'kmeans_labels'
    uns: 'spatial_lda', 'spatial_lda_probability'

Agglomerative clustering on the k-means cluster centers, number of final clusters = 4 (columns are the RCN ids, rows are the cell types, values are the number of cells of each type in a given RCN):

Run_1
image

Run_2
image
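Tables of this kind (cell types in rows, RCN ids in columns) can be built with `pandas.crosstab`, as in the sketch below. The data frame is a toy stand-in for anndata.obs, and both the `rcn` column name and the cell type names are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for anndata.obs (assumption): one row per cell.
obs = pd.DataFrame(
    {
        "phenotype": ["T cell", "T cell", "B cell", "Tumor", "Tumor", "Tumor"],
        "rcn": [0, 1, 0, 1, 1, 2],  # hypothetical per-cell RCN assignment
    }
)

# Rows are cell types, columns are RCN ids, values are cell counts.
counts = pd.crosstab(obs["phenotype"], obs["rcn"])

# Normalise each column so RCNs of different sizes can be compared.
fractions = counts.div(counts.sum(axis=0), axis=1)
```

Comparing the column-normalised fractions rather than raw counts makes it easier to see whether an RCN from one run has a counterpart with a similar composition in another run.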
