Labels cycling cells as doublets #9

mzhibo · 2021-09-22T04:39:13Z

Hi,
Thank you for developing this great tool!
I tried AMULET on two 10x Multiome datasets and found two issues:

1. It labeled most of the Mki67+ cycling cells as multiplets, which makes sense given the design of the tool. This might be an issue if the sample contains a considerable number of cycling cells. I would suggest more performance testing on datasets with cycling cells.
2. The doublet labeling does not seem to be consistent with the doublets identified from the RNA assay. The Scrublet prediction is consistent with shared marker gene expression and a higher nUMI per cell. Since the 10x Multiome kit measures both ATAC and RNA from the same cell, I don't know how to interpret this difference.

I have attached a plot of the multiplet comparison.
Do you have any suggestions on how to examine the accuracy of the AMULET predictions? Or do you have recommendations on how to fine-tune the prediction?
I am thinking of excluding the cluster of cells that is known to be in cycling phase based on the RNA assay.

ajt986 · 2021-09-22T15:20:42Z

Thank you for sharing these analyses! It's actually encouraging to see that AMULET does well at detecting the proliferating cells. It is not surprising to see that the majority of these cells are detected as multiplets when using the default parameters since these cells are expected to have >2 reads overlapping.

For calling multiplets with these cells, AMULET should be run using --expectedoverlap 4 (I would grab the new shell script I just updated, or run the steps separately to ensure this parameter is used in both steps of the algorithm). Here, since we expect 2 copies of the maternal and paternal chromosomes each (i.e., 4 chromosomes), multiplets will be the cells/nuclei that systematically have more than 4 reads overlapping. In this case, you can use 2 different csv files to subset the cells into proliferating and non-proliferating cells. For example, the barcodes in the Mki67 cluster would be proliferating and the rest would be non-proliferating. Run the non-proliferating cells with --expectedoverlap 2 (default) and the proliferating with --expectedoverlap 4. There may not be as many multiplets detected for the proliferating cell case due to AMULET requiring sufficient read depth/sequencing saturation.
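The subsetting step described above can be sketched roughly as follows. This is only an illustration: the column names `barcode` and `cluster`, the toy barcodes, and the helper names are all hypothetical, and the output would need to be adapted to whatever barcode CSV layout your AMULET run actually takes as input.

```python
import csv
import io


def split_barcodes(rows, proliferating_clusters):
    """Split barcode rows into (proliferating, non_proliferating) lists.

    `rows` are dicts with hypothetical keys "barcode" and "cluster".
    """
    prolif, non_prolif = [], []
    for row in rows:
        target = prolif if row["cluster"] in proliferating_clusters else non_prolif
        target.append(row)
    return prolif, non_prolif


def write_csv(rows, fieldnames, handle):
    """Write rows to an open text handle as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Toy cluster table (made-up barcodes); in practice this would come from
# the RNA-based clustering, e.g. the Mki67 cluster.
table = io.StringIO(
    "barcode,cluster\n"
    "AAAC-1,Mki67\n"
    "AAAG-1,Tcell\n"
    "AAAT-1,Mki67\n"
)
rows = list(csv.DictReader(table))
prolif, non_prolif = split_barcodes(rows, {"Mki67"})

# Each subset would be written to its own file and passed to a separate run.
prolif_out = io.StringIO()
write_csv(prolif, ["barcode", "cluster"], prolif_out)
```

The proliferating subset would then go to a run with `--expectedoverlap 4` and the non-proliferating subset to a run with `--expectedoverlap 2`.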

For comparison with RNA assay multiplets, one of the differences is that AMULET also detects homotypic multiplets (i.e., multiplets from the same cell type), and these will not be captured by methods like Scrublet that compare cells with simulated doublets. There should still be some overlap between the two methods, though. The UMAPs are hard to compare since some doublet cells are hidden under singlets in the UMAP. How do the UMIs look for AMULET multiplets? I would also inspect clusters that have a majority of multiplets, especially for simulation-based methods, just to ensure that a cell type with an expression profile similar to other cell types is not being misidentified as a multiplet cluster. Similarly, for AMULET, if there are cells that break the assumption that the expected number of chromosomes in the cell is 2, further analyses will need to be done to identify those cells first. If both of these check out, what would the multiplet removal % look like if taking the union of the two methods?
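One way to answer the overlap, union, and UMI questions above is a small set-based comparison. This is a sketch under assumptions: the function names, the toy barcodes, and the UMI values are hypothetical, and the two call sets are assumed to have already been exported as lists of barcodes from each tool.

```python
from statistics import median


def compare_calls(all_cells, amulet_multiplets, scrublet_doublets):
    """Summarise agreement between two multiplet call sets."""
    all_cells = set(all_cells)
    a = set(amulet_multiplets) & all_cells
    s = set(scrublet_doublets) & all_cells
    union = a | s
    pct = 100.0 * len(union) / len(all_cells) if all_cells else 0.0
    return {
        "amulet_only": len(a - s),
        "scrublet_only": len(s - a),
        "both": len(a & s),
        "union_removal_pct": pct,
    }


def median_umi(umi_counts, multiplets):
    """Return (median nUMI of called multiplets, median nUMI of the rest)."""
    multiplets = set(multiplets)
    called = [n for bc, n in umi_counts.items() if bc in multiplets]
    rest = [n for bc, n in umi_counts.items() if bc not in multiplets]
    return median(called), median(rest)


# Toy data: ten made-up barcodes with made-up UMI counts.
cells = [f"c{i}" for i in range(1, 11)]
umis = {f"c{i}": 1000 * i for i in range(1, 11)}
amulet = {"c1", "c2", "c3"}
scrublet = {"c3", "c4"}

summary = compare_calls(cells, amulet, scrublet)
amulet_median, singlet_median = median_umi(umis, amulet)
```

Running the same comparison per cluster, rather than over the whole sample, would also show which clusters are dominated by calls from only one method.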

mzhibo · 2021-09-23T03:41:01Z

Hi Asa, Thank you for the detailed information. I will try it in subsets as you suggested and let you know. In the second sample (with only few Mki67+ cells), the nUMI per cell did look higher in AMULET predicted multiplets than the singlets. I was concerned because the heterotypic doublets were not well labeled. I will further look into this as I previously was pretty convinced by simulation-based prediction of heterotypic doublets (they also co-express lineage-specific markers). Thank you, Zhibo
