Count Matrix before binarization and artificial doublets simulation #21

Yuntian0716 · 2022-11-16T19:23:08Z

Hi AMULET team,

First, thank you so much for developing this great method! I am quite new to this area, so apologies in advance for some basic questions.

I noticed that the matrix generated in the multiplet detection step is already binarized. My first question: is there any way to get the count matrix before binarization? I tried to reproduce your method using the Seurat count matrix, and I would like to check its concordance with the original count matrix generated by AMULET.

My second question is that I am still confused about how to generate artificial doublets to assess the detection accuracy. I have seen your answer in https://github.com/UcarLab/AMULET/issues/16, but still didn't quite get it. It would be great if you could share a sample script to do that or give a specific example of the process.

Many thanks,
Yuntian.

ajt986 · 2022-11-17T16:04:44Z

Hi Yuntian,

Quickly looking at the code, it looks like it's just producing the binarized matrix without generating the counts of the instances. You may be able to modify the generateMatrix function in AMULET.py by changing this line:

matrix[oi, celliddict[cellid]] = 1

to increment the counter by 1 instead of setting it to 1. However, this only counts the occurrences of overlaps > 2 within the merged union locations.
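The difference between setting and incrementing can be sketched as follows. This is a toy illustration, not the actual generateMatrix code: the function `count_overlaps`, the `overlap_records` input, and the use of a dictionary keyed by (row, column) in place of the real matrix object are all assumptions made for the example. Only the names `oi`, `cellid`, and `celliddict` come from the line quoted above.

```python
from collections import defaultdict

def count_overlaps(overlap_records, celliddict):
    """Build a binarized and a counted view of the same overlap records.

    overlap_records: iterable of (oi, cellid) pairs, one per overlap instance
                     (hypothetical input format for this sketch).
    celliddict:      maps a cell barcode to its column index.
    Both results are dicts keyed by (region_index, cell_column).
    """
    binary = defaultdict(int)
    counts = defaultdict(int)
    for oi, cellid in overlap_records:
        if cellid not in celliddict:
            continue  # ignore barcodes that are not in the cell list
        col = celliddict[cellid]
        binary[oi, col] = 1   # binarized behaviour: presence/absence only
        counts[oi, col] += 1  # modified behaviour: one count per occurrence
    return binary, counts

celliddict = {"CTTTCTTGTGCAACTA-1": 0, "ATTGCTCGTTTGTGGA-1": 1}
records = [
    (0, "CTTTCTTGTGCAACTA-1"),
    (0, "CTTTCTTGTGCAACTA-1"),  # second overlap in the same merged region
    (3, "ATTGCTCGTTTGTGGA-1"),
]
binary, counts = count_overlaps(records, celliddict)
```

With the toy records above, the first cell has two overlaps in region 0, so the binarized entry stays at 1 while the counted entry becomes 2.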

If that is not what you are looking for, the Overlaps.txt file provides the coordinates that can be traced back to the fragment/bam file where the overlap occurred. With this you will have more control over the count matrix you want to generate.

For artificial multiplets, you essentially combine the accessible chromatin profiles of two cells. Since each cell corresponds to a barcode, this is a matter of selecting 2 barcodes and combining the reads assigned to those barcodes. The steps are:

1. Randomly choose pairs of barcodes whose fragments/reads you wish to combine.
2. Generate new "barcodes" to mark your artificial doublets.
3. Modify the input fragments or alignment file. This can be done in multiple ways:

   - Change the old barcodes to the new doublet barcode (this will essentially remove the original cells from the analysis).
   - Copy the fragment/read data with the new doublet barcode (this will keep the original cell data in the analysis).

Adjusting the fragment files will be easier as this is a matter of making a new fragment file, adding/editing the tab delimited fields, and then indexing that file. If this is CellRanger ATAC, you can read more about the format here: https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/output/fragments
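A minimal sketch of the fragment-file route, assuming the CellRanger ATAC layout where each line is tab-delimited and the barcode is the fourth column. The helper names `pick_pairs` and `relabel_fragments` and the `ArtificialDbl` naming scheme are hypothetical and not part of AMULET. The sketch works on already-decompressed lines; the result would still need to be compressed and indexed again (typically with bgzip and tabix) before use.

```python
import random

def pick_pairs(barcodes, n_doublets, seed=0):
    """Randomly pair up distinct barcodes; returns {new_barcode: (old_a, old_b)}."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(barcodes), 2 * n_doublets)
    return {
        f"ArtificialDbl{i + 1}": (chosen[2 * i], chosen[2 * i + 1])
        for i in range(n_doublets)
    }

def relabel_fragments(lines, pairs, keep_originals=False):
    """Rewrite fragment lines so paired barcodes share one doublet barcode.

    keep_originals=False renames the old barcodes (original cells disappear).
    keep_originals=True  copies each fragment under the new barcode as well.
    """
    old_to_new = {old: new for new, olds in pairs.items() for old in olds}
    out = []
    for line in lines:
        if line.startswith("#"):
            out.append(line)  # pass header/comment lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        new_barcode = old_to_new.get(fields[3])
        if new_barcode is None:
            out.append(line)  # barcode is not part of any artificial doublet
            continue
        if keep_originals:
            out.append(line)
        fields[3] = new_barcode
        out.append("\t".join(fields) + "\n")
    return out

lines = [
    "chr1\t10\t50\tCTTTCTTGTGCAACTA-1\t1\n",
    "chr1\t20\t60\tATTGCTCGTTTGTGGA-1\t2\n",
    "chr1\t30\t70\tGGGGGGGGGGGGGGGG-1\t1\n",
]
pairs = {"ArtificialDbl1": ("CTTTCTTGTGCAACTA-1", "ATTGCTCGTTTGTGGA-1")}
renamed = relabel_fragments(lines, pairs)
copied = relabel_fragments(lines, pairs, keep_originals=True)
```

Because coordinates are never changed and copies are written right next to their source line, a position-sorted input stays position-sorted in both modes.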

Hope this helps!

Best,
Asa

Yuntian0716 · 2022-11-17T16:43:18Z

Hi Asa,

Thank you so much for your fast reply, but I still have some confusion.

For example, if I want to combine CTTTCTTGTGCAACTA-1 and ATTGCTCGTTTGTGGA-1 into an artificial doublet, how should I change my singlecell.csv file? Originally, it has two columns, "cellbarcodes(--bcidx)" and "--iscellidx", and looks like this:

 CTTTCTTGTGCAACTA-1,1
 ATTGCTCGTTTGTGGA-1,1

Then, if we create a new "barcode", e.g. ArtificialDbl1, and I don't want to keep the original cell data, is the singlecell.csv file the only thing I need to change? If so, how should I change it? Could you please explain this a little more?

In addition, I just want to confirm: when we run multiplet detection with artificial doublets, the artificial doublets should be included when calculating the average of the row sums (to get the lambda of the Poisson distribution), right? In that case, the number of artificial doublets should be very small (2.5% in your paper). But if we would like to generate a higher proportion of artificial doublets, then they should not be included in the lambda calculation, am I right?

Thank you so much for your help!

Many thanks,
Yuntian.

ajt986 · 2022-11-17T18:42:50Z

You'll need to update both the singlecell.csv file and the fragment file. You will need to add new rows to the singlecell.csv file with your doublet barcodes. If you are excluding the original cells/barcodes, remove them from this file as well.
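As a rough sketch of the singlecell.csv side, assuming the two-column, header-less layout from the question (barcode, then the is-cell flag). The function name `update_singlecell_rows` and its parameters are made up for illustration; a real CellRanger singlecell.csv has more columns, so the column indices would need adjusting.

```python
import csv
import io

def update_singlecell_rows(rows, pairs, keep_originals=False, bcidx=0, iscellidx=1):
    """Add one row per artificial doublet; optionally drop the merged cells.

    rows:  list of [barcode, is_cell] rows (assumed two-column layout).
    pairs: {new_doublet_barcode: (old_barcode_a, old_barcode_b)}.
    """
    merged = {old for olds in pairs.values() for old in olds}
    out = []
    for row in rows:
        if row[bcidx] in merged and not keep_originals:
            continue  # original cell is excluded from the analysis
        out.append(list(row))
    width = max(bcidx, iscellidx) + 1
    for new_barcode in pairs:
        new_row = [""] * width
        new_row[bcidx] = new_barcode
        new_row[iscellidx] = "1"  # mark the artificial doublet as a cell
        out.append(new_row)
    return out

original = "CTTTCTTGTGCAACTA-1,1\nATTGCTCGTTTGTGGA-1,1\nGGGGGGGGGGGGGGGG-1,1\n"
rows = list(csv.reader(io.StringIO(original)))
pairs = {"ArtificialDbl1": ("CTTTCTTGTGCAACTA-1", "ATTGCTCGTTTGTGGA-1")}
updated = update_singlecell_rows(rows, pairs)

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(updated)
updated_csv = buffer.getvalue()
```

The doublet barcodes written here must match the ones used in the modified fragment file, otherwise the new rows will not pick up any fragments.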

For the lambda estimation, yes, the doublets were included in that calculation, since in real-case scenarios there is no way to know which cells are singlets, so the estimate cannot be restricted to them. With a larger number of doublets, the method will start losing sensitivity, because the doublets increase the background average.
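The effect on the background can be illustrated with toy numbers. This sketch assumes, as described in the question, that lambda is the average of the per-cell sums and that each cell is tested against a Poisson distribution with that lambda. The per-cell counts (1 for singlets, 8 for doublets) and the helper names are invented for the example and are not taken from AMULET or the paper.

```python
import math

def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - below)

def background_lambda(n_cells, doublet_fraction, singlet_count=1, doublet_count=8):
    """Mean per-cell count when doublets are included in the average (toy values)."""
    n_doublets = round(n_cells * doublet_fraction)
    total = (n_cells - n_doublets) * singlet_count + n_doublets * doublet_count
    return total / n_cells

for fraction in (0.025, 0.10, 0.30):
    lam = background_lambda(1000, fraction)
    p_value = poisson_sf(8, lam)  # p-value of a cell with a doublet-like count
    print(f"doublet fraction {fraction:.3f}: lambda={lam:.3f}, p={p_value:.2e}")
```

As the doublet fraction grows, lambda rises and the same doublet-like count receives a larger p-value, which is the loss of sensitivity described above.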

Best,
Asa
