
Count Matrix before binarization and artificial doublets simulation #21

Yuntian0716 opened this issue Nov 16, 2022 · 3 comments

@Yuntian0716

Hi AMULET team,

First of all, thank you so much for developing this great method! I'm quite new to this area, so sorry in advance for some basic questions.

I noticed that in the multiplet detection step, the matrix we generate is already binarized. My first question is: is it possible to get the count matrix before binarization? I tried to reproduce your method using the Seurat count matrix, and I would like to compare it with the original count matrix generated by AMULET to check the concordance.

My second question is about generating artificial doublets to assess detection accuracy; I am still confused about how to do it. I have seen your answer in https://github.com/UcarLab/AMULET/issues/16, but I still don't quite understand it. It would be great if you could share a sample script or walk through a specific example of the process.

Many thanks,
Yuntian.

@ajt986
Member

ajt986 commented Nov 17, 2022

Hi Yuntian,

From a quick look at the code, it appears to produce only the binarized matrix, without counting the instances. You may be able to modify the generateMatrix function in AMULET.py by changing this line:

matrix[oi, celliddict[cellid]] = 1

to increment the counter by 1 (+= 1) instead of setting it to 1. However, this only counts the occurrences of overlaps > 2 within the merged union locations.
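To make the difference between the two assignments concrete, here is a toy sketch, not AMULET's actual generateMatrix. It assumes the matrix is a NumPy array with overlap regions as rows and cells as columns, as the indexing in the line above suggests; `build_matrices` and the `overlap_hits` input are hypothetical stand-ins.

```python
import numpy as np


def build_matrices(overlap_hits, celliddict, n_regions):
    """Build a binarized and a count matrix from (region index, cell barcode) hits.

    overlap_hits: iterable of (oi, cellid) pairs, one per observed overlap (hypothetical input).
    celliddict: maps each cell barcode to its column index.
    Rows are overlap regions, columns are cells.
    """
    n_cells = len(celliddict)
    binary = np.zeros((n_regions, n_cells), dtype=np.int32)
    counts = np.zeros((n_regions, n_cells), dtype=np.int32)
    for oi, cellid in overlap_hits:
        col = celliddict[cellid]
        binary[oi, col] = 1   # current behavior: presence/absence only
        counts[oi, col] += 1  # suggested change: count every occurrence
    return binary, counts


celliddict = {"CTTTCTTGTGCAACTA-1": 0, "ATTGCTCGTTTGTGGA-1": 1}
hits = [
    (0, "CTTTCTTGTGCAACTA-1"),
    (0, "CTTTCTTGTGCAACTA-1"),
    (2, "ATTGCTCGTTTGTGGA-1"),
]
binary, counts = build_matrices(hits, celliddict, n_regions=3)
print(binary[0, 0], counts[0, 0])  # 1 2
```

The same region seen twice for one cell stays 1 in the binarized matrix but becomes 2 in the count matrix.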

If that is not what you are looking for, the Overlaps.txt file provides the coordinates that can be traced back to the fragment/BAM file where each overlap occurred. This gives you more control over the count matrix you want to generate.
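As one illustration of that extra control: once the Overlaps.txt records have been parsed into (chrom, start, end, barcode) tuples (this column layout is an assumption, so check the actual file), they can be tallied against any set of regions you choose, such as peaks. The sketch below does this with a sorted-interval lookup; `count_overlaps` is a hypothetical helper, not part of AMULET.

```python
import bisect
from collections import Counter, defaultdict


def count_overlaps(records, regions):
    """Count overlap records per (region, barcode).

    records: (chrom, start, end, barcode) tuples, half-open coordinates (assumed layout).
    regions: non-overlapping (chrom, start, end) tuples, e.g. peaks (assumption).
    Returns a Counter keyed by ((chrom, start, end), barcode).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    starts = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts[chrom] = [s for s, _ in ivs]

    counts = Counter()
    for chrom, start, end, barcode in records:
        ivs = by_chrom.get(chrom)
        if not ivs:
            continue  # no regions on this chromosome
        # Last region whose start lies before the record's end.
        i = bisect.bisect_left(starts[chrom], end) - 1
        # Regions are non-overlapping, so their ends are sorted too: walk left
        # until a region ends at or before the record starts.
        while i >= 0 and ivs[i][1] > start:
            r_start, r_end = ivs[i]
            counts[((chrom, r_start, r_end), barcode)] += 1
            i -= 1
    return counts


regions = [("chr1", 100, 200), ("chr1", 300, 400)]
records = [
    ("chr1", 150, 160, "CTTTCTTGTGCAACTA-1"),
    ("chr1", 150, 160, "CTTTCTTGTGCAACTA-1"),
    ("chr1", 190, 310, "ATTGCTCGTTTGTGGA-1"),
]
for key, n in sorted(count_overlaps(records, regions).items()):
    print(key, n)
```

A record spanning two regions is counted once in each, which is one reasonable choice; assigning it to only one region would need an extra rule.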

For artificial multiplets, you essentially combine the accessible chromatin profiles of two cells. Since each cell corresponds to a barcode, this is a matter of selecting two barcodes and combining the reads assigned to them. The steps are:

  1. Randomly choose pairs of barcodes whose fragments/reads you wish to combine.
  2. Generate new "barcodes" to mark your artificial doublets.
  3. Modify the input fragments or alignment file. This can be done multiple ways:
  • Change the old barcodes to the new doublet barcode (This will essentially remove the original cells from the analysis)
  • Copy the fragment/read data with the new doublet barcode (This will keep the original cell data in the analysis)
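Steps 1 to 3 can be sketched in Python for a plain-text (already decompressed) fragment file whose lines follow the 10x layout: chrom, start, end, barcode, and read count, separated by tabs. This is a minimal sketch under those assumptions; the helper names and the ArtificialDbl prefix are hypothetical, and each barcode is used in at most one pair.

```python
import random


def make_doublet_pairs(barcodes, n_pairs, seed=0, prefix="ArtificialDbl"):
    """Steps 1 and 2: randomly pair distinct barcodes and name each doublet."""
    if 2 * n_pairs > len(barcodes):
        raise ValueError("not enough barcodes for the requested number of pairs")
    rng = random.Random(seed)
    chosen = rng.sample(list(barcodes), 2 * n_pairs)
    return [(chosen[2 * i], chosen[2 * i + 1], f"{prefix}{i + 1}")
            for i in range(n_pairs)]


def rewrite_fragments(lines, pairs, keep_originals):
    """Step 3: relabel (or copy) fragments of paired cells to the doublet barcode."""
    new_barcode = {}
    for bc1, bc2, dbl in pairs:
        new_barcode[bc1] = dbl
        new_barcode[bc2] = dbl
    out = []
    for line in lines:
        if line.startswith("#"):  # keep header/comment lines untouched
            out.append(line.rstrip("\n"))
            continue
        fields = line.rstrip("\n").split("\t")
        dbl = new_barcode.get(fields[3])
        if dbl is None or keep_originals:
            out.append("\t".join(fields))  # unpaired cell, or original kept
        if dbl is not None:
            out.append("\t".join(fields[:3] + [dbl] + fields[4:]))
    return out


pairs = [("CTTTCTTGTGCAACTA-1", "ATTGCTCGTTTGTGGA-1", "ArtificialDbl1")]
frags = [
    "chr1\t100\t150\tCTTTCTTGTGCAACTA-1\t1",
    "chr1\t200\t260\tATTGCTCGTTTGTGGA-1\t2",
    "chr2\t5\t80\tOTHER-1\t1",
]
for line in rewrite_fragments(frags, pairs, keep_originals=False):
    print(line)
```

With keep_originals=False the two cells are replaced by the doublet (first option above); with True their fragments are copied (second option). If both cells have a fragment at identical coordinates, the doublet gets two identical lines, which you may want to merge by summing the count column.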

Adjusting the fragment file will be easier than the alignment file, as this is a matter of making a new fragment file, adding/editing the tab-delimited fields, and then indexing that file. If this is CellRanger ATAC output, you can read more about the format here: https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/output/fragments
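Before indexing, the edited records need to be sorted again. A minimal sketch, assuming the edited fragment lines are held in memory as tab-delimited strings (chrom, start, end, barcode, count); the function names are hypothetical.

```python
def sort_fragment_lines(lines):
    """Return header lines followed by fragment lines sorted by (chrom, start, end).

    Chromosomes are sorted lexicographically, which keeps each chromosome's
    records contiguous with starts in ascending order.
    """
    header = [line for line in lines if line.startswith("#")]
    body = [line for line in lines if line.strip() and not line.startswith("#")]

    def key(line):
        chrom, start, end = line.split("\t")[:3]
        return (chrom, int(start), int(end))

    return header + sorted(body, key=key)


def write_fragments(lines, path):
    """Write sorted fragment lines to a plain-text file, one record per line."""
    with open(path, "w") as fh:
        for line in sort_fragment_lines(lines):
            fh.write(line.rstrip("\n") + "\n")
```

The plain-text output can then be compressed and indexed with htslib, for example `bgzip fragments.tsv` followed by `tabix -p bed fragments.tsv.gz`.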

Hope this helps!

Best,
Asa

@Yuntian0716
Author

Hi Asa,

Thank you so much for your fast reply! I am still a bit confused, though.

For example, if I want to combine CTTTCTTGTGCAACTA-1 and ATTGCTCGTTTGTGGA-1 into an artificial doublet, how should I change my singlecell.csv file? Originally, my singlecell.csv file has two columns, the cell barcodes (--bcidx) and the cell flag (--iscellidx), and it looks like this:

 CTTTCTTGTGCAACTA-1,1
 ATTGCTCGTTTGTGGA-1,1

Then we would create a new barcode, e.g. ArtificialDbl1. If I don't want to keep the original cell data, is the singlecell.csv file the only thing I need to change? If so, how should I change it? Could you please explain this a bit more?

In addition, I just want to confirm: when we run multiplet detection with artificial doublets, the artificial doublets should be included in the calculation of the average of the row sums (to get the lambda of the Poisson distribution), right? In that case, the number of artificial doublets should be very small (2.5% in your paper). But if we wanted to generate a higher proportion of artificial doublets, they should not be included in the lambda calculation, am I right?

Thank you so much for your help!

Many thanks,
Yuntian.

@ajt986
Member

ajt986 commented Nov 17, 2022

You'll need to update both the singlecell.csv file and the fragment file. Add new rows to the singlecell.csv file with your doublet barcodes. If you are excluding the original cells/barcodes, remove them from this file as well.
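For the singlecell.csv side, here is a minimal sketch assuming the two-column layout shown in the question (barcode, then a 1/0 cell flag, i.e. --bcidx 0 and --iscellidx 1). The function names are hypothetical, and the fragment file still has to be edited consistently so that each doublet barcode actually has fragments.

```python
import csv


def update_rows(rows, pairs, keep_originals):
    """Return singlecell.csv rows with doublet barcodes added.

    rows: lists of [barcode, is_cell_flag] (two-column layout assumed).
    pairs: (barcode1, barcode2, doublet_barcode) tuples.
    """
    merged = {bc for bc1, bc2, _ in pairs for bc in (bc1, bc2)}
    out = [list(r) for r in rows if r]  # copy, skipping blank rows
    if not keep_originals:
        # Drop the merged cells; a header row, if any, matches no barcode and is kept.
        out = [r for r in out if r[0] not in merged]
    for _, _, dbl in pairs:
        out.append([dbl, "1"])  # flag the doublet as a cell
    return out


def update_singlecell_csv(in_path, out_path, pairs, keep_originals):
    """Read a singlecell.csv-style file, update it, and write the result."""
    with open(in_path, newline="") as fh:
        rows = list(csv.reader(fh))
    with open(out_path, "w", newline="") as fh:
        csv.writer(fh).writerows(update_rows(rows, pairs, keep_originals))
```

For the example above, calling it with `pairs=[("CTTTCTTGTGCAACTA-1", "ATTGCTCGTTTGTGGA-1", "ArtificialDbl1")]` and `keep_originals=False` removes the two original rows and appends `ArtificialDbl1,1`.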

For the lambda estimation, yes, the doublets were included in that calculation, since in real-case scenarios there is no way to know which cells are singlets in order to get the correct estimate. So with a larger number of doublets, the background average increases and the method starts losing sensitivity.
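To see why a larger share of doublets costs sensitivity, here is a simplified Poisson sketch. It is not AMULET's code: it assumes one count per cell (e.g. the number of loci with more than two overlapping reads), sets lambda to the mean count across all cells with the doublets included, and computes an upper-tail p-value per cell; any multiple-testing correction is left out, and the counts are made up.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via 1 minus the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def multiplet_pvalues(per_cell_counts):
    """lambda = mean count over all cells (doublets included); one p-value per cell."""
    lam = sum(per_cell_counts) / len(per_cell_counts)
    return lam, [poisson_sf(k, lam) for k in per_cell_counts]


singlets = [2, 3] * 50          # 100 hypothetical singlet counts, mean 2.5
for n_dbl in (3, 30):           # roughly 3% vs 23% doublets
    counts = singlets + [12] * n_dbl
    lam, pvals = multiplet_pvalues(counts)
    print(f"{n_dbl} doublets: lambda={lam:.2f}, p(doublet)={pvals[-1]:.2e}")
```

The doublet's count is the same in both runs, but the inflated lambda in the second run gives it a larger p-value, so it is less likely to be called.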

Best,
Asa
