Add option in library definition to filter on MAPQ #69

Open
bbimber opened this issue Feb 28, 2022 · 0 comments

bbimber commented Feb 28, 2022

For background: 10x data are generally first aligned to a reference genome (such as the human or macaque genome). This produces a BAM file containing both alignments and unaligned reads. Each alignment generally has a mapping quality (MAPQ) indicating the confidence of its placement on the genome. A MAPQ of zero generally means unmapped or ambiguously placed (though we should verify this in STAR/10x BAMs). It might be useful in some cases to have nimble consider only reads that did not otherwise have a confident match to the organism's genome. In theory this might reduce noise, and it would avoid a potential concern about double-counting reads.
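
One way to check the MAPQ-zero question empirically is to tabulate MAPQ separately for mapped and unmapped records. The following is a minimal sketch, assuming SAM-formatted text as input (for example, lines piped from `samtools view`). The function name `mapq_histogram` and the example records are made up for illustration and are not part of nimble or real 10x output:

```python
from collections import Counter

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def mapq_histogram(sam_lines):
    """Count records by (is_unmapped, MAPQ) from SAM-formatted text lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: bitwise FLAG
        mapq = int(fields[4])  # column 5: MAPQ
        counts[(bool(flag & UNMAPPED_FLAG), mapq)] += 1
    return counts


# Hand-written records for illustration only.
example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF",
    "r2\t256\tchr2\t200\t3\t4M\t*\t0\t0\tACGT\tFFFF",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",
]
hist = mapq_histogram(example)
for (unmapped, mapq), n in sorted(hist.items()):
    print(f"unmapped={unmapped} MAPQ={mapq} count={n}")
```

Running this over a real STAR/10x BAM would show whether MAPQ 0 appears only on unmapped records or also on mapped ones.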

Here are relevant docs on the 10x-specific alignment-level flags. In addition, MAPQ should be in the BAM:
https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/bam

Here is STAR's manual (STAR is the aligner that 10x cellranger uses internally). See section 4.2.1 for how it uses MAPQ, including for multi-mapping reads:
https://physiology.med.cornell.edu/faculty/skrabanek/lab/angsd/lecture_notes/STARmanual.pdf
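
For reference, a minimal sketch of the MAPQ rule from the STAR manual section mentioned above, assuming it reads as 255 for uniquely mapping reads and `int(-10*log10(1 - 1/Nmap))` for reads mapping to `Nmap` loci. The function name `star_mapq` is hypothetical, and the rule should be verified against the STAR version that cellranger bundles:

```python
import math


def star_mapq(n_loci):
    """MAPQ assigned to a read aligned to n_loci loci, under the assumed rule."""
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    if n_loci == 1:
        return 255  # unique alignment
    # Multi-mapper: truncate -10*log10(1 - 1/Nmap) to an integer.
    return int(-10 * math.log10(1 - 1 / n_loci))


table = {n: star_mapq(n) for n in range(1, 6)}
print(table)
```

Under this assumption, multi-mapped reads receive MAPQ 3 (two loci), 1 (three or four loci), or 0 (five or more loci), so only unique alignments carry 255.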

My interpretation is that STAR encodes reads with a single confident match as MAPQ 255. Anything less than 255 is multi-mapped, with the value defined by the formula in section 4.2.1 of STAR's docs. Practically speaking, I think we should:

  • Implement a library-level setting, "omitAlignmentsWithMapQAbove=XX". This filter would only apply when nimble's input is a BAM file, not FASTQs. In practice, we could set this value to 255, which would mean any alignment with a single confident match would be discarded, and nimble would only inspect multi-mapped or unmapped reads.
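
The proposed setting can be sketched as a simple predicate. This is an illustration only, not nimble's implementation: the function name `omit_alignment` is hypothetical, and it assumes the comparison is inclusive (`>=`), since a strict "above 255" test could never match a MAPQ of 255:

```python
def omit_alignment(mapq, threshold=None):
    """Return True if an alignment should be skipped by the MAPQ filter.

    threshold=None means the filter is disabled, e.g. the option is unset
    or the input is FASTQ, where no MAPQ exists.
    The comparison is assumed to be inclusive.
    """
    if threshold is None:
        return False
    return mapq >= threshold


# With the threshold at 255, only non-unique or unmapped reads are kept.
kept = [q for q in (255, 3, 1, 0) if not omit_alignment(q, threshold=255)]
print(kept)
```

Whether the option should be inclusive or be renamed to make that explicit is a design choice left open here.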

See the note in issue #68 about debugging output and including MAPQ.

As always, it would be very helpful for nimble to maintain some internal information about what it's doing and report that. In this case, simply counting the number of alignments discarded for MAPQ and reporting that figure to STDERR would be valuable.
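
A minimal sketch of that bookkeeping, assuming alignments arrive as `(read_name, mapq)` pairs and that the threshold comparison is inclusive. The function name `filter_by_mapq` and the message wording are hypothetical, not nimble's actual output:

```python
import sys


def filter_by_mapq(records, threshold, log=sys.stderr):
    """Drop records with MAPQ >= threshold and report how many were dropped.

    records: iterable of (read_name, mapq) pairs.
    Returns (kept_records, discarded_count).
    """
    kept = []
    discarded = 0
    for name, mapq in records:
        if threshold is not None and mapq >= threshold:
            discarded += 1
            continue
        kept.append((name, mapq))
    if threshold is not None:
        # Summary goes to the log stream (STDERR by default), not STDOUT.
        print(f"Discarded {discarded} alignments with MAPQ >= {threshold}", file=log)
    return kept, discarded


demo_kept, demo_discarded = filter_by_mapq(
    [("r1", 255), ("r2", 3), ("r3", 0)], threshold=255
)
```

Keeping the count in the return value as well as in the log line makes it easy to include in any other debugging output.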
