Add option in library definition to filter on MAPQ #69

bbimber · 2022-02-28T19:02:06Z

For background, 10x data are generally first aligned to a reference genome (such as the human or macaque genome). This produces a BAM file containing both alignments and unaligned reads. Each alignment generally has a mapping quality (MAPQ), indicating the confidence of the mapping to the genome. A MAPQ of zero generally means unmapped (though we should verify this in STAR/10x BAMs). In some cases it might be useful to have nimble consider only reads that did not otherwise have a confident match to the organism's genome. In theory this might reduce noise, and it would avoid a potential concern about double-counting reads.

Here are relevant docs on the 10x-specific alignment-level flags. In addition, MAPQ should be in the BAM:
https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/output/bam

And here is STAR's doc (STAR is the aligner that 10x cellranger uses internally). See section 4.2.1 for how it uses MAPQ, including for multi-mapping reads:
https://physiology.med.cornell.edu/faculty/skrabanek/lab/angsd/lecture_notes/STARmanual.pdf
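
For reference, a minimal sketch of the MAPQ rule as I read section 4.2.1 of the STAR manual: 255 for uniquely mapping reads, and int(-10*log10(1 - 1/N)) for a read mapping to N loci. The function name `star_mapq` is hypothetical, and the formula should be checked against the manual for the STAR version in use. cellranger may also adjust MAPQ after alignment, which this does not model.

```python
import math


def star_mapq(n_loci: int) -> int:
    """Expected STAR MAPQ for a read that maps to n_loci locations.

    Assumption: follows the rule described in section 4.2.1 of the STAR
    manual; verify against the manual for your STAR version.
    """
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    if n_loci == 1:
        # Uniquely mapping reads get the special value 255.
        return 255
    # Multi-mapping reads: quality drops as the number of loci grows.
    return int(-10 * math.log10(1 - 1 / n_loci))
```

Under this rule a read mapping to 2 loci gets MAPQ 3, to 3 or 4 loci gets 1, and to 5 or more gets 0, so every multi-mapped read falls well below 255.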

My interpretation of this is that reads with a confident single match are encoded by STAR as 255. Anything less than 255 is multi-mapped, with the value defined by the formula in section 4.2.1 of STAR's docs. Practically speaking, I think we should:

Implement a library-level setting for "omitAlignmentsWithMapQAbove=XX". This filter would only apply when nimble's input is a BAM file, not FASTQs. In practice, we could set this value to 255, which would mean any single-aligned alignment would be discarded, and nimble would only inspect multi-mapped or unmapped reads.
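
A rough sketch of what this filter could look like, written in Python over SAM-formatted text lines purely for illustration. This is not nimble's code: the names `parse_mapq` and `filter_by_mapq` are hypothetical, and `None` stands in for the "input is FASTQ, no filter" case. One assumption to call out: the comparison is inclusive (`>=`), since a strict "above 255" test would never discard anything.

```python
from typing import Iterable, Iterator, Optional


def parse_mapq(sam_line: str) -> int:
    """Return the MAPQ value (column 5) of a SAM alignment line."""
    return int(sam_line.rstrip("\n").split("\t")[4])


def filter_by_mapq(
    sam_lines: Iterable[str], omit_at_or_above: Optional[int]
) -> Iterator[str]:
    """Yield alignment lines whose MAPQ is below the threshold.

    Assumption: the threshold is inclusive, so a value of 255 drops
    alignments with MAPQ 255. Passing None disables the filter.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        if omit_at_or_above is not None and parse_mapq(line) >= omit_at_or_above:
            continue  # confidently aligned elsewhere; discard
        yield line


# Made-up records: uniquely mapped, multi-mapped, and unmapped.
EXAMPLE_SAM = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF",
    "r2\t0\tchr1\t200\t3\t4M\t*\t0\t0\tACGT\tFFFF",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tFFFF",
]
```

With `omit_at_or_above=255`, only `r2` and `r3` from the example records would be passed on.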

See the note in issue #68 about debugging output and including MAPQ.

As always, it would be very helpful for nimble to maintain some internal information about what it's doing and report that. In this case, simply counting the number of alignments discarded for MAPQ and reporting that figure to STDERR would be valuable.
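
The counting and reporting could be as simple as the sketch below. Again this is illustrative Python, not nimble's implementation: `filter_and_count` and the message wording are made up, and alignments are reduced to (read name, MAPQ) pairs to keep it short. It also assumes the inclusive threshold described above.

```python
import sys
from typing import Iterable, List, TextIO, Tuple


def filter_and_count(
    alignments: Iterable[Tuple[str, int]],
    threshold: int,
    log: TextIO = sys.stderr,
) -> List[Tuple[str, int]]:
    """Drop alignments with MAPQ >= threshold and report how many were dropped."""
    kept: List[Tuple[str, int]] = []
    discarded = 0
    for name, mapq in alignments:
        if mapq >= threshold:
            discarded += 1
        else:
            kept.append((name, mapq))
    total = len(kept) + discarded
    # Summary goes to STDERR by default so it does not mix with results.
    print(
        f"MAPQ filter (threshold {threshold}): "
        f"discarded {discarded} of {total} alignments",
        file=log,
    )
    return kept
```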
