Releases · ruppinlab/CSI-Microbes-identification

This release covers the core CSI-Microbes identification code, most of which lives in the git submodule pathogen-discovery-rules. Each subdirectory contains options specific to one dataset; these options are not part of the release. Results should be exactly reproducible by combining a dataset's options (in config/PathSeq-config.yaml) with the matching CSI-Microbes identification code tag.
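The reproduction recipe above can be sketched as a short Python helper. This is a minimal sketch, assuming the repository is cloned from GitHub at the URL derived from its name and that the release tag name is already known. Checking out a tag in the top-level repository and then running `git submodule update` moves pathogen-discovery-rules to the commit recorded at that tag. The helper names are hypothetical.

```python
import subprocess
from typing import List

# Assumed clone URL, derived from the repository name ruppinlab/CSI-Microbes-identification.
REPO_URL = "https://github.com/ruppinlab/CSI-Microbes-identification.git"


def reproduction_commands(tag: str, workdir: str = "CSI-Microbes-identification") -> List[List[str]]:
    """Build git commands that pin the repository (and its submodules) to a release tag."""
    return [
        # Clone the top-level repository.
        ["git", "clone", REPO_URL, workdir],
        # Check out the release tag supplied by the caller.
        ["git", "-C", workdir, "checkout", tag],
        # Move submodules such as pathogen-discovery-rules to the commits recorded at that tag.
        ["git", "-C", workdir, "submodule", "update", "--init", "--recursive"],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Execute each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(reproduction_commands("v0.1.0"))` would check out a tag named "v0.1.0" (a hypothetical tag name); the dataset-specific config/PathSeq-config.yaml is then taken from the relevant subdirectory.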

Common CSI-Microbes Component

This version of CSI-Microbes uses PathSeq (v4.1.8.1) to identify microbial reads. It uses the default PathSeq options except for --min-score-identity .7, --skip-quality-filters true and --filter-duplicates false. Unless otherwise specified, it uses the host BWA image file and host k-mer file distributed with PathSeq, and the reads are first mapped either with STAR against the human genome GRCh38.p13 (including scaffolds and alternative loci) using the full Gencode v34 annotation, or with CellRanger (v4.0.0) against the human reference genome distributed with CellRanger.
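The non-default PathSeq settings can be kept in one place when building the command line. The sketch below assumes PathSeq is invoked through GATK's `PathSeqPipelineSpark` tool; the file names are placeholders, and the host and microbe reference arguments (BWA images, k-mer file, taxonomy file) are deliberately left to the caller rather than guessed. The helper name `build_pathseq_command` is hypothetical.

```python
from typing import List, Sequence

# Non-default PathSeq settings stated in the release notes.
PATHSEQ_OVERRIDES = [
    ("--min-score-identity", ".7"),
    ("--skip-quality-filters", "true"),
    ("--filter-duplicates", "false"),
]


def build_pathseq_command(
    input_bam: str,
    output_bam: str,
    scores_tsv: str,
    reference_args: Sequence[str],
) -> List[str]:
    """Assemble a `gatk PathSeqPipelineSpark` argument list.

    `reference_args` must carry the host and microbe reference arguments;
    they are not reproduced here.
    """
    cmd = [
        "gatk", "PathSeqPipelineSpark",
        "--input", input_bam,
        "--output", output_bam,
        "--scores-output", scores_tsv,
    ]
    cmd.extend(reference_args)
    # Append the three settings that differ from PathSeq's defaults.
    for flag, value in PATHSEQ_OVERRIDES:
        cmd.extend([flag, value])
    return cmd
```

Keeping the overrides in a single list makes it easy to diff them against a dataset's config/PathSeq-config.yaml.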

CSI-Microbes identification on 10x data

First, fastq files are aligned to the human reference genome using CellRanger (v4.0.0). Next, reads that align to the human genome are filtered out. Then, using the annotations provided by CellRanger, template switch oligonucleotide (TSO) sequences and polyA tails are hard-clipped, and reads shorter than 15 nucleotides or missing a valid cell barcode (CB) or unique molecular identifier (UMI) tag are removed. Next, fastp trims low-quality bases from the 3' end (--cut_tail) and filters reads by length (--length_required 25), low complexity (--low_complexity_filter 30) and low quality (--unqualified_percent_limit 40). The cleaned fastq file is converted to a BAM file and processed through PathSeq. The PathSeq output BAM is then combined with the filtered CellRanger output BAM to add the necessary CB and UMI tags. Finally, this BAM is restricted to valid cell barcodes, the best mapping for each UMI is selected, and the resulting cell-specific BAM is re-scored by PathSeq.
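The tag-based clipping and the final UMI selection in the 10x workflow can be illustrated on plain Python records. This is a sketch under several assumptions: reads carry CellRanger's `CB` (cell barcode), `UB` (corrected UMI), `ts` (TSO bases at the 5' end) and `pa` (polyA bases at the 3' end) tags; sequences are stored in their original orientation; and "best mapping" means the highest alignment score. The names `Read`, `clip_and_filter` and `best_mapping_per_umi` are hypothetical, not functions from pathogen-discovery-rules.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Set, Tuple

MIN_LENGTH = 15  # reads shorter than this after clipping are discarded


@dataclass
class Read:
    name: str
    seq: str  # assumed to be in original sequencing orientation
    qual: str
    tags: Dict[str, object] = field(default_factory=dict)


def clip_and_filter(reads: Iterable[Read]) -> Iterator[Read]:
    """Hard-clip TSO and polyA bases, then drop short or untagged reads."""
    for read in reads:
        # Require both a cell barcode (CB) and a UMI (assumed to be in UB).
        if "CB" not in read.tags or "UB" not in read.tags:
            continue
        ts = int(read.tags.get("ts", 0))  # TSO bases to remove from the 5' end
        pa = int(read.tags.get("pa", 0))  # polyA bases to remove from the 3' end
        end = max(len(read.seq) - pa, 0)  # clamp so slicing never wraps around
        seq, qual = read.seq[ts:end], read.qual[ts:end]
        if len(seq) < MIN_LENGTH:
            continue
        yield Read(read.name, seq, qual, dict(read.tags))


def best_mapping_per_umi(
    alignments: Iterable[Tuple[str, str, str, float]],
    valid_barcodes: Set[str],
) -> Dict[Tuple[str, str], Tuple[str, float]]:
    """Keep the highest-scoring alignment for each (cell barcode, UMI) pair.

    Each alignment is (cell_barcode, umi, read_name, score). Alignments from
    barcodes outside `valid_barcodes` are discarded; on ties the first wins.
    """
    best: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for cb, umi, name, score in alignments:
        if cb not in valid_barcodes:
            continue
        key = (cb, umi)
        if key not in best or score > best[key][1]:
            best[key] = (name, score)
    return best
```

A production version would read and write BAM records (for example with pysam) instead of in-memory objects; plain dataclasses keep the example self-contained.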

CSI-Microbes on full-length scRNA-seq datasets

First, the paired fastq files are processed with fastp: low-quality bases are trimmed from the 3' end (--cut_tail), adapter sequences are removed, and reads are filtered by length (--length_required 25), low complexity (--low_complexity_filter 30) and low quality (--unqualified_percent_limit 40). Then, the paired fastq files are aligned to the human genome using STAR. Next, reads that aligned to the human genome are removed, keeping only the unaligned reads, which STAR marks with the uT tag. Finally, the BAM is run through PathSeq.
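The fastp step and the uT-based filter for full-length data can be sketched as follows. The fastp flags mirror those named above, with one adjustment: in fastp, `--low_complexity_filter` is an on/off switch and the threshold of 30 is set through `--complexity_threshold` (30 is also fastp's default). Adapter trimming is enabled by default in fastp, so no extra flag is added. STAR only writes the `uT` tag on unmapped reads when they are kept in the output (for example with `--outSAMunmapped Within`). File names and the helper names `fastp_paired_command` and `keep_unaligned` are hypothetical.

```python
from typing import Iterable, Iterator, List


def fastp_paired_command(r1_in: str, r2_in: str, r1_out: str, r2_out: str) -> List[str]:
    """Build a paired-end fastp call mirroring the filters named in the text."""
    return [
        "fastp",
        "-i", r1_in, "-I", r2_in,
        "-o", r1_out, "-O", r2_out,
        "--cut_tail",                         # sliding-window quality trimming from the 3' end
        "--length_required", "25",            # discard reads shorter than 25 nt
        "--low_complexity_filter",            # enable the low-complexity filter ...
        "--complexity_threshold", "30",       # ... requiring at least 30% complexity
        "--unqualified_percent_limit", "40",  # max percent of low-quality bases per read
    ]


def keep_unaligned(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM header lines plus records carrying STAR's uT tag (unmapped reads)."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Optional tags start at the 12th SAM column (index 11).
        if any(tag.startswith("uT:") for tag in fields[11:]):
            yield line
```

Working on SAM text avoids extra dependencies; in practice the same filter could be applied with samtools or pysam before handing the reads to PathSeq.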

Releases: ruppinlab/CSI-Microbes-identification

bioRxiv Release April 2023

bioRxiv May 2021 paper

Release v0.1.0
