Skip to content
cziegenhain edited this page Jul 30, 2019 · 12 revisions

zUMIs' parent output directory contains following files:

This is an intermediate file where all the cell barcodes with total number of reads are listed

<Project_name>.BCstats.txt
  • Log files generated by STAR
<Project_name>.*.out
  • BAM files generated by every module (Filtering, Mapping and Counting)
# filtered, unmapped BAM file tagged by cell barcode (BC) and UMI (UB)
will be found temporarily in zUMIs_output/.tmpMerge/
and after merging in <Project_name>.filtered.tagged.unmapped.bam

# filtered, STAR mapped BAM file with retained BC and UB tags
<Project_name>.filtered.tagged.Aligned.out.bam

# filtered, STAR mapped BAM file tagged by featureCounts with assignment to exonic regions
<Project_name>.filtered.tagged.Aligned.out.bam.ex.featureCounts.bam

# filtered, STAR mapped BAM file tagged by featureCounts with assignment to intronic regions
<Project_name>.filtered.tagged.Aligned.out.bam.in.featureCounts.bam

# Note: When cell barcode oder UMI hamming distance binning is activated, the raw sequence is stored in BX (cell barcode) and UX (UMI) tags with the error corrected tags being BC and UB. 

The final output is structured in two subdirectories:

zUMIs_output/expression
zUMIs_output/stats

The retained cell barcodes with total number of reads are listed in zUMIs_output/<Project_name>kept_barcodes.txt

  • "expression" contains .rds files
    • list of count matrices as sparseMatrix (*.dgecounts.rds)
  • "stats" contains plots and data files with descriptive statistics

Optionally, per-cell demultiplexed bam files are found in:

zUMIs_output/demultiplex

Structure of the output dgecounts object in <project_name>.dgecounts.rds

zUMIs produces dge output in .rds format that can be read in R with the following command.

AllCounts <- readRDS("zUMIs_output/expression/example.dgecounts.rds")
names(AllCounts)
[1] "umicount"  "readcount"

names(AllCounts$umicount)
[1] "exon"   "inex"   "intron"

names(AllCounts$umicount$exon)
[1] "all"          "downsampling"

AllCounts is a list of lists with all the count matrices as sparseMatrix. The parent list contains UMI and read count quantification. In each of these counting types, you will find the three feature types (introns,exons and intron+exon). Each of those contain a sparseMatrix generating using all reads observed and a list for the downsampling sizes requested. Each of the downsampling list elements is also a sparseMatrix.

The sparseMatrix can be converted into a conventional count table using "as.matrix" function in R and saved as a text file using the code below.

#Feature exons umicounts table
dge <- as.matrix(AllCounts$umicount$exon$all)
write.table(dge,"exons.umicounts.txt",quote=F,sep="\t")

NOTE: It is highly recommended to retain sparseMatrix format for the downstream analysis, especially when you have >20,000 cells to save time and space.

Loading zUMIs RDS output in Python

To load zUMIs output as a pandas object, please find code below Thanks to @adamgayoso for this!

import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
pandas2ri.activate()
from rpy2.robjects.packages import STAP

def zumis_output2pandas(filename):
    mfunc = 'to_df <- function(dobj){return(as.data.frame(as.matrix(dobj)))}'
    rsparse2pandas = STAP(mfunc, "to_df")

    readRDS = robjects.r['readRDS']
    zumis_data = readRDS(filename)
    zd = dict(zip(zumis_data.names, list(zumis_data)))
    for key in ['umicount', 'readcount']:
        zd[key] = dict(zip(zd[key].names, list(zd[key])))
        zd[key]['exon'] = dict(zip(zd[key]['exon'].names, list(zd[key]['exon'])))
        zd[key]['exon']['all'] = rsparse2pandas.to_df(zd[key]['exon']['all']).transpose()
        zd[key]['exon']['downsampling'] = dict(zip(zd[key]['exon']['downsampling'].names, list(zd[key]['exon']['downsampling'])))
        for k in zd[key]['exon']['downsampling'].keys():
            zd[key]['exon']['downsampling'][k] = rsparse2pandas.to_df(zd[key]['exon']['downsampling'][k]).transpose()
    return zd
Clone this wiki locally