Starting from demultiplexed fastq files
In some cases, sequencing providers deliver only fastq files that have already been demultiplexed. Because zUMIs requires the cell identity to be encoded in one of the fastq files, such files can be incompatible with zUMIs, e.g. for Smart-seq data.
In zUMIs we provide a way to recombine fastq files and generate an arbitrary index sequence.
All you need to provide is the path to the folder containing the individual fastq files that should be combined.
Fastq file names are expected in the format of bcl2fastq (XYZ_R1_001.fastq.gz) or SRA's fastq-dump (XYZ_1.fastq.gz). Fastq files are assumed to be gzipped.
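Before running the merge script, it can help to check that every file in the folder follows one of these two naming schemes. The following is a minimal sketch and not part of zUMIs: the function names classify_fastq_name and check_fastq_dir are hypothetical, and the patterns are an assumption derived from the two example names above, so the script's actual matching rules may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of zUMIs): guess the naming scheme of a fastq file.
# The patterns are assumptions based on the example names XYZ_R1_001.fastq.gz
# and XYZ_1.fastq.gz; the merge script itself may match names differently.
classify_fastq_name() {
  case "$(basename "$1")" in
    *_R[12]_001.fastq.gz) echo "bcl2fastq" ;;
    *_[12].fastq.gz)      echo "fastq-dump" ;;
    *)                    echo "unknown" ;;
  esac
}

# Report regular files in a folder whose names match neither scheme.
check_fastq_dir() {
  local f
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    if [ "$(classify_fastq_name "$f")" = "unknown" ]; then
      echo "unexpected file name: $f"
    fi
  done
}
```

After sourcing these functions, calling check_fastq_dir /path/to/individual_fastqs prints any file whose name matches neither pattern and prints nothing if all names look as expected.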
Rscript zUMIs/misc/merge_demultiplexed_fastq.R --dir /path/to/individual_fastqs
Optionally, you can also set a custom path to the pigz dependency and the number of threads (defaults are pigz and 24, respectively):
Rscript zUMIs/misc/merge_demultiplexed_fastq.R --dir /path/to/individual_fastqs --pigz /path/to/pigz --threads 8
The output files will be generated in the same folder:
reads_for_zUMIs.R1.fastq.gz --- concatenated read 1 file
reads_for_zUMIs.R2.fastq.gz --- concatenated read 2 file (if paired-end was detected)
reads_for_zUMIs.index.fastq.gz --- generated barcode reads to be used in zUMIs
reads_for_zUMIs.samples.txt --- text file containing the sample to barcode mapping
reads_for_zUMIs.expected_barcodes.txt --- barcode text list for use in zUMIs YAML
All "barcodes" will be randomly generated strings of length 8, so in your YAML set up the index fastq file with BC(1-8). Regardless of whether the original data was indexed as single-index or dual-index, this script will only create a single index fastq file.
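As a quick sanity check before editing the YAML, one can confirm that every entry in the generated barcode list is 8 characters long, matching BC(1-8). This sketch assumes that reads_for_zUMIs.expected_barcodes.txt holds one barcode per line with no header, which the description above suggests but does not state. The function name count_bad_barcodes is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of zUMIs): count lines whose length is not 8.
# Assumes one barcode per line and no header line; both are assumptions.
count_bad_barcodes() {
  awk 'length($0) != 8 { bad++ } END { print bad + 0 }' "$1"
}
```

After sourcing the function, calling count_bad_barcodes /path/to/individual_fastqs/reads_for_zUMIs.expected_barcodes.txt should print 0 if the file has the assumed layout. A nonzero count points to a different file layout (for example a header line) rather than necessarily to a problem with the barcodes.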