Comparison of High-Throughput Single-Cell RNA Sequencing Data Processing Pipelines
https://www.nextflow.io/docs/latest/getstarted.html#requirements
1、Nextflow can be used on any POSIX compatible system (Linux, OS X, etc). It requires Bash 3.2 (or later) and Java 8 (or later, up to 11) to be installed.
$ sudo apt-get install openjdk-8-jdk
2、$ mkdir Nextflow && cd Nextflow
3、$ wget -qO- https://get.nextflow.io | bash
It will create the nextflow main executable file in the current directory.
4、move the nextflow file to a directory accessible by your $PATH variable (this is only required to avoid remembering and typing the full path to nextflow each time you need to run it).
5、Test Nextflow
$ nextflow run hello
Output the following to indicate that the installation was successful
N E X T F L O W ~ version 19.04.1
Pulling nextflow-io/hello ...
downloaded from https://github.com/nextflow-io/hello.git
Launching `nextflow-io/hello` [nice_wozniak] - revision: a9012339ce [master]
[warm up] executor > local
executor > local (4)
[bc/a32c37] process > sayHello [100%] 4 of 4 ✔
Ciao world!
Hola world!
Hello world!
Bonjour world!
Anaconda 2019.10 for Linux Installer Python 3.7 version
https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
Make sure conda is installed and updated:
$ conda --version
$ conda update conda
If this is your first time working with Conda, you may need to edit your configuration paths to ensure Anaconda is invoked when calling conda
$ git clone [email protected]:xmuyulab/scRNAseq_pipelines.git
Configuring the software environment. Provides an execution environment for seven pieces of software, which can be installed as needed
$ conda env create -f scRNAseq_pipelines/envs/UMI-tools.yml
$ conda env create -f scRNAseq_pipelines/envs/umis.yml
$ conda env create -f scRNAseq_pipelines/envs/dropEst.yml
$ conda env create -f scRNAseq_pipelines/envs/zUMIs.yml
$ conda env create -f scRNAseq_pipelines/envs/scPipe.yml
$ conda env create -f scRNAseq_pipelines/envs/Dropseq.yml
$ conda env create -f scRNAseq_pipelines/envs/cellranger.yml
$ wget -c https://github.com/broadinstitute/Drop-seq/releases/download/v2.3.0/Drop-seq_tools-2.3.0.zip
$ gunzip Drop-seq_tools-2.3.0.zip
$ cd Drop-seq_tools-2.3.0
Attention please!
Except file of Drop-seq_alignment.sh
. Replace the following /mnt/data/lmy/TMP_Dropseq
with a temporary large space directory. My working directory is broad_tmpdir_root=/mnt/data/lmy/TMP_Dropseq
.
Please replace /mnt/data/lmy/TMP_Dropseq
to /path/to/tmp
,you need to create this directory
$ mkdir -p /path/to/tmp
3、Modify Drop-seq_alignment.sh
, Add temporary directory -Djava.io.tmpdir=/mnt/data/lmy
, please modify /mnt/data/lmy
to /path/to/javatmp
. And modify STAR argument. Position is located in the follow.
# Stage 2: alignment
$echo_prefix java -Xmx4g -Djava.io.tmpdir=/mnt/data/lmy -jar ${picard_jar} SamToFastq INPUT=${tmpdir}/unaligned_mc_tagged_polyA_filtered.bam \
FASTQ=$tmpdir/unaligned_mc_tagged_polyA_filtered.fastq
files_to_delete="$files_to_delete $tmpdir/unaligned_mc_tagged_polyA_filtered.fastq"
$echo_prefix $star_executable --genomeDir ${genomedir} --outFileNamePrefix ${tmpdir}/star. \
--readFilesIn $tmpdir/unaligned_mc_tagged_polyA_filtered.fastq --runThreadN 4 --limitIObufferSize 1500000000 --limitOutSJcollapsed 10000000
files_to_delete="$files_to_delete ${aligned_sam}"
# Stage 3: sort aligned reads (STAR does not necessarily emit reads in the same order as the input)
$echo_prefix java -Dsamjdk.buffer_size=131072 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx4g -Djava.io.tmpdir=/mnt/data/lmy -jar ${picard_jar} \
SortSam INPUT=${aligned_sam} OUTPUT=${aligned_sorted_bam} SORT_ORDER=queryname TMP_DIR=${tmpdir}
files_to_delete="$files_to_delete ${aligned_sorted_bam}"
# Stage 4: merge and tag aligned reads
$echo_prefix java -Xmx4g -Djava.io.tmpdir=/mnt/data/lmy -jar ${picard_jar} MergeBamAlignment REFERENCE_SEQUENCE=${reference} UNMAPPED_BAM=${tagged_unmapped_bam} \
ALIGNED_BAM=${aligned_sorted_bam} INCLUDE_SECONDARY_ALIGNMENTS=false PAIRED_RUN=false CLIP_ADAPTERS=false \
TMP_DIR=${tmpdir} OUTPUT=$tmpdir/merged.bam
files_to_delete="$files_to_delete $tmpdir/merged.bam"
4、$ cd scRNAseq_pipelines/Drop-seq_tools
5、--help
parameters help the user with this script.
$ nextflow run drop_seqtools.nf --help
6、$ nextflow run drop_seqtools.nf --I [ ] --gtf [ ] --d [ ] --g [ ] --set_cell_number [ ] --fasta [ ] --tmpdir [ ] --O [ ] --javatmp [ ]
1、$ cd scRNAseq_pipelines/umis
2、--help
parameters help the user with this script.
$ nextflow run umis.nf --help
3、Fill in the parameters as follows
$ nextflow run umis.nf --seqdir [ ] --rapmap_index [ ] --set_cell_number [ ] --json [ ] --outdir [ ] --threads [ ]
1、$ cd scRNAseq_pipelines/UMI-tools
2、--help
parameters help the user with this script.
$ nextflow run umitools.nf --help
3、$ nextflow run umitools.nf --seqdir [ ] --gtf [ ] --star_index [ ] --bc_pattern [ ] --set_cell_number [ ] --outdir [ ] --threads [ ]
1、$ cd scRNAseq_pipelines/dropEst
2、--help
parameters help the user with this script.
$ nextflow run dropEst.nf --help
3、$ nextflow run dropEst.nf --seqdir [ ] --star_index [ ] --gtf [ ] --xml [ ] --outdir [ ] --threads [ ]
1、$ cd scRNAseq_pipelines/zUMIS
2、--help
parameters help the user with this script.
$ nextflow run zumis.nf --help
3、$ nextflow run zumis.nf --shell [ ] --yaml [ ] --outdir [ ]
1、$ cd scRNAseq_pipelines/scPipe
2、--help
parameters help the user with this script.
$ nextflow run scpipe.nf --help
3、$ nextflow run scpipe.nf --seqdir [ ] --star_index [ ] --gtf [ ] --PlatForm [ ] --outdir [ ] --threads [ ]
1、Download cellranger
https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/3.0/
$ wget -O cellranger-3.0.2.tar.gz "http://cf.10xgenomics.com/releases/cell-exp/cellranger-3.0.2.tar.gz?Expires=1578322576&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cDovL2NmLjEweGdlbm9taWNzLmNvbS9yZWxlYXNlcy9jZWxsLWV4cC9jZWxscmFuZ2VyLTMuMC4yLnRhci5neiIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTU3ODMyMjU3Nn19fV19&Signature=DWyftPc5rsvMJwA3dCcHlF1ZRUgBpev3x654ICzHWke4pruaXg1D7DGcQds-Tf2-jihIWf3H3GSr9EPs-3K5dkBg9GDH61K~o5y7VVkPNEhnD~-o6NG0zNgJu-NNIh7aQE6pnBcRyMjQOVGu2~~dEt0gwdxW~GOTZe3OkJrLf8IRuwL0oEcHf9zEAxsoK8t8n9A~iS8-khzQnMtUJr0aL2Ah37mCdjRl-syfpQR~zR2InTSJkMN5TicBgBQCrfapQ6Y2q6-GAG8bHQM-aCY73d0OjdKwfKg2WFDt6hCJ5bpxCwKzwcGjaXy-pQO5YOwkGDyIX7uOv~bCcWJ7wm-ZwQ__&Key-Pair-Id=APKAI7S6A5RYOXBWRPDA"
2、 Decompression
$ tar -xvf cellranger-3.0.2.tar
3、Add environment variables
vi ~/.bashrc
Append
export PATH="/path/to/cellranger-3.0.2:$PATH"
Save exit edit
source ~/.bashrc
4、$ cd scRNAseq_pipelines/CellRanger
5、--help
parameters help the user with this script.
$ nextflow run cellranger.nf --help
6、$ nextflow run cellranger.nf --I [ ] --O [ ] --r [ ] --set_cell_number [ ]
Mingxuan Gao, Mingyi Ling, Xinwei Tang, Shun Wang, Xu Xiao, Ying Qiao, Wenxian Yang, Rongshan Yu. Comparison of High-Throughput Single-Cell RNA Sequencing Data Processing Pipelines. bioRxiv 2020.02.09.940221; doi: https://doi.org/10.1101/2020.02.09.940221