mkdir -p ~/SRAToolkit
cd ~/SRAToolkit
tar -vxzf sratoolkit.current-ubuntu64.tar.gz
cd ~/data
~/SRAToolkit/sratoolkit.current-ubuntu64/bin/fastq-dump SRR11268104 -O ./filename/ # output: SRR11268104.fastq
~/SRAToolkit/sratoolkit.current-ubuntu64/bin/fastq-dump --fasta SRR11268104 -O ./filename/ # output: SRR11268104.fasta
cd ~/data
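~/SRAToolkit/sratoolkit.current-ubuntu64/bin/fastq-dump SRR11268104 --split-3 -O ./filename/ # output: SRR11268104_1.fastq, SRR11268104_2.fastq
~/SRAToolkit/sratoolkit.current-ubuntu64/bin/fastq-dump SRR11268104 --split-3 --gzip -O ./filename/ # output: SRR11268104_1.fastq.gz, SRR11268104_2.fastq.gz
If three files are produced, the data are paired-end, but some reads lost their mate (for example, because a low-quality mate was trimmed away). With --split-3, these unpaired reads go into the third file, usually named <srr_id>.fastq. This file is normally small and can generally be ignored.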
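A quick way to confirm that a --split-3 run produced properly paired files is to compare the read counts of the _1 and _2 files. The sketch below assumes the usual four lines per FASTQ record; the function names count_reads and check_pairs are hypothetical helpers, not part of the SRA Toolkit.

```shell
# count_reads FILE: print the number of reads in a FASTQ file.
# Assumes 4 lines per record; reads .gz files through gzip.
count_reads() {
    local lines
    case "$1" in
        *.gz) lines=$(gzip -dc "$1" | wc -l) ;;
        *)    lines=$(wc -l < "$1") ;;
    esac
    echo $(( lines / 4 ))
}

# check_pairs PREFIX [DIR] [EXT]: compare PREFIX_1.EXT and PREFIX_2.EXT.
# EXT defaults to "fastq"; pass "fastq.gz" for compressed output.
check_pairs() {
    local prefix="$1" dir="${2:-.}" ext="${3:-fastq}" n1 n2
    n1=$(count_reads "$dir/${prefix}_1.$ext")
    n2=$(count_reads "$dir/${prefix}_2.$ext")
    # Report the optional third file holding reads without a mate.
    if [ -f "$dir/$prefix.$ext" ]; then
        echo "unpaired reads: $(count_reads "$dir/$prefix.$ext")"
    fi
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: $n1 reads in _1, $n2 reads in _2" >&2
        return 1
    fi
}
```

For the example above, this could be called as: check_pairs SRR11268104 ./filename fastq.gz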
~/SRAToolkit/sratoolkit.current-ubuntu64/bin/fastq-dump SRR11268104 --split-files --gzip -O ./filename/
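If three files are produced (SRR11268104_1.fastq.gz, SRR11268104_2.fastq.gz, SRR11268104_3.fastq.gz), they correspond, in order, to: the sample index (added to the Illumina sequencing adapter so that multiple libraries can be pooled and sequenced on the same flow cell or the same lane), the barcode and UMI sequences, and the sequenced insert (the actual reads we need, i.e. the RNA-seq data from each cell).
If two files are produced (SRR11268104_1.fastq.gz, SRR11268104_2.fastq.gz), they correspond to the barcode and UMI sequences and the sequenced insert, respectively.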
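for i in $(cat SRR_list.txt); do  # SRR_list.txt lists all SRR accession numbers
    # echo "$i"
    # Two files per run: _1 is barcode + UMI (R1), _2 is the sequenced insert (R2)
    mv "${i}_1.fastq.gz" "${i}_S1_L001_R1_001.fastq.gz"
    mv "${i}_2.fastq.gz" "${i}_S1_L001_R2_001.fastq.gz"
    # Three files per run: use these lines instead (_1 is the sample index, I1)
    # mv "${i}_1.fastq.gz" "${i}_S1_L001_I1_001.fastq.gz"
    # mv "${i}_2.fastq.gz" "${i}_S1_L001_R1_001.fastq.gz"
    # mv "${i}_3.fastq.gz" "${i}_S1_L001_R2_001.fastq.gz"
done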
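The two-file and three-file cases can also be handled automatically by checking whether a _3 file exists. This is a minimal sketch under the file-order assumption described above (index, barcode + UMI, insert for three files; barcode + UMI, insert for two files). The function name rename_for_cellranger is hypothetical.

```shell
# rename_for_cellranger SRR_ID [DIR]
# Rename fastq-dump --split-files output to the
# <sample>_S1_L001_<read>_001.fastq.gz naming scheme.
rename_for_cellranger() {
    local id="$1" dir="${2:-.}"
    if [ -f "$dir/${id}_3.fastq.gz" ]; then
        # Three files: sample index (I1), barcode + UMI (R1), insert (R2)
        mv "$dir/${id}_1.fastq.gz" "$dir/${id}_S1_L001_I1_001.fastq.gz"
        mv "$dir/${id}_2.fastq.gz" "$dir/${id}_S1_L001_R1_001.fastq.gz"
        mv "$dir/${id}_3.fastq.gz" "$dir/${id}_S1_L001_R2_001.fastq.gz"
    elif [ -f "$dir/${id}_2.fastq.gz" ]; then
        # Two files: barcode + UMI (R1), insert (R2)
        mv "$dir/${id}_1.fastq.gz" "$dir/${id}_S1_L001_R1_001.fastq.gz"
        mv "$dir/${id}_2.fastq.gz" "$dir/${id}_S1_L001_R2_001.fastq.gz"
    else
        echo "no split FASTQ files found for $id in $dir" >&2
        return 1
    fi
}

# Example: rename every run listed in SRR_list.txt (one accession per line)
# while IFS= read -r id; do rename_for_cellranger "$id" .; done < SRR_list.txt
```

It is worth checking the read lengths of a few records before renaming, since the order of the files in a given SRA submission is not guaranteed to follow this convention.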
mkdir ~/data/cellranger_results
cd ~/data/cellranger_results
/Shared_Software/Single_cell/cellranger-4.0.0/bin/cellranger count \
--id Mouse_10x \
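--fastqs="$HOME/data/" \
--sample=SRR11268104 \
--localcores=40 \
--localmem=100 \
--transcriptome=/path/to/reference  # placeholder: the reference path is not given in these notes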
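- --id: run ID, also used as the name of the output folder
- --fastqs: path to the folder containing the FASTQ files
- --sample: the part of the fastq.gz file name before _S1
- --transcriptome: path to the reference transcriptome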
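The value for --sample can be derived from a renamed FASTQ file name by stripping everything from _S<number>_L<lane> onwards. The sketch below uses plain shell parameter expansion; sample_from_fastq is a hypothetical helper name, and the file-name pattern it checks is an assumption based on the naming used in the rename step.

```shell
# sample_from_fastq FILE: print the sample name, i.e. the part of the
# file name before "_S<number>_L<lane>_...". Fails for other names.
sample_from_fastq() {
    local name
    name=$(basename "$1")
    case "$name" in
        *_S[0-9]*_L[0-9]*_[RI][0-9]_[0-9]*.fastq*)
            # Remove the shortest suffix that starts with _S<digit>..._L<digit>
            echo "${name%_S[0-9]*_L[0-9]*}"
            ;;
        *)
            echo "unexpected FASTQ file name: $name" >&2
            return 1
            ;;
    esac
}
```

For example, sample_from_fastq ~/data/SRR11268104_S1_L001_R1_001.fastq.gz prints SRR11268104, which matches the --sample value used in the command above.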
By default, only reads that map to the transcriptome (exons) are carried forward to UMI counting. In certain cases, such as when the input to the assay consists of nuclei, there may be high levels of intronic reads generated by unspliced transcripts. To count these intronic reads, the cellranger count and cellranger multi pipelines (v5.0 and later) can be run with the include-introns option. If this option is used, any reads that map in the sense orientation to a single gene are carried forward to UMI counting.
The include-introns option eliminates the need for a custom "pre-mRNA" reference that defines the entire gene body to be an exon.
# conda activate velocyto
velocyto run10x -m /Shared_Software/ref_genome/mm10_rmsk/mm10_rmsk.gtf \
~/data/cellranger_results/Mouse_10x \
/path/to/genes.gtf  # placeholder: the GTFFILE argument is missing from the original command
- Usage: velocyto run10x -m msk.gtf SAMPLEFOLDER GTFFILE
- -m msk.gtf: GTF file containing the intervals to mask (for example, exported from the UCSC Genome Browser; make sure to select GTF as the output format)
- SAMPLEFOLDER: the cellranger output folder, i.e. the folder containing the subfolders outs, outs/analysis and outs/filtered_gene_bc_matrices
- GTFFILE: genome annotation file
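Before starting velocyto, it can help to confirm that SAMPLEFOLDER has the expected layout. The sketch below checks the subfolders listed above. Two points are assumptions rather than something stated in these notes: newer Cell Ranger versions name the matrix folder filtered_feature_bc_matrix, so either name is accepted, and the BAM file is expected under Cell Ranger's default name outs/possorted_genome_bam.bam. The function name check_10x_folder is hypothetical.

```shell
# check_10x_folder SAMPLEFOLDER: report missing pieces of a cellranger
# output folder; returns non-zero if anything is missing.
check_10x_folder() {
    local folder="$1" missing=0 sub
    for sub in outs outs/analysis; do
        if [ ! -d "$folder/$sub" ]; then
            echo "missing folder: $folder/$sub" >&2
            missing=1
        fi
    done
    # Assumption: the matrix folder name depends on the Cell Ranger version.
    if [ ! -d "$folder/outs/filtered_gene_bc_matrices" ] &&
       [ ! -d "$folder/outs/filtered_feature_bc_matrix" ]; then
        echo "missing filtered matrix folder in $folder/outs" >&2
        missing=1
    fi
    # Assumption: default name of the position-sorted BAM file.
    if [ ! -f "$folder/outs/possorted_genome_bam.bam" ]; then
        echo "missing file: $folder/outs/possorted_genome_bam.bam" >&2
        missing=1
    fi
    return "$missing"
}
```

For the run above, this could be called as: check_10x_folder ~/data/cellranger_results/Mouse_10x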