Scripts for initial processing of 10x Genomics 5' V(D)J data (scTCR/scBCR)
This repository requires csvtool to be installed:
sudo apt-get install csvtool
It also depends on the H5weaver, circlize, stringi, rmarkdown, and optparse libraries.
circlize, stringi, rmarkdown, and optparse are available from CRAN and can be installed in R using:
install.packages("stringi")
install.packages("circlize")
install.packages("rmarkdown")
install.packages("optparse")
H5weaver is found in the aifimmunology GitHub repositories. Install with:
Sys.setenv(GITHUB_PAT = "[your_personal_token_here]")
devtools::install_github("aifimmunology/H5weaver")
The cell hashing sample sheet used in the merging step contains 4 columns: SampleID, BatchID, HashTag, PoolID
Example:
SampleID,BatchID,HashTag,PoolID
PB02270-02,EXP-00196,HT1,P1
PB02243-02,EXP-00196,HT2,P1
PB01459-02,EXP-00196,HT3,P1
PB01458-02,EXP-00196,HT4,P1
PB01455-02,EXP-00196,HT5,P1
PB01454-02,EXP-00196,HT6,P1
PB01450-02,EXP-00196,HT7,P1
PB01446-02,EXP-00196,HT8,P1
IMM19_692,EXP-00196,HT9,P1
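Before the merging step, it can help to confirm that the sheet matches the layout above. The check below is a minimal sketch and not part of this repository: the function name check_hash_sheet is hypothetical, and it assumes a plain comma-separated file with Unix line endings and no quoted fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): verify that a cell
# hashing sample sheet has the four expected columns and that every
# non-empty row has exactly four comma-separated fields.
check_hash_sheet() {
  local sheet="$1"
  local header
  header="$(head -n 1 "$sheet")"
  if [ "$header" != "SampleID,BatchID,HashTag,PoolID" ]; then
    echo "Unexpected header: $header" >&2
    return 1
  fi
  # Report any data row whose field count is not 4; exit non-zero if found.
  awk -F',' '
    NR > 1 && NF > 0 && NF != 4 {
      printf "Line %d has %d fields\n", NR, NF
      bad = 1
    }
    END { exit bad + 0 }
  ' "$sheet" >&2
}
```

For example, `check_hash_sheet exp-0196-cellhashing_sheet.csv && echo "sheet OK"` prints "sheet OK" only when the header and every row pass the check.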
This script (mulit_output_fomrating.sh in the example below) splits metrics_summary.csv into three summary files, one each for the gene expression, scTCR, and scBCR libraries. The gene expression summary file can be passed directly to tenx-rnaseq-pipeline/run_add_tenx_rna_metadata.R. The script also adds two columns (Well_ID, Batch_ID) to the filtered_contig_annotation files.
There are 3 parameters for this script:
-d: Path to the cellranger multi output directory outs/per_sample_outs/*/
-b: Batch ID
-w: Well ID
An example run for a cellranger multi result is:
bash tenx-vdj-pipeline/mulit_output_fomrating.sh \
-d EXP-00196-Multi-P1C1W1/outs/per_sample_outs/EXP-00196-Multi-P1C1W1/ \
-b EXP-00196 \
-w P1C1W1
Output examples:
It should add three summary files under EXP-00196-Multi-P1C1W1/outs/per_sample_outs/EXP-00196-Multi-P1C1W1:
- EXP-00196-Multi-P1C1W1/outs/per_sample_outs/EXP-00196-Multi-P1C1W1/EXP-00196-P1C1W1_VDJ_T_summary.csv
- EXP-00196-Multi-P1C1W1/outs/per_sample_outs/EXP-00196-Multi-P1C1W1/EXP-00196-P1C1W1_VDJ_B_summary.csv
- EXP-00196-Multi-P1C1W1/outs/per_sample_outs/EXP-00196-Multi-P1C1W1/EXP-00196-P1C1W1_Gene_Expression_summary.csv
It will also add reformatted contig CSV files to both the EXP-00196-Multi-P1C1W1/outs/per_sample_outs/EXP-00196-Multi-P1C1W1/vdj_b and EXP-00196-Multi-P1C1W1/outs/per_sample_outs/EXP-00196-Multi-P1C1W1/vdj_t folders:
- EXP-00196-Multi-P1C1W1/outs/per_sample_outs/EXP-00196-Multi-P1C1W1/vdj_b/EXP-00196-P1C1W1_Filtered_Contig_Reformated.csv
- EXP-00196-Multi-P1C1W1/outs/per_sample_outs/EXP-00196-Multi-P1C1W1/vdj_t/EXP-00196-P1C1W1_Filtered_Contig_Reformated.csv
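The column-adding part of this step can be pictured with the sketch below. It is an illustration only, not the repository's implementation (which may rely on csvtool): the function name add_well_batch_columns is hypothetical, and it assumes Unix line endings and no line breaks inside quoted fields.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the repository's implementation): append Well_ID
# and Batch_ID columns to a CSV by extending each line. Because the new
# fields go at the end of the line, quoted commas in earlier fields are safe.
add_well_batch_columns() {
  local input="$1" well="$2" batch="$3"
  awk -v well="$well" -v batch="$batch" '
    NR == 1 { print $0 ",Well_ID,Batch_ID"; next }  # extend the header
    NF > 0  { print $0 "," well "," batch }         # extend each data row
  ' "$input"
}
```

A call such as `add_well_batch_columns filtered_contig_annotations.csv P1C1W1 EXP-00196 > contig_with_ids.csv` writes the extended table to a new file; both file names here are placeholders.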
This script (split_contig_by_hash.sh in the example below) splits the Filtered_Contig_Reformated.csv files based on the cell hashing results. The HTO files come from the cell hashing pipeline output. Note: splitting must be done separately for scTCR and scBCR.
There are 4 parameters for this script:
-c: Input HTO category table
-i: Input reformatted contig file
-o: Output directory
-w: Well ID
An example run for the split contig step is:
bash tenx-vdj-pipeline/split_contig_by_hash.sh \
-c EXP-00196-P1C1W1_hto_category_table.csv.gz \
-i EXP-00196-Multi-P1C1W1/outs/per_sample_outs/EXP-00196-Multi-P1C1W1/vdj_b/EXP-00196-P1C1W1_Filtered_Contig_Reformated.csv \
-w P1C1W1 \
-o split_contig_scbcr
The output is the set of contig files split by hash for each well. Each file name starts with the sample name, followed by the well name.
Output examples:
- IMM19_692_P1C1W1_filtered_contig.csv
- PB01446-02_P1C1W1_filtered_contig.csv
- PB01450-02_P1C1W1_filtered_contig.csv
- PB01454-02_P1C1W1_filtered_contig.csv
- PB01455-02_P1C1W1_filtered_contig.csv
- PB01458-02_P1C1W1_filtered_contig.csv
- PB01459-02_P1C1W1_filtered_contig.csv
- PB02243-02_P1C1W1_filtered_contig.csv
- PB02270-02_P1C1W1_filtered_contig.csv
- multiplet_P1C1W1_filtered_contig.csv
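Since the split step has to be run once for scBCR and once for scTCR, a small wrapper can keep the two calls in sync. The sketch below makes several assumptions: the function name split_both_chains is hypothetical, the output directory split_contig_sctcr is chosen here only to parallel split_contig_scbcr, and the same HTO category table is assumed to apply to both libraries from one well. Setting RUN=echo prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: run the split step once per library type for a single well.
# The directory layout mirrors the examples above; split_contig_sctcr is
# an assumed output name. Set RUN=echo to print the commands only.
split_both_chains() {
  local batch="$1" well="$2" sample_dir="$3" hto_table="$4"
  local chain out_dir
  for chain in vdj_b vdj_t; do
    case "$chain" in
      vdj_b) out_dir="split_contig_scbcr" ;;
      vdj_t) out_dir="split_contig_sctcr" ;;
    esac
    ${RUN:-} bash tenx-vdj-pipeline/split_contig_by_hash.sh \
      -c "$hto_table" \
      -i "$sample_dir/$chain/${batch}-${well}_Filtered_Contig_Reformated.csv" \
      -w "$well" \
      -o "$out_dir"
  done
}
```

A dry run could look like `RUN=echo split_both_chains EXP-00196 P1C1W1 EXP-00196-Multi-P1C1W1/outs/per_sample_outs/EXP-00196-Multi-P1C1W1 EXP-00196-P1C1W1_hto_category_table.csv.gz`, which prints one command for vdj_b and one for vdj_t.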
This script (merge_contig_by_hash.sh in the example below) merges the contig files in the output folder of the splitting step. It detects files with the same sample name and combines them. Note: merging must be done separately for scTCR and scBCR.
There are 3 parameters for this script:
-i: Input directory from the splitting step
-k: Input cell hashing sheet
-o: Output directory
An example run for the merge contig step is:
bash tenx-vdj-pipeline/merge_contig_by_hash.sh \
-i split_contig_scbcr \
-k exp-0196-cellhashing_sheet.csv \
-o merged_contig_scbcr
Output examples:
- EXP-00196-P1_IMM19_692_filtered_contig.csv
- EXP-00196-P1_PB01446-02_filtered_contig.csv
- EXP-00196-P1_PB01450-02_filtered_contig.csv
- EXP-00196-P1_PB01454-02_filtered_contig.csv
- EXP-00196-P1_PB01455-02_filtered_contig.csv
- EXP-00196-P1_PB01458-02_filtered_contig.csv
- EXP-00196-P1_PB01459-02_filtered_contig.csv
- EXP-00196-P1_PB02243-02_filtered_contig.csv
- EXP-00196-P1_PB02270-02_filtered_contig.csv
- EXP-00196-P1_multiplet_filtered_contig.csv
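The idea of combining files that share a sample name can be sketched as follows. This is not the repository's implementation: the function name merge_by_sample is hypothetical, the prefix argument stands in for the batch and pool part of the output names (which the real script presumably derives from the cell hashing sheet), and the sketch assumes that well IDs contain no underscore, that all inputs share one header, and that the output directory starts empty and differs from the input directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the repository's implementation): combine the
# per-well files of each sample into one CSV, keeping a single header.
# Assumes names of the form <sample>_<well>_filtered_contig.csv, where the
# well ID contains no underscore, and that all files share the same header.
merge_by_sample() {
  local in_dir="$1" out_dir="$2" prefix="$3"
  local f stem sample out
  mkdir -p "$out_dir"
  for f in "$in_dir"/*_filtered_contig.csv; do
    [ -e "$f" ] || continue                 # no matching files
    stem="$(basename "$f" _filtered_contig.csv)"
    sample="${stem%_*}"                     # drop the trailing _<well>
    out="$out_dir/${prefix}_${sample}_filtered_contig.csv"
    if [ -e "$out" ]; then
      tail -n +2 "$f" >> "$out"             # later wells: skip the header
    else
      cat "$f" > "$out"                     # first well: keep the header
    fi
  done
}
```

For instance, `merge_by_sample split_contig_scbcr merged_sketch EXP-00196-P1` would write one file per sample into the placeholder directory merged_sketch. Stripping only the last underscore-delimited token keeps sample names such as IMM19_692 intact.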
This script (add_contig_to_h5_metadata.R in the example below) adds contig data to the h5 metadata. scBCR and scTCR need to be added one after the other, in separate runs. The script also replaces the original barcodes in the filtered contig file with the cell UUIDs stored in the h5 data.
There are 6 parameters for this script:
-i: Input sample h5 file
-c: Input sample filtered contig file
-d: Output directory
-b: Batch ID
-t: Input category: "scTCR" or "scBCR"
-o: Output HTML run summary file
An example run for adding contigs to the metadata is:
Rscript --vanilla tenx-vdj-pipeline/add_contig_to_h5_metadata.R \
-i /home/jupyter/cell_hash/merged_h5/PB01446-02.h5 \
-c /home/jupyter/merged_contig_bcr/EXP-00196-P1_PB01446-02_filtered_contig.csv \
-d /home/jupyter/Add_contig_outputs/ \
-b EXP-00196 \
-t scBCR \
-o EXP-00196-P1_PB01446-02_scBCR_run_summary.html
Output examples:
- PB01446-02.h5
- EXP-00196-P1_PB01446-02_scBCR_run_summary.html
- PB01446-02_filtered_contig_scBCR.csv