A comparison of data integration methods for single-cell RNA sequencing of cancer samples

pughlab/cancer-scrna-integration

This repository contains custom code used for analysis and plotting in Richards LM et al., bioRxiv (2021).

Tumours are routinely profiled with single-cell RNA sequencing (scRNA-seq) to characterize their diverse cellular ecosystems of malignant, immune, and stromal cell types. When combining data from multiple samples or studies, batch-specific technical variation can confound biological signals. However, scRNA-seq batch integration methods are often neither designed for nor benchmarked on datasets containing cancer cells. Here, we compare 5 data integration tools applied to 171,206 cells from 5 tumour scRNA-seq datasets. Based on our results, STACAS and fastMNN are the most suitable methods for integrating tumour datasets, demonstrating robust batch effect correction while preserving relevant biological variability in the malignant compartment. This comparison provides a framework for evaluating how well single-cell integration methods correct for technical variability while preserving the biological heterogeneity of malignant and non-malignant cell populations.

Data Availability

Most of the data used in this study were obtained from publicly available sources. All datasets have been made available in their re-processed form through CReSCENT (Study IDs CRES-P24, CRES-P25, CRES-P26, CRES-P27, CRES-P28). For the newly generated glioma snRNA-seq data, raw sequencing data in the form of FASTQ files will be made available through the European Genome-Phenome Archive upon publication.

Contact Information

Laura Richards ([email protected])
Dr. Trevor Pugh ([email protected])