scCobra: Contrastive cell embedding learning with domain-adaptation for single-cell data integration and harmonization
The rapid development of single-cell technologies underscores the need for more effective methods to integrate and harmonize single-cell sequencing data. The technical and biological variations across studies demand accurate and reliable solutions for data integration. Conventional tools often face limitations due to reliance on gene expression distribution assumptions and over-correction issues. Here, we introduce scCobra, a deep neural network tool designed to address these challenges. By leveraging a deep generative model that combines a contrastive neural network with domain adaptation, scCobra mitigates batch effects and minimizes over-correction without gene expression distribution assumptions. Additionally, scCobra enables online label transfer across datasets with batch effects, allowing continuous integration of new data without retraining, and offers batch effect simulation and advanced multi-omic batch integration. These capabilities make scCobra a significant advancement for integrating datasets with batch effects, enabling comprehensive biological examination of the integrated data. Please refer to our manuscript for details.
Step 1: Create a conda environment for scCobra
# Python 3.10 or later is recommended
conda create -n scCobra conda-forge::python=3.10 conda-forge::ipykernel
# Activate the environment so the packages below are installed into it
conda activate scCobra
# Install scanpy and scib
pip install scanpy scib
# Optional: additional scib packages are described at https://scib.readthedocs.io/en/latest/index.html
# Install PyTorch
pip3 install torch torchvision torchaudio
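After the installation commands above, it can be useful to confirm that the packages are visible from inside the `scCobra` environment. The following is a minimal sketch, not part of the scCobra repository: the helper name `check_environment` is hypothetical, and it only checks that the top-level modules `scanpy`, `scib`, and `torch` can be located, without importing them.

```python
from importlib.util import find_spec

# Top-level module names of the packages installed in Step 1.
REQUIRED = ("scanpy", "scib", "torch")


def check_environment(packages=REQUIRED):
    """Return {module name: True/False} depending on whether it can be located."""
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it with the Python interpreter of the activated environment. Any module reported as `MISSING` suggests that the corresponding `pip install` ran in a different environment.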
Step 2: Clone This Repo
git clone https://github.com/mcgilldinglab/scCobra.git
Click a dataset name to download it:
- The simulated dataset contains 12097 cells of 7 cell types from 6 batches
- The pancreas dataset contains 16382 cells of 14 cell types from 9 batches
- The immune dataset contains 33506 cells of 16 cell types from 10 batches
- The lung atlas dataset contains 32472 cells of 17 cell types from 16 batches
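To check that a downloaded dataset matches the figures listed above, one option is to count cells, cell types, and batches directly. The sketch below is illustrative only: the function name `summarize` and the column names `batch` and `celltype` are assumptions, and the actual keys in each file may differ. It runs on a small stand-in object so that it works without any data file.

```python
from types import SimpleNamespace


def summarize(adata, batch_key="batch", celltype_key="celltype"):
    """Count cells, cell types, and batches in an AnnData-like object.

    Only `adata.n_obs` and `adata.obs[key]` are used. The default key
    names are assumptions, not taken from the scCobra datasets.
    """
    return {
        "cells": adata.n_obs,
        "cell_types": len(set(adata.obs[celltype_key])),
        "batches": len(set(adata.obs[batch_key])),
    }


# Tiny stand-in with the same attributes, used here instead of a real file.
toy = SimpleNamespace(
    n_obs=4,
    obs={
        "batch": ["b1", "b1", "b2", "b2"],
        "celltype": ["alpha", "beta", "alpha", "alpha"],
    },
)
print(summarize(toy))  # {'cells': 4, 'cell_types': 2, 'batches': 2}
```

With a real file, the object could be loaded via `scanpy.read_h5ad("pancreas.h5ad")` (a hypothetical file name) and passed to `summarize` with the appropriate keys. For the pancreas dataset, the expected result would be 16382 cells, 14 cell types, and 9 batches.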
Documentation: https://sccobra.readthedocs.io/
scCobra is jointly developed by Bowen Zhao and Yi Xiong from Shanghai Jiao Tong University and Jun Ding from McGill University.