scooby: Modeling multi-modal genomic profiles from DNA sequence at single-cell resolution.

Scooby

Code for the scooby manuscript. Scooby is the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome at single-cell resolution. For this, it leverages the pre-trained multi-omics profile predictor Borzoi as a foundation model, equips it with a cell-specific decoder, and fine-tunes its sequence embeddings. Specifically, the decoder is conditioned on the cell position in a precomputed single-cell embedding.
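
To make the conditioning idea concrete, here is a minimal NumPy sketch of a cell-conditioned decoder. It is a sketch under stated assumptions, not scooby's actual implementation: we assume a small MLP maps a cell's coordinates in the precomputed embedding to the weights and bias of a pointwise (1x1) projection, which is then applied to every bin of the shared sequence embedding to yield that cell's profile. The class name, all dimensions, the random stand-in for the Borzoi embeddings and the softplus output are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative only, not scooby's real sizes)
N_BINS = 8        # genomic bins in the predicted window
SEQ_DIM = 16      # channels of the fine-tuned sequence embedding
CELL_DIM = 4      # dimensions of the precomputed single-cell embedding
HIDDEN = 32       # hidden units of the decoder MLP


def softplus(x):
    """Numerically stable softplus, keeping predicted coverage non-negative."""
    return np.logaddexp(0.0, x)


class CellConditionedDecoder:
    """Hypothetical decoder: cell embedding -> weights of a pointwise projection."""

    def __init__(self, seq_dim, cell_dim, hidden, rng):
        self.w1 = rng.normal(scale=0.1, size=(cell_dim, hidden))
        self.b1 = np.zeros(hidden)
        # Output layer emits seq_dim weights plus one bias per cell
        self.w2 = rng.normal(scale=0.1, size=(hidden, seq_dim + 1))
        self.b2 = np.zeros(seq_dim + 1)

    def cell_params(self, cell_emb):
        # cell_emb: (n_cells, cell_dim) -> (n_cells, seq_dim + 1)
        h = np.maximum(cell_emb @ self.w1 + self.b1, 0.0)  # ReLU
        return h @ self.w2 + self.b2

    def __call__(self, seq_emb, cell_emb):
        # seq_emb: (n_bins, seq_dim), one DNA sequence shared by all cells
        params = self.cell_params(cell_emb)
        weights, bias = params[:, :-1], params[:, -1]
        # (n_cells, seq_dim) @ (seq_dim, n_bins) -> (n_cells, n_bins)
        logits = weights @ seq_emb.T + bias[:, None]
        return softplus(logits)


seq_emb = rng.normal(size=(N_BINS, SEQ_DIM))   # stand-in for Borzoi sequence embeddings
cell_emb = rng.normal(size=(3, CELL_DIM))      # embedding coordinates of three cells
decoder = CellConditionedDecoder(SEQ_DIM, CELL_DIM, HIDDEN, rng)
profiles = decoder(seq_emb, cell_emb)
print(profiles.shape)  # (3, 8): one predicted profile per cell
```

In this sketch, generating per-cell projection weights (rather than concatenating the cell embedding to every bin) keeps the per-cell cost to a single matrix product over the shared sequence embedding.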

This repository contains the model and data-loading code as well as training scripts. The reproducibility repository contains notebooks that reproduce the results of the manuscript.

Hardware requirements

  • NVIDIA GPU (tested on A40), Linux, Python (tested with v3.9)

Installation instructions

Prerequisites

scooby uses a custom version of SnapATAC2, which can be installed with pip. Install it in a separate environment, because its numpy version conflicts with the one scooby requires.

  • pip install snapatac2-scooby

Scooby package installation

  • pip install git+https://github.com/gagneurlab/scooby.git
  • Download the files from the Zenodo repository
  • Use examples from the scooby reproducibility repository

Training

We provide one training script for scRNA-seq-only modeling and one for multiome modeling. Both require AnnData objects preprocessed with SnapATAC2 and precomputed cell embeddings. Training scooby takes 1-2 days on 8 NVIDIA A40 GPUs with 128 GB RAM and 32 CPU cores.
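
The exact input file formats are not described here, so the following sketch only illustrates one sanity check worth running before training. It assumes (our assumption) that the cell embedding is a matrix with one row per cell, keyed by cell barcode, and reorders its rows to match the cell order of the AnnData, failing loudly if any cell lacks an embedding. The helper `align_embedding` and the barcodes are hypothetical; with a real AnnData you would pass `adata.obs_names` as `obs_names`.

```python
import numpy as np


def align_embedding(obs_names, emb_barcodes, embedding):
    """Reorder embedding rows so that row i belongs to obs_names[i].

    Hypothetical helper: obs_names mimics AnnData.obs_names, and
    emb_barcodes lists the barcode of each row of `embedding`.
    """
    index = {bc: i for i, bc in enumerate(emb_barcodes)}
    missing = [bc for bc in obs_names if bc not in index]
    if missing:
        raise KeyError(f"{len(missing)} cells lack an embedding, e.g. {missing[:3]}")
    order = np.fromiter((index[bc] for bc in obs_names), dtype=int, count=len(obs_names))
    return embedding[order]


obs_names = ["AAAC-1", "AAAG-1", "AAAT-1"]        # cell order in the AnnData
emb_barcodes = ["AAAT-1", "AAAC-1", "AAAG-1"]     # cell order in the embedding
embedding = np.array([[3.0, 3.0], [1.0, 1.0], [2.0, 2.0]])
aligned = align_embedding(obs_names, emb_barcodes, embedding)
print(aligned[:, 0])  # [1. 2. 3.]
```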

Model architecture

Currently, the model has only been tested with a batch size of 1.
