LatentVelo

Estimating RNA velocity in a learned latent space, enabling batch correction and dynamics-based embeddings.

Pre-print available at https://www.biorxiv.org/content/10.1101/2022.08.22.504858v2

Reproducing results

The paper_notebooks/ directory runs LatentVelo on all of the datasets used in the paper. The examples/ directory shows a documented example of LatentVelo on synthetic data.

Benchmarking plots are generated by the notebooks in the benchmark/ directory. Its subdirectories contain the code for benchmarking with synthetic data and batch correction, as well as the code used to run scVelo.

Using LatentVelo

Additional settings are documented in DOCUMENTATION.md. Information about acquiring the datasets used is in DATASETS.md.

LatentVelo is not currently available on pip. Download the repository and install it with

python setup.py install

in the main directory.

Setting up data

LatentVelo uses AnnData annotated data objects. This object must have two layers containing spliced and unspliced counts.
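
Before running a recipe, it can help to confirm that both layers are present. The helper below is a minimal sketch and is not part of LatentVelo: `missing_layers` is a hypothetical name, and it assumes only that the object exposes a dict-like `.layers` attribute, as AnnData objects do.

```python
from types import SimpleNamespace


def missing_layers(adata, required=("spliced", "unspliced")):
    """Return the required layer names that are absent from adata.layers.

    Hypothetical helper (not part of LatentVelo). Works with any object
    that has a dict-like `.layers` attribute, such as an AnnData object.
    """
    return [name for name in required if name not in adata.layers]


# Stand-in object for illustration; in practice pass a real AnnData object.
demo = SimpleNamespace(layers={"spliced": [[1, 0], [0, 2]]})
print(missing_layers(demo))
```

An empty list means both layers were found under the default names. If your layers use different names, pass them through `required` and use the same names for `spliced_key` and `unspliced_key` in the recipe.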

Data is prepared for use with LatentVelo as follows:

import latentvelo as ltv

ltv.utils.standard_clean_recipe(adata, spliced_key='spliced', unspliced_key='unspliced',
                                batch_key='batch', celltype_key='celltype')

Batch and celltype keys for the standard model are optional. For the celltype-annotated model, the following function is used to prepare data and must include a celltype key:

ltv.utils.anvi_clean_recipe(adata, spliced_key='spliced', unspliced_key='unspliced',
                            batch_key='batch', celltype_key='celltype')

Initializing the model

The LatentVelo model can be initialized as either a standard VAE or a celltype-annotated VAE:

model = ltv.models.VAE(observed = number_of_genes, latent_dim = latent_dimension,
                       zr_dim = latent_regulation_dimension,
                       h_dim = conditioning_dimension)

model = ltv.models.AnnotVAE(observed = number_of_genes, latent_dim = latent_dimension,
                            zr_dim = latent_regulation_dimension,
                            h_dim = conditioning_dimension,
                            celltypes = number_of_celltypes)

Batch correction is enabled by specifying batch correction and the number of batches for either model:

model = ltv.models.VAE(observed = number_of_genes, latent_dim = latent_dimension,
                       zr_dim = latent_regulation_dimension,
                       h_dim = conditioning_dimension,
                       batch_correction = True,
                       batches = number_of_batches)
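
The constructor arguments can be collected in one place so that batch correction is only switched on when it applies. This is a minimal sketch: `vae_kwargs` is a hypothetical helper, the keyword names are taken from the calls above, and the numeric values are placeholders rather than recommended settings.

```python
def vae_kwargs(n_genes, latent_dim, zr_dim, h_dim, n_batches=None):
    """Collect keyword arguments for ltv.models.VAE (hypothetical helper).

    The keyword names match the calls shown in this README; the helper
    itself and the example values below are illustrative assumptions.
    """
    kwargs = {
        "observed": n_genes,
        "latent_dim": latent_dim,
        "zr_dim": zr_dim,
        "h_dim": h_dim,
    }
    # Only switch on batch correction when there is more than one batch.
    if n_batches is not None and n_batches > 1:
        kwargs["batch_correction"] = True
        kwargs["batches"] = n_batches
    return kwargs


# Example values are placeholders, not recommended settings.
print(vae_kwargs(n_genes=2000, latent_dim=20, zr_dim=2, h_dim=2, n_batches=3))
```

With LatentVelo installed, the result would be passed on as `ltv.models.VAE(**vae_kwargs(...))`. After the cleaning recipe, `n_genes` would typically be `adata.n_vars` and `n_batches` the number of distinct values in `adata.obs['batch']`, assuming the batch key is `'batch'`.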

Training the model

The model is trained with the following function, which returns the validation-set autoencoder and trajectory reconstruction losses:

epochs, val_ae, val_traj = ltv.train(model, adata, batch_size = batch_size,
                                     epochs=number_of_epochs,
                                     name=parameter_output_folder_name)
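
One way to use the returned losses is to find the epoch at which validation performance was best. The sketch below is illustrative only: `best_epoch` is a hypothetical helper, it assumes that `epochs`, `val_ae` and `val_traj` are equal-length sequences of numbers (the README does not state their types), and ranking epochs by the sum of the two losses is an assumption, not the authors' criterion.

```python
def best_epoch(epochs, val_ae, val_traj):
    """Return (epoch, combined_loss) with the lowest summed validation loss.

    Assumes the three inputs are equal-length sequences of numbers, which
    is an assumption about ltv.train's return values, and that summing
    the two losses is a reasonable way to rank epochs.
    """
    if not (len(epochs) == len(val_ae) == len(val_traj)):
        raise ValueError("inputs must have the same length")
    combined = [float(a) + float(t) for a, t in zip(val_ae, val_traj)]
    # Index of the smallest combined loss.
    idx = min(range(len(combined)), key=combined.__getitem__)
    return epochs[idx], combined[idx]


# Made-up numbers for illustration only.
print(best_epoch([0, 10, 20], [1.0, 0.5, 0.75], [2.0, 1.0, 0.5]))
```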

Outputting results

The following function is used to output the results of LatentVelo to a new AnnData object containing the results on the LatentVelo latent space. If desired, gene velocities can also be included. Model reconstructions using the decoder for both the autoencoder and trajectories can also be included.

latent_adata, adata = ltv.output_results(model, adata,
                                         gene_velocity = True,
                                         decoded = True,
                                         embedding='umap')

scVelo can then be used to plot 2D velocity streamlines:

import scvelo as scv

scv.tl.velocity_graph(latent_adata, vkey='spliced_velocity')
scv.pl.velocity_embedding_stream(latent_adata, vkey='spliced_velocity',
                                 color='latent_time')

To output cell trajectories:

z_traj, times = ltv.cell_trajectories(model, adata)

These can then be plotted on the latent space UMAP plot.

Package versions

LatentVelo was run with the following package versions:

torchdiffeq 0.2.2
pytorch 1.11.0
seaborn 0.11.2
scvi-tools 0.15.0
scvelo 0.2.4
scipy 1.8.1
sklearn 1.1.1
scanpy 1.9.1
scgen 2.1.0
pandas 1.4.2
numpy 1.22.4
anndata 0.8.0
unitvelo 0.1.5
scib 1.0.3
matplotlib 3.5.2
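
To compare an existing environment against this list, the installed versions can be read with the standard library. This is a minimal sketch: the helper names are hypothetical, and using the PyPI distribution names `torch` and `scikit-learn` for the entries listed as pytorch and sklearn is an assumption about how the packages were installed.

```python
from importlib import metadata

# Versions from the list above, keyed by PyPI distribution name.
# "torch" and "scikit-learn" are assumed names for pytorch and sklearn.
EXPECTED = {
    "torchdiffeq": "0.2.2",
    "torch": "1.11.0",
    "seaborn": "0.11.2",
    "scvi-tools": "0.15.0",
    "scvelo": "0.2.4",
    "scipy": "1.8.1",
    "scikit-learn": "1.1.1",
    "scanpy": "1.9.1",
    "scgen": "2.1.0",
    "pandas": "1.4.2",
    "numpy": "1.22.4",
    "anndata": "0.8.0",
    "unitvelo": "0.1.5",
    "scib": "1.0.3",
    "matplotlib": "3.5.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    report = {}
    for name, wanted in expected.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu113" when comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            report[name] = (wanted, found)
    return report


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

A mismatch does not necessarily mean LatentVelo will fail; the list records the versions the authors ran, not tested compatibility bounds.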

Citation

If you find this useful, please cite:

@article {Farrell2022.08.22.504858,
	author = {Farrell, Spencer and Mani, Madhav and Goyal, Sidhartha},
	title = {Inferring single-cell transcriptomic dynamics with structured latent gene expression dynamics},
	elocation-id = {2022.08.22.504858},
	year = {2022},
	doi = {10.1101/2022.08.22.504858},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2022/12/01/2022.08.22.504858},
	eprint = {https://www.biorxiv.org/content/early/2022/12/01/2022.08.22.504858.full.pdf},
	journal = {bioRxiv}
}
