Vb/readmendocs (#101)
updated README and tutorials
vineetbansal authored Jan 10, 2025
1 parent b6a6020 commit b1f3f84
Showing 12 changed files with 243 additions and 412 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -6,7 +6,7 @@ repos:
rev: v3.2.0
hooks:
- id: trailing-whitespace
exclude: 'tests/test_data/.*'
exclude: 'tests/test_data/.*|README.md'

- id: end-of-file-fixer
exclude: 'tests/test_data/.*'
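
To see what the widened `exclude` pattern covers, the sketch below applies it with Python's `re.search`, which is how pre-commit evaluates `files` and `exclude` patterns against repository-relative paths. The sample paths and the helper name `is_excluded` are hypothetical and only serve as illustration.

```python
import re

# Pattern copied from the updated trailing-whitespace hook configuration.
EXCLUDE = re.compile(r"tests/test_data/.*|README.md")


def is_excluded(path: str) -> bool:
    """Return True if the hook would skip this path (search semantics, not a full match)."""
    return EXCLUDE.search(path) is not None


# Hypothetical repository-relative paths, for illustration only.
for path in ["README.md", "docs/README.md", "tests/test_data/a.csv", "src/paste3/paste.py"]:
    print(path, is_excluded(path))
```

Because the pattern is unanchored and the `.` in `README.md` is unescaped, it also matches nested files such as `docs/README.md`. A stricter alternative, if only the top-level file is meant, would be `^README\.md$`.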
198 changes: 57 additions & 141 deletions README.md
@@ -2,170 +2,86 @@
[![Coverage Status](https://coveralls.io/repos/github/raphael-group/paste3/badge.svg?branch=main)](https://coveralls.io/github/raphael-group/paste3?branch=main)
[![Docs](https://github.com/raphael-group/paste3/actions/workflows/docs.yml/badge.svg)](https://raphael-group.github.io/paste3/)

(Note: This repository integrates Paste and Paste 2, and is a work in progress)
# PASTE
https://github.com/user-attachments/assets/977c05c0-4c45-4d21-9302-dfe23800937e

![PASTE Overview](https://github.com/raphael-group/paste/blob/main/docs/source/_static/images/paste_overview.png)

PASTE is a computational method that leverages both gene expression similarity and spatial distances between spots to align and integrate spatial transcriptomics data. In particular, there are two methods:
1. `pairwise_align`: align spots across pairwise slices.
2. `center_align`: integrate multiple slices into one center slice.

You can read the full paper [here](https://www.nature.com/articles/s41592-022-01459-6).

Auto-generated documentation for this package is available [here](https://raphael-group.github.io/paste3/).

Additional examples and the code to reproduce the paper's analyses can be found [here](https://github.com/raphael-group/paste_reproducibility). Preprocessed datasets used in the paper can be found on [zenodo](https://doi.org/10.5281/zenodo.6334774).

### Recent News

* PASTE is now published in [Nature Methods](https://www.nature.com/articles/s41592-022-01459-6)!

* The code to reproduce the analysis can be found [here](https://github.com/raphael-group/paste_reproducibility).

* As of version 1.2.0, PASTE supports running on a GPU via PyTorch. For more details, see the GPU section of the [Tutorial notebook](docs/source/notebooks/getting-started.ipynb).

### Installation

The easiest way is to install PASTE from PyPI: https://pypi.org/project/paste-bio/.

`pip install paste-bio`

Or you can install PASTE on bioconda: https://anaconda.org/bioconda/paste-bio.
# Paste 3

`conda install -c bioconda paste-bio`
**Paste 3** (Paste + Paste 2) is a Python package and NAPARI plugin that
provides advanced alignment methods for Spatial Transcriptomics (ST) data,
as detailed in the following publications:

Check out Tutorial.ipynb for an example of how to use PASTE.
### 1. *PASTE*
**Zeira, R., Land, M., Strzalkowski, A., et al.**
*Alignment and integration of spatial transcriptomics data.*
**Nat Methods**, 19, 567–575 (2022).

Alternatively, you can clone the repository and try the following example in a
notebook or the command line.
[Read the publication](https://doi.org/10.1038/s41592-022-01459-6)
[Original PASTE code](https://github.com/raphael-group/paste)

### Quick Start
---

To use PASTE we require at least two slices of spatial-omics data (both
expression and coordinates) that are in
anndata format (i.e. read in by scanpy/squidpy). We have included a breast
cancer dataset from [1] in the [sample_data folder](tests/data/input/) of this repo
that we will use as an example below to show how to use PASTE.
### 2. *PASTE2*
**Liu X, Zeira R, Raphael BJ.**
*Partial alignment of multislice spatially resolved transcriptomics data.*
**Genome Res.** 2023 Jul; 33(7):1124-1132.
[Read the publication](https://doi.org/10.1101/gr.277670.123)
[Original PASTE2 code](https://github.com/raphael-group/paste2)

```python
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
import scanpy as sc
import paste as pst
The motivation behind PASTE3 is to provide a NAPARI plugin
for practitioners to experiment with both PASTE and PASTE2 at an operational
level, as well as provide a common codebase for future development of ST
alignment algorithms. (`Paste-N`..)

# Load Slices
data_dir = 'tests/data/input/' # change this path to the data you wish to analyze
PASTE3 is built on `pytorch` and can leverage a GPU for performance if
available, though it is able to run just fine in the absence of a GPU, on all
major platforms.

Auto-generated documentation for the PASTE3 package is available [here](https://raphael-group.github.io/paste3/).

# Assume that the coordinates of slices are named slice_name + "_coor.csv"
def load_slices(data_dir, slice_names=["slice1", "slice2"]):
    slices = []
    for slice_name in slice_names:
        slice_i = sc.read_csv(data_dir + slice_name + ".csv")
        slice_i_coor = np.genfromtxt(data_dir + slice_name + "_coor.csv", delimiter=',')
        slice_i.obsm['spatial'] = slice_i_coor
        # Preprocess slices
        sc.pp.filter_genes(slice_i, min_counts=15)
        sc.pp.filter_cells(slice_i, min_counts=100)
        slices.append(slice_i)
    return slices
Additional examples and the code to reproduce the original PASTE paper's analyses are available [here](https://github.com/raphael-group/paste_reproducibility). Preprocessed datasets used in the paper can be found on [zenodo](https://doi.org/10.5281/zenodo.6334774).

## Overview

slices = load_slices(data_dir)
slice1, slice2 = slices

# Pairwise align the slices
pi12 = pst.pairwise_align(slice1, slice2)

# To visualize the alignment you can stack the slices
# according to the alignment pi
slices, pis = [slice1, slice2], [pi12]
new_slices = pst.stack_slices_pairwise(slices, pis)

slice_colors = ['#e41a1c', '#377eb8']
plt.figure(figsize=(7, 7))
for i in range(len(new_slices)):
    pst.plot_slice(new_slices[i], slice_colors[i], s=400)
plt.legend(handles=[mpatches.Patch(color=slice_colors[0], label='1'), mpatches.Patch(color=slice_colors[1], label='2')])
plt.gca().invert_yaxis()
plt.axis('off')
plt.show()

# Center align slices
## We have to reload the slices as pairwise_alignment modifies the slices.
slices = load_slices(data_dir)
slice1, slice2 = slices

# Construct a center slice
## choose one of the slices as the coordinate reference for the center slice,
## i.e. the center slice will have the same number of spots as this slice and
## the same coordinates.
initial_slice = slice1.copy()
slices = [slice1, slice2]
lmbda = len(slices) * [1 / len(slices)] # set hyperparameter to be uniform

## Possible to pass in an initial pi (as keyword argument pis_init)
## to improve performance, see Tutorial.ipynb notebook for more details.
center_slice, pis = pst.center_align(initial_slice, slices, lmbda)

## The low dimensional representation of our center slice is held
## in the matrices W and H, which can be used for downstream analyses
W = center_slice.uns['paste_W']
H = center_slice.uns['paste_H']
```

### GPU implementation
PASTE is now compatible with GPUs via PyTorch. All we need to do is add the following two parameters to our main functions:
```
pi12 = pst.pairwise_align(slice1, slice2, backend = ot.backend.TorchBackend(), use_gpu = True)
center_slice, pis = pst.center_align(initial_slice, slices, lmbda, backend = ot.backend.TorchBackend(), use_gpu = True)
```
For more details, see the GPU section of the [Tutorial notebook](docs/source/notebooks/getting-started.ipynb).

### Command Line

We provide the option of running PASTE from the command line.
![PASTE Overview](https://github.com/raphael-group/paste/blob/main/docs/source/_static/images/paste_overview.png)

First, clone the repository:
The PASTE series of algorithms provide computational methods that leverage both
gene expression similarity and spatial distances between spots to align and
integrate spatial transcriptomics data. In particular, there are two modes of
operation:
1. `Pairwise-Alignment`: align spots between successive pairs of slices.
2. `Center-Alignment`: infer a `center slice` (low sparsity, low variance) and
align all slices with respect to this center slice.
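
As a rough illustration of how expression similarity and spatial distances are combined, the sketch below evaluates the pairwise objective described in the PASTE publication for a given spot-to-spot mapping `pi`: a weighted sum of an expression dissimilarity term and a spatial distortion term, balanced by `alpha`. The function name, argument names, and toy inputs are illustrative assumptions and are not part of the `paste3` API; the package itself optimizes this kind of objective rather than merely evaluating it.

```python
from itertools import product


def pairwise_objective(pi, expr_cost, dist_a, dist_b, alpha=0.1):
    """Evaluate (1 - alpha) * expression term + alpha * spatial term for a mapping pi.

    pi[i][j]        : mass moved from spot i in slice A to spot j in slice B
    expr_cost[i][j] : expression dissimilarity between spot i (A) and spot j (B)
    dist_a, dist_b  : within-slice spatial distance matrices for A and B
    """
    n, m = len(pi), len(pi[0])
    # Linear term: expression dissimilarity weighted by the mapping.
    expr_term = sum(expr_cost[i][j] * pi[i][j] for i, j in product(range(n), range(m)))
    # Quadratic term: penalize pairs of matches that distort spatial distances.
    spatial_term = sum(
        (dist_a[i][k] - dist_b[j][l]) ** 2 * pi[i][j] * pi[k][l]
        for i, j, k, l in product(range(n), range(m), range(n), range(m))
    )
    return (1 - alpha) * expr_term + alpha * spatial_term


# Toy example (hypothetical data): two spots per slice, identical geometry.
expr_cost = [[0.0, 1.0], [1.0, 0.0]]
dist = [[0.0, 1.0], [1.0, 0.0]]
matched = [[0.5, 0.0], [0.0, 0.5]]   # spot i mapped to spot i
swapped = [[0.0, 0.5], [0.5, 0.0]]   # spot i mapped to the other spot

print(pairwise_objective(matched, expr_cost, dist, dist))
print(pairwise_objective(swapped, expr_cost, dist, dist))
```

In this toy case both mappings preserve spatial distances, so only the expression term separates them, and the matched mapping scores lower. The brute-force quadruple loop is for readability only and would not scale to real slices.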

`git clone https://github.com/raphael-group/paste.git`

Next, when providing files, you will need to provide two separate files: the gene expression data followed by spatial data (both as .csv) for the code to initialize one slice object.
### Installation

Sample execution (based on this repo): `python paste-cmd-line.py -m center -f ./sample_data/slice1.csv ./sample_data/slice1_coor.csv ./sample_data/slice2.csv ./sample_data/slice2_coor.csv ./sample_data/slice3.csv ./sample_data/slice3_coor.csv`
The easiest way to install PASTE3 is using `pip`:

Note: `pairwise` will return pairwise alignment between each consecutive pair of slices (e.g. \[slice1,slice2\], \[slice2,slice3\]).
`pip install git+https://github.com/raphael-group/paste3.git`

| Flag | Name | Description | Default Value |
| --- | --- | --- | --- |
| -m | mode | Select either `pairwise` or `center` | (str) `pairwise` |
| -f | files | Path to data files (.csv) | None |
| -d | direc | Directory to store output files | Current Directory |
| -a | alpha | Alpha parameter for PASTE | (float) `0.1` |
| -c | cost | Expression dissimilarity cost (`kl` or `Euclidean`) | (str) `kl` |
| -p | n_components | n_components for NMF step in `center_align` | (int) `15` |
| -l | lmbda | Lambda parameter in `center_align` | (floats) probability vector of length `n` |
| -i | intial_slice | Specify which file is also the initial slice in `center_align` | (int) `1` |
| -t | threshold | Convergence threshold for `center_align` | (float) `0.001` |
| -x | coordinates | Output new coordinates (toggle to turn on) | `False` |
| -w | weights | Weights files of spots in each slice (.csv) | None |
| -s | start | Initial alignments for OT. If not given uses uniform (.csv structure similar to alignment output) | None |
Developers who wish to work with `paste3` in Python will likely want to review
the detailed [installation](https://raphael-group.github.io/paste3/installation)
page.

`pairwise_align` outputs a (.csv) file containing the mapping of spots between each consecutive pair of slices. The rows correspond to spots of the first slice, and the columns to spots of the second.

`center_align` outputs two files containing the low dimensional representation (NMF decomposition) of the center slice gene expression, and files containing a mapping of spots from the center slice (rows) to each input slice (cols).
### Getting Started

### Sample Dataset
If you intend to use PASTE3 as a `napari` plugin, install `paste3` in a python
environment that has `napari` installed, or install `napari` after having
installed `paste3` as above.

Added a sample spatial transcriptomics dataset consisting of four breast cancer slices, courtesy of:
`pip install napari`

[1] Ståhl, Patrik & Salmén, Fredrik & Vickovic, Sanja & Lundmark, Anna & Fernandez Navarro, Jose & Magnusson, Jens & Giacomello, Stefania & Asp, Michaela & Westholm, Jakub & Huss, Mikael & Mollbrink, Annelie & Linnarsson, Sten & Codeluppi, Simone & Borg, Åke & Pontén, Fredrik & Costea, Paul & Sahlén, Pelin Akan & Mulder, Jan & Bergmann, Olaf & Frisén, Jonas. (2016). Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. 353. 78-82. 10.1126/science.aaf2403.
Open one of the sample datasets we provide (`File->Open Sample->Paste3->SCC Patient..`)
and then select one of the two modes of PASTE3 operations
(`Plugins->Paste3->Center Align` or `Plugins->Paste3->Pairwise Align`).

Note: Original data is (.tsv), but we converted it to (.csv).
Your own datasets can be used if they're in the .h5ad format, with each file denoting a single
slice. With the default parameters, alignment should take a couple of minutes, though
you have the option of changing these to suit your needs.
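
For readers preparing their own data, a minimal sketch of gathering one-slice-per-file `.h5ad` inputs might look like the following. The helper names and the choice to order slices by file name are assumptions made for illustration, not something the plugin prescribes; `anndata.read_h5ad` is the standard reader for this format and is imported lazily so that file discovery works without it.

```python
from pathlib import Path


def find_slice_files(data_dir):
    """Return the .h5ad files in data_dir, sorted by name so slice order is reproducible."""
    return sorted(Path(data_dir).glob("*.h5ad"))


def load_slices(data_dir):
    """Load each .h5ad file as one slice (requires the third-party anndata package)."""
    import anndata  # imported lazily; only needed when actually loading data

    return [anndata.read_h5ad(path) for path in find_slice_files(data_dir)]
```

If the slices have a natural order (for example, consecutive tissue sections), naming the files so that they sort in that order keeps pairwise alignment between neighbouring slices meaningful.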

### References
![paste3_napari](https://github.com/user-attachments/assets/41281c31-fe11-443e-ab13-1dec4e01b3b6)

Ron Zeira, Max Land, Alexander Strzalkowski and Benjamin J. Raphael. "Alignment and integration of spatial transcriptomics data". Nature Methods (2022). https://doi.org/10.1038/s41592-022-01459-6
If you intend to use PASTE3 programmatically in your Python code, follow along with
the [Getting Started](https://raphael-group.github.io/paste3/notebooks/paste_tutorial.html)
tutorial.
28 changes: 0 additions & 28 deletions docs/paste3/installation.md

This file was deleted.

38 changes: 17 additions & 21 deletions docs/source/api.rst
@@ -1,16 +1,20 @@
API
===
import paste3

.. automodule:: paste3

PASTE Alignment
Alignment
~~~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: api

paste.pairwise_align
paste.center_align
paste.center_ot
paste.center_NMF
paste.my_fused_gromov_wasserstein
paste.line_search_partial

Visualization
~~~~~~~~~~~~~
@@ -21,31 +25,17 @@ Visualization
visualization.stack_slices_pairwise
visualization.stack_slices_center
visualization.plot_slice
visualization.generalized_procrustes_analysis

Model Selection
~~~~~~~~~~~~~~~~

.. autosummary::
:toctree: api

model_selection.create_graph
model_selection.generate_graph_from_labels
model_selection.edge_inconsistency_score
model_selection.calculate_convex_hull_edge_inconsistency
model_selection.plot_edge_curve
model_selection.select_overlap_fraction_plotting

GLMPCA
~~~~~~~

.. autosummary::
:toctree: api

glmpca.ortho
glmpca.mat_binom_dev
glmpca.glmpca_init
glmpca.est_nb_theta
glmpca.glmpca
model_selection.generate_graph
model_selection.convex_hull_edge_inconsistency
model_selection.select_overlap_fraction


Miscellaneous
@@ -54,5 +44,11 @@ Miscellaneous
.. autosummary::
:toctree: api

helper.filter_for_common_genes
helper.kl_divergence
helper.glmpca_distance
helper.pca_distance
helper.high_umi_gene_distance
helper.norm_and_center_coordinates
helper.get_common_genes
helper.match_spots_using_spatial_heuristic
helper.dissimilarity_metric
7 changes: 4 additions & 3 deletions docs/source/conf.py
@@ -2,7 +2,8 @@
from pathlib import Path

HERE = Path(__file__).parent
sys.path.insert(0, Path.resolve(HERE.parent.parent))
sys.path.insert(0, str(HERE.parent.parent / "src"))
import paste3 # noqa: E402

# Configuration file for the Sphinx documentation builder.
#
@@ -13,11 +14,11 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = "paste3"
copyright = "2022, Raphael Lab"
copyright = "2024, Raphael Lab"
author = "Ron Zeira, Max Land, Alexander Strzalkowski, Benjamin J. Raphael"

# The full version, including alpha/beta/rc tags
release = "1.2.0"
release = paste3.__version__

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
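
The two `conf.py` changes above work together: the `src` directory is added to `sys.path` as a string (Python's import system expects string entries and ignores other types), and the docs version is then read from the imported package instead of being hard-coded. The sketch below reproduces that pattern with a throwaway package; `demo_pkg` and the temporary directory are hypothetical stand-ins for `paste3` and the repository's `src` folder.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway "src layout" project: <root>/src/demo_pkg/__init__.py
root = Path(tempfile.mkdtemp())
pkg_dir = root / "src" / "demo_pkg"
pkg_dir.mkdir(parents=True)
(pkg_dir / "__init__.py").write_text('__version__ = "0.0.1"\n')

# Same pattern as conf.py: add the src directory to sys.path as a *string*.
sys.path.insert(0, str(root / "src"))
importlib.invalidate_caches()

# Import the package and reuse its version as the documentation release.
demo_pkg = importlib.import_module("demo_pkg")
release = demo_pkg.__version__
print(release)
```

Reading the version from the package keeps the rendered docs in step with whatever version the code declares, at the cost of requiring the package to be importable at docs build time.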