Update README.rst (#287)
* Update README.rst

* Update README.rst (#288)

* Update README.rst

* Update description

---------

Co-authored-by: Yerdos Ordabayev <[email protected]>
mbabadi and ordabayevy authored Jan 17, 2025
1 parent a854767 commit 090161e
Showing 1 changed file with 99 additions and 43 deletions.
142 changes: 99 additions & 43 deletions README.rst
@@ -1,67 +1,123 @@
*Cellarium ML: distributed single-cell data analysis.*
.. image:: https://cellarium.ai/wp-content/uploads/2024/07/cellarium-logo-medium.png
   :alt: Cellarium Logo
   :width: 180
   :align: center

---------
**Cellarium ML: a machine learning framework for single-cell biology**
======================================================================

Cellarium ML is a PyTorch Lightning-based library for distributed single-cell data analysis.
It provides a set of tools for training deep learning models on large-scale single-cell datasets,
including distributed data loading, model training, and evaluation. Cellarium ML is designed to be
modular and extensible, allowing users to easily define custom models, data transformations,
It provides tools for training deep learning models on large-scale single-cell datasets,
including distributed data loading, model training, and evaluation. Designed to be modular
and extensible, Cellarium ML allows users to easily define custom models, data transformations,
and training pipelines.

Code organization
-----------------
-------------------------------------------------------------------------------

**Code Organization**
----------------------

The code is organized as follows:

- ``cellarium/ml/callbacks``: Contains custom PyTorch Lightning callbacks.
- ``cellarium/ml/core``: Includes essential Cellarium ML components:

  - ``CellariumModule``: A PyTorch Lightning Module tasked with defining and configuring the model, training step, and optimizer.
  - ``CellariumAnnDataDataModule``: A PyTorch Lightning DataModule designed for setting up a multi-GPU DataLoader for a collection of AnnData objects.
  - ``CellariumPipeline``: A Module List that pipes the input data through a series of transforms and a model.

- ``cellarium/ml/data``: Contains Distributed AnnData Collection and multi-GPU Iterable Dataset implementations.
- ``cellarium/ml/lr_schedulers``: Contains custom learning rate schedulers.
- ``cellarium/ml/models``: Features Cellarium ML models:

  - Models must subclass ``CellariumModel`` and implement the ``.reset_parameters`` method.
  - The ``.forward`` method should return a dictionary containing the computed loss under the ``loss`` key.
  - Optionally, hooks such as ``.on_train_start``, ``.on_train_epoch_end``, and ``.on_train_batch_end`` can be implemented to be triggered by the ``CellariumModule`` during training phases.

- ``cellarium/ml/preprocessing``: Provides pre-processing functions.
- ``cellarium/ml/transforms``: Contains data transformation modules:

  - Each transform is a subclass of ``torch.nn.Module``.
  - The ``.forward`` method should output a dictionary where the keys correspond to the input arguments of subsequent transforms and the model.

- ``cellarium/ml/utilities``: Contains utility functions for various submodules.
- ``cellarium/ml/cli.py``: Implements the ``cellarium-ml`` CLI. Models must be registered here to be accessible via the CLI.

.. code-block:: text

   cellarium/
   └── ml/
       ├── "callbacks"                        # Custom PyTorch Lightning callbacks
       ├── "core"                             # Essential components
       │   ├── "CellariumModule"              # PyTorch Lightning Module for model, training step, and optimizer
       │   ├── "CellariumAnnDataDataModule"   # DataModule for multi-GPU DataLoader for AnnData objects
       │   └── "CellariumPipeline"            # Pipeline for data transformations and model inference
       ├── "data"                             # Distributed AnnData Collection and multi-GPU Iterable Datasets
       ├── "lr_schedulers"                    # Custom learning rate schedulers
       ├── "models"                           # Cellarium ML models
       ├── "preprocessing"                    # Pre-processing functions
       ├── "transforms"                       # Data transformation modules
       ├── "utilities"                        # Utility functions for various submodules
       └── "cli.py"                           # Implements the "cellarium-ml" CLI. Models must be registered here

Important Notes
~~~~~~~~~~~~~~~

``cellarium/ml/models/*``
~~~~~~~~~~~~~~~~~~~~~~~~~

- Models must subclass ``CellariumModel`` and implement the following:

  - ``reset_parameters``: Initializes model parameters.
  - ``forward``: Returns a dictionary containing the computed loss under the ``loss`` key.

Optional hooks for training include:

- ``on_train_start``: Called at the start of training.
- ``on_train_epoch_end``: Triggered at the end of each epoch.
- ``on_train_batch_end``: Triggered at the end of each batch.
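
The model contract described above can be sketched in plain Python. This is only an illustration under stated assumptions: ``SketchModel`` and ``MeanModel`` are hypothetical stand-ins, not part of Cellarium ML, and the real ``CellariumModel`` builds on PyTorch. The hook signature is simplified to take no arguments, and calling ``reset_parameters`` from the constructor is a choice made for this sketch.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class SketchModel(ABC):
    """Hypothetical stand-in for ``CellariumModel`` (no PyTorch dependency)."""

    def __init__(self) -> None:
        # Assumption of this sketch: parameters are initialized on construction.
        self.reset_parameters()

    @abstractmethod
    def reset_parameters(self) -> None:
        """Initialize (or re-initialize) the model parameters."""

    @abstractmethod
    def forward(self, x_ng: list[list[float]]) -> dict[str, float]:
        """Return a dictionary holding the computed loss under the ``loss`` key."""

    def on_train_batch_end(self) -> None:
        """Optional hook; does nothing unless a subclass overrides it."""


class MeanModel(SketchModel):
    """Toy model with one scalar parameter and a mean squared error loss."""

    def reset_parameters(self) -> None:
        self.mean = 0.0
        self.batches_seen = 0

    def forward(self, x_ng: list[list[float]]) -> dict[str, float]:
        values = [value for row in x_ng for value in row]
        loss = sum((value - self.mean) ** 2 for value in values) / len(values)
        return {"loss": loss}

    def on_train_batch_end(self) -> None:
        # Count how many batches the training loop has finished.
        self.batches_seen += 1


model = MeanModel()
output = model.forward([[1.0, 3.0], [1.0, 3.0]])
model.on_train_batch_end()
```

A training loop only needs to read ``output["loss"]`` and call the hooks at the matching points, which is why the dictionary return value and the hook names matter more than the model internals.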

``cellarium/ml/transforms/*``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- All transforms must subclass ``torch.nn.Module``.
- The ``forward`` method must output a dictionary where keys correspond to the input arguments for subsequent transforms or the model.
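
The key-matching rule above can be illustrated with a small, dependency-free sketch. The classes ``NormalizeTotal``, ``Log1p``, and ``ToyModel`` and the function ``run_pipeline`` are hypothetical names written for this example; they are plain Python objects rather than ``torch.nn.Module`` subclasses, and the dispatch shown here is an assumption about how such a pipeline could work, not the ``CellariumPipeline`` implementation.

```python
from __future__ import annotations

import inspect
import math
from typing import Any


class NormalizeTotal:
    """Rescale counts so that they sum to ``target_count``."""

    def __init__(self, target_count: float) -> None:
        self.target_count = target_count

    def forward(self, x_ng: list[float]) -> dict[str, Any]:
        total = sum(x_ng)
        return {"x_ng": [value * self.target_count / total for value in x_ng]}


class Log1p:
    """Apply log(1 + x) element-wise."""

    def forward(self, x_ng: list[float]) -> dict[str, Any]:
        return {"x_ng": [math.log1p(value) for value in x_ng]}


class ToyModel:
    """Final stage: consumes ``x_ng`` and reports a loss."""

    def forward(self, x_ng: list[float]) -> dict[str, Any]:
        return {"loss": sum(value * value for value in x_ng)}


def run_pipeline(stages: list[Any], batch: dict[str, Any]) -> dict[str, Any]:
    """Feed each stage the batch entries named by its ``forward`` arguments."""
    batch = dict(batch)  # do not mutate the caller's dictionary
    for stage in stages:
        wanted = inspect.signature(stage.forward).parameters
        kwargs = {name: batch[name] for name in wanted}
        # Output keys overwrite or extend the batch for the following stages.
        batch.update(stage.forward(**kwargs))
    return batch


result = run_pipeline(
    [NormalizeTotal(target_count=4.0), Log1p(), ToyModel()],
    {"x_ng": [1.0, 3.0], "obs_names_n": ["cell_0"]},
)
```

Because each stage returns its result under the key ``x_ng``, the next stage picks it up by argument name, while entries that no stage asks for (here ``obs_names_n``) pass through untouched.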

``cellarium/ml/cli.py``
~~~~~~~~~~~~~~~~~~~~~~~
- Models must be registered here to be accessible via the ``cellarium-ml`` command-line interface (CLI).
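
Why a model has to be registered before a CLI can reach it can be shown with a generic registry pattern. Everything in this sketch is hypothetical: ``MODEL_REGISTRY``, ``register_model``, ``run_toy_model``, and ``main`` are names invented for illustration and do not describe how ``cellarium/ml/cli.py`` is actually written.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical registry: maps a CLI model name to the function that runs it.
MODEL_REGISTRY: dict[str, Callable[[list[str]], str]] = {}


def register_model(name: str) -> Callable:
    """Decorator that makes a model entry point reachable by name."""

    def decorator(entry_point: Callable[[list[str]], str]) -> Callable[[list[str]], str]:
        MODEL_REGISTRY[name] = entry_point
        return entry_point

    return decorator


@register_model("toy_model")
def run_toy_model(args: list[str]) -> str:
    return f"toy_model received {len(args)} argument(s)"


def main(argv: list[str]) -> str:
    """Dispatch ``argv[0]`` to a registered model; fail clearly otherwise."""
    if not argv:
        raise SystemExit("usage: <model_name> [args...]")
    name, *rest = argv
    if name not in MODEL_REGISTRY:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise SystemExit(f"unknown model {name!r}; registered models: {known}")
    return MODEL_REGISTRY[name](rest)


message = main(["toy_model", "fit", "--config", "config.yaml"])
```

In this pattern an unregistered model is simply invisible to the dispatcher, which mirrors the note above that models must be registered to be accessible from the command line.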



-------------------------------------------------------------------------------

**Installation**
-----------------

To install via pip:

.. code-block:: bash

   pip install cellarium-ml

To install the developer version from source:

.. code-block:: bash

   git clone https://github.com/cellarium-ai/cellarium-ml.git
   cd cellarium-ml
   make install  # runs pip install -e .[dev]

Installation
------------
**API Documentation and Tutorials**
-----------------------------------

To install from the pip::
For detailed API documentation and tutorials, visit:
`Cellarium ML Documentation <https://cellarium-ai.github.io/cellarium-ml/>`_

$ pip install cellarium-ml
-------------------------------------------------------------------------------

To install the developer version from the source::
**For Developers**
-------------------

$ git clone https://github.com/cellarium-ai/cellarium-ml.git
$ cd cellarium-ml
$ make install # runs pip install -e .[dev]
To run the tests:

For developers
--------------
.. code-block:: bash
To run the tests::
make test-examples # runs single-device CLI example tests
make test-dataloader # runs single-device dataloader-related tests
TEST_DEVICES=2 make test-dataloader # runs multi-device dataloader-related tests
make test # runs single-device (all other) tests
TEST_DEVICES=2 make test # runs multi-device (all other) tests
$ make test # runs single-device tests
$ TEST_DEVICES=2 make test # runs multi-device tests
To format the code automatically:

To automatically format the code::
.. code-block:: bash
$ make format # runs ruff formatter and fixes linter errors
make format # runs ruff formatter and fixes linter errors
To run the linters::
To run the linters:

$ make lint # runs ruff linter and checks for formatter errors
.. code-block:: bash
To build the documentation::
make lint # runs ruff linter and checks for formatter errors
$ make docs # builds the documentation at docs/build/html
To build the documentation:

.. code-block:: bash

   make docs  # builds the documentation at docs/build/html
