* Update README.rst * Update README.rst (#288) * Update README.rst * Update description --------- Co-authored-by: Yerdos Ordabayev <[email protected]>
1 parent a854767, commit 090161e
Showing 1 changed file with 99 additions and 43 deletions.
@@ -1,67 +1,123 @@
-*Cellarium ML: distributed single-cell data analysis.*
+.. image:: https://cellarium.ai/wp-content/uploads/2024/07/cellarium-logo-medium.png
+    :alt: Cellarium Logo
+    :width: 180
+    :align: center

----------
+**Cellarium ML: a machine learning framework for single-cell biology**
+======================================================================

 Cellarium ML is a PyTorch Lightning-based library for distributed single-cell data analysis.
-It provides a set of tools for training deep learning models on large-scale single-cell datasets,
-including distributed data loading, model training, and evaluation. Cellarium ML is designed to be
-modular and extensible, allowing users to easily define custom models, data transformations,
+It provides tools for training deep learning models on large-scale single-cell datasets,
+including distributed data loading, model training, and evaluation. Designed to be modular
+and extensible, Cellarium ML allows users to easily define custom models, data transformations,
 and training pipelines.

-Code organization
------------------
+-------------------------------------------------------------------------------
+
+**Code Organization**
+----------------------

 The code is organized as follows:

-- ``cellarium/ml/callbacks``: Contains custom PyTorch Lightning callbacks.
-- ``cellarium/ml/core``: Includes essential Cellarium ML components:
-    - ``CellariumModule``: A PyTorch Lightning Module tasked with defining and configuring the model, training step, and optimizer.
-    - ``CellariumAnnDataDataModule``: A PyTorch Lightning DataModule designed for setting up a multi-GPU DataLoader for a collection of AnnData objects.
-    - ``CellariumPipeline``: A Module List that pipes the input data through a series of transforms and a model.
-- ``cellarium/ml/data``: Contains Distributed AnnData Collection and multi-GPU Iterable Dataset implementations.
-- ``cellarium/ml/lr_schedulers``: Contains custom learning rate schedulers.
-- ``cellarium/ml/models``: Features Cellarium ML models:
-    - Models must subclass ``CellariumModel`` and implement the ``.reset_parameters`` method.
-    - The ``.forward`` method should return a dictionary containing the computed loss under the ``loss`` key.
-    - Optionally, hooks such as ``.on_train_start``, ``.on_train_epoch_end``, and ``.on_train_batch_end`` can be implemented to be triggered by the ``CellariumModule`` during training phases.
-- ``cellarium/ml/preprocessing``: Provides pre-processing functions.
-- ``cellarium/ml/transforms``: Contains data transformation modules:
-    - Each transform is a subclass of ``torch.nn.Module``.
-    - The ``.forward`` method should output a dictionary where the keys correspond to the input arguments of subsequent transforms and the model.
-- ``cellarium/ml/utilities``: Contains utility functions for various submodules.
-- ``cellarium/ml/cli.py``: Implements the ``cellarium-ml`` CLI. Models must be registered here to be accessible via the CLI.
+.. code-block:: text
+
+    cellarium/
+    └── ml/
+        ├── "callbacks" # Custom PyTorch Lightning callbacks
+        ├── "core" # Essential components
+        │   ├── "CellariumModule" # PyTorch Lightning Module for model, training step, and optimizer
+        │   ├── "CellariumAnnDataDataModule" # DataModule for multi-GPU DataLoader for AnnData objects
+        │   └── "CellariumPipeline" # Pipeline for data transformations and model inference
+        ├── "data" # Distributed AnnData Collection and multi-GPU Iterable Datasets
+        ├── "lr_schedulers" # Custom learning rate schedulers
+        ├── "models" # Cellarium ML models
+        ├── "preprocessing" # Pre-processing functions
+        ├── "transforms" # Data transformation modules
+        ├── "utilities" # Utility functions for various submodules
+        └── "cli.py" # Implements the "cellarium-ml" CLI. Models must be registered here
+
+Important Notes
+~~~~~~~~~~~~~~~
+
+``cellarium/ml/models/*``
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- Models must subclass ``CellariumModel`` and implement the following:
+    - ``reset_parameters``: Initializes model parameters.
+    - ``forward``: Returns a dictionary containing the computed loss under the ``loss`` key.
+
+Optional hooks for training include:
+
+- ``on_train_start``: Called at the start of training.
+- ``on_train_epoch_end``: Triggered at the end of each epoch.
+- ``on_train_batch_end``: Triggered at the end of each batch.
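
As a rough illustration of the model contract listed above, the plain-Python sketch below mimics a model that exposes ``reset_parameters``, a ``forward`` that returns a dictionary with a ``loss`` key, and the three optional hooks. Everything in it is an assumption made for illustration: ``ToyModel``, ``call_hook``, and ``run_training`` are hypothetical names, the hooks take no arguments (the README does not show their real signatures), and no optimization step is performed. A real model would subclass ``CellariumModel`` and be driven by ``CellariumModule`` rather than by this hand-written loop.

```python
from typing import Dict, List


class ToyModel:
    """Hypothetical stand-in for a CellariumModel subclass (no torch involved)."""

    def __init__(self) -> None:
        self.events: List[str] = []  # records which hooks were triggered
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # Required by the contract: (re)initialize the model parameters.
        self.center = 0.0

    def forward(self, x: List[float]) -> Dict[str, float]:
        # Required by the contract: return a dict with the loss under "loss".
        loss = sum((v - self.center) ** 2 for v in x) / len(x)
        return {"loss": loss}

    # Optional hooks; the no-argument signatures are an assumption.
    def on_train_start(self) -> None:
        self.events.append("train_start")

    def on_train_batch_end(self) -> None:
        self.events.append("batch_end")

    def on_train_epoch_end(self) -> None:
        self.events.append("epoch_end")


def call_hook(model: object, name: str) -> None:
    """Call an optional hook only if the model defines it."""
    hook = getattr(model, name, None)
    if callable(hook):
        hook()


def run_training(model: ToyModel, batches: List[List[float]], epochs: int = 1) -> List[float]:
    """Hypothetical driver loop showing when each hook would fire."""
    losses: List[float] = []
    call_hook(model, "on_train_start")
    for _ in range(epochs):
        for batch in batches:
            losses.append(model.forward(batch)["loss"])
            call_hook(model, "on_train_batch_end")
        call_hook(model, "on_train_epoch_end")
    return losses


if __name__ == "__main__":
    toy = ToyModel()
    print(run_training(toy, [[1.0, 3.0], [2.0]]))
    print(toy.events)
```

The driver looks hooks up with ``getattr`` so that a model which omits a hook still trains, which matches the README's description of the hooks as optional.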
+
+``cellarium/ml/transforms/*``
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+- All transforms must subclass ``torch.nn.Module``.
+- The ``forward`` method must output a dictionary where keys correspond to the input arguments for subsequent transforms or the model.
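
The dictionary-passing rule for transforms can be sketched in plain Python. The example below is not the ``CellariumPipeline`` implementation and uses ordinary functions instead of ``torch.nn.Module`` subclasses. ``run_pipeline``, ``normalize_total``, ``log1p``, and ``mean_loss_model`` are hypothetical names, and matching keys to parameters with ``inspect.signature`` is an assumption chosen only to make the rule concrete: each step receives the batch entries whose keys match its parameter names, and its returned dictionary is merged back into the batch for the next step.

```python
import inspect
import math
from typing import Any, Callable, Dict, List

Step = Callable[..., Dict[str, Any]]


def run_pipeline(steps: List[Step], batch: Dict[str, Any]) -> Dict[str, Any]:
    """Pass a batch through the steps and return the last step's output."""
    batch = dict(batch)  # copy so the caller's dictionary is not mutated
    output: Dict[str, Any] = {}
    for step in steps:
        params = inspect.signature(step).parameters
        # Select only the batch entries this step declares as arguments.
        kwargs = {name: batch[name] for name in params if name in batch}
        output = step(**kwargs)
        batch.update(output)  # outputs become inputs of later steps
    return output


def normalize_total(x: List[float], total: float = 10.0) -> Dict[str, List[float]]:
    """Toy transform: rescale the values so that they sum to ``total``."""
    s = sum(x)
    return {"x": [v * total / s for v in x]}


def log1p(x: List[float]) -> Dict[str, List[float]]:
    """Toy transform: apply log(1 + v) to every value."""
    return {"x": [math.log1p(v) for v in x]}


def mean_loss_model(x: List[float]) -> Dict[str, float]:
    """Toy model: report the mean of its input under the ``loss`` key."""
    return {"loss": sum(x) / len(x)}


if __name__ == "__main__":
    result = run_pipeline(
        [normalize_total, log1p, mean_loss_model],
        {"x": [1.0, 3.0], "obs_names": ["a", "b"]},
    )
    print(result)
```

Because every step returns a dictionary keyed by argument names, steps can be reordered or swapped without changing the driver, and batch entries that no step asks for (``obs_names`` here) are simply ignored.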
+
+``cellarium/ml/cli.py``
+~~~~~~~~~~~~~~~~~~~~~~~
+- Models must be registered here to be accessible via the command-line interface (``cellarium-ml`` CLI).
+
+
+
+-------------------------------------------------------------------------------
+
+**Installation**
+-----------------
+
+To install via pip:
+
+.. code-block:: bash
+
+    pip install cellarium-ml
+
+To install the developer version from source:
+
+.. code-block:: bash
+
+    git clone https://github.com/cellarium-ai/cellarium-ml.git
+    cd cellarium-ml
+    make install # runs pip install -e .[dev]

-Installation
-------------
+**API Documentation and Tutorials**
+-----------------------------------

-To install from the pip::
+For detailed API documentation and tutorials, visit:
+`Cellarium ML Documentation <https://cellarium-ai.github.io/cellarium-ml/>`_

-    $ pip install cellarium-ml
+-------------------------------------------------------------------------------

-To install the developer version from the source::
+**For Developers**
+-------------------

-    $ git clone https://github.com/cellarium-ai/cellarium-ml.git
-    $ cd cellarium-ml
-    $ make install # runs pip install -e .[dev]
+To run the tests:

-For developers
---------------
+.. code-block:: bash

-To run the tests::
+    make test-examples # runs single-device cli example tests
+    make test-dataloader # runs single-device dataloader related tests
+    TEST_DEVICES=2 make test-dataloader # runs multi-device dataloader related test
+    make test # runs single-device (all other) tests
+    TEST_DEVICES=2 make test # runs multi-device (all other) tests

-    $ make test # runs single-device tests
-    $ TEST_DEVICES=2 make test # runs multi-device tests
+To format the code automatically:

-To automatically format the code::
+.. code-block:: bash

-    $ make format # runs ruff formatter and fixes linter errors
+    make format # runs ruff formatter and fixes linter errors

-To run the linters::
+To run the linters:

-    $ make lint # runs ruff linter and checks for formatter errors
+.. code-block:: bash

-To build the documentation::
+    make lint # runs ruff linter and checks for formatter errors

-    $ make docs # builds the documentation at docs/build/html
+To build the documentation:
+
+.. code-block:: bash
+
+    make docs # builds the documentation at docs/build/html