
Implement contrastive learning model and transforms #195

Draft · bricewang wants to merge 8 commits into main from bw-contrastive-new
Conversation

@bricewang (Contributor) commented on May 31, 2024

Adds a contrastive multilayer perceptron (ContrastiveMLP) and the normalized temperature-scaled cross-entropy (NT-Xent) loss

Adds Duplicate, BinomialResample, Dropout, and GaussianNoise transforms

Adds contrastive_mlp to the CLI

@bricewang self-assigned this on May 31, 2024
@bricewang requested a review from @ordabayevy on May 31, 2024 at 06:50
@bricewang force-pushed the bw-contrastive-new branch from 0226328 to ede60bb on May 31, 2024 at 15:54
@bricewang requested a review from @mbabadi on May 31, 2024 at 18:09
@ordabayevy (Contributor) left a comment:

Looks great! Left my initial comments. Can you also add the test in test_cli::test_cpu_multi_device and a checkpoint-loading test similar to the one in https://github.com/cellarium-ai/cellarium-ml/blob/main/tests/test_geneformer.py?

"cellarium.ml.models.ContrastiveMLP",
link_arguments=[
LinkArguments("data", "model.model.init_args.g_genes", compute_n_vars),
LinkArguments("trainer.devices", "model.model.init_args.world_size", None, "parse"),
@ordabayevy (Contributor):

Instead of making world_size a model parameter, I would recommend computing it dynamically in the forward method, like here: https://github.com/cellarium-ai/cellarium-ml/blob/main/cellarium/ml/models/onepass_mean_var_std.py#L85

The reason is that world_size is not a property of the model but of the training procedure. For example, you could train a model with 4 GPUs and use only 1 GPU at inference time, or resume training with 2 GPUs, etc.
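For reference, a minimal sketch of the dynamic lookup (the helper name is an assumption, not code from the diff or the linked file):

import torch.distributed as dist

def get_world_size() -> int:
    # Use the process-group size when torch.distributed is initialized;
    # otherwise fall back to 1 (e.g. single-GPU inference or CPU runs).
    if dist.is_available() and dist.is_initialized():
        return dist.get_world_size()
    return 1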

],
trainer_defaults={
"max_epochs": 20,
"strategy": {"class_path": "lightning.pytorch.strategies.DDPStrategy"},
@ordabayevy (Contributor):

I think the DDPStrategy default can be removed, because it is applied automatically when the number of devices is greater than 1.
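For illustration, a sketch of the trimmed defaults (assuming nothing else in the config requires the explicit strategy entry):

# Lightning selects DDPStrategy on its own when trainer.devices > 1,
# so only the remaining defaults need to be listed.
trainer_defaults = {
    "max_epochs": 20,
}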

from cellarium.ml.models.geneformer import Geneformer
from cellarium.ml.models.incremental_pca import IncrementalPCA
from cellarium.ml.models.logistic_regression import LogisticRegression
from cellarium.ml.models.model import CellariumModel, PredictMixin, ValidateMixin
from cellarium.ml.models.mu_linear import MuLinear, abcdParameter
from cellarium.ml.models.nt_xent import NT_Xent
@ordabayevy (Contributor):

Refactor: I suggest creating a cellarium.ml.losses folder and adding our losses there.
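For example, the import here could then become (the exact module path is an assumption):

from cellarium.ml.losses.nt_xent import NT_Xent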


def __init__(
self,
g_genes: int,
@ordabayevy (Contributor):

Naming convention: at some point we switched from the g_genes naming convention to n_obs. You can switch to n_obs here as well to be consistent with the other models.

):
super(ContrastiveMLP, self).__init__()

layer_list: List[nn.Module] = []
@ordabayevy (Contributor):

Minor comment: you can create an empty nn.Sequential() and append to it directly:

self.layers = nn.Sequential()
self.layers.append(module)
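A minimal sketch of building the MLP body this way (hidden_sizes, embed_dim, and the exact layer composition are assumptions for illustration, not the layers in the diff):

import torch.nn as nn

def build_mlp(n_vars: int, hidden_sizes: list[int], embed_dim: int) -> nn.Sequential:
    # Start from an empty container and append layers directly, as suggested above.
    layers = nn.Sequential()
    in_features = n_vars
    for out_features in hidden_sizes:
        layers.append(nn.Linear(in_features, out_features))
        layers.append(nn.ReLU())
        in_features = out_features
    layers.append(nn.Linear(in_features, embed_dim))
    return layers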

loss = self.Xent_loss(z1, z2)
return {"loss": loss}

def predict(self, x_ng: torch.Tensor, **kwargs: Any):
@ordabayevy (Contributor):

Minor comment: **kwargs can be removed, right?

Binomially resampled gene counts.
"""
p_binom_ng = Uniform(self.p_binom_min, self.p_binom_max).sample(x_ng.shape).type_as(x_ng)
p_apply_n = Bernoulli(probs=self.p_apply).sample(x_ng.shape[:1]).type_as(x_ng).bool()
@ordabayevy (Contributor):

Maybe rename p_apply_n to something like apply_mask_n, since it is not a probability anymore.

@bricewang (Author):

Also fixed a bug where x_aug and x_ng were swapped in the torch.where call.
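For clarity, a sketch of the intended selection after the rename and the argument-order fix (the wrapper function is illustrative, not code from the diff):

import torch

def select_augmented(x_ng: torch.Tensor, x_aug: torch.Tensor, apply_mask_n: torch.Tensor) -> torch.Tensor:
    # Rows flagged by apply_mask_n take the resampled counts x_aug;
    # all other rows keep the original counts x_ng.
    return torch.where(apply_mask_n.unsqueeze(-1), x_aug, x_ng)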

Upper bound on binomial distribution parameter.
"""

def __init__(self, p_binom_min, p_binom_max, p_apply):
@ordabayevy (Contributor):

Add type hints to __init__.

Returns:
Binomially resampled gene counts.
"""
p_binom_ng = Uniform(self.p_binom_min, self.p_binom_max).sample(x_ng.shape).type_as(x_ng)
@ordabayevy (Contributor):

Are you using .type_as here to change the device?

Uniform uses the device of self.p_binom_min. If it is a plain float, it will be converted to a torch.Tensor on the CPU, so Uniform.sample will sample on the CPU, which might be slow. In that case it might be better to convert self.p_binom_min to a tensor yourself and move it to the GPU so that the sampling step runs there.
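A sketch of what that could look like, registering the parameters as buffers so they follow the module to the GPU and Uniform samples there directly (the class layout and helper name are assumptions, not the code in the diff):

import torch
from torch import nn
from torch.distributions import Uniform

class BinomialResample(nn.Module):
    def __init__(self, p_binom_min: float, p_binom_max: float, p_apply: float) -> None:
        super().__init__()
        # Buffers move with the module (module.to(device)), so distributions
        # built from them sample on the same device as the input.
        self.register_buffer("p_binom_min", torch.tensor(p_binom_min))
        self.register_buffer("p_binom_max", torch.tensor(p_binom_max))
        self.register_buffer("p_apply", torch.tensor(p_apply))

    def sample_p_binom(self, x_ng: torch.Tensor) -> torch.Tensor:
        # No .type_as needed once the module lives on x_ng's device.
        return Uniform(self.p_binom_min, self.p_binom_max).sample(x_ng.shape)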


class Randomize(nn.Module):
"""
Randomizely applies transform with probability p;
@ordabayevy (Contributor):

Randomizely -> Randomly?

@bricewang (Author):

Alternatively, I removed the entire class, since implementation constraints required that the decision of whether or not to apply the transform be made inside the transform itself (i.e. via p_apply). In the future, this logic could be pulled out into an inheritable base class.

@bricewang force-pushed the bw-contrastive-new branch 2 times, most recently from f33bb07 to 8474e25 on August 14, 2024 at 00:21