
CellariumGPT #129 (Open)

wants to merge 20 commits into main from cellarium-gpt
Conversation

ordabayevy (Contributor)

No description provided.

@ordabayevy force-pushed the cellarium-gpt branch 5 times, most recently from c82fbeb to 43ed76b on March 7, 2024 18:07
@ordabayevy requested a review from mbabadi on March 7, 2024 18:10
# Special Scaled Initialization --> There are 2 Layer Norms per Transformer Block
p.data.normal_(mean=0.0, std=(self.initializer_range / math.sqrt(2 * self.gpt_model.n_blocks)))

def tokenize(
mbabadi (Member):

documentation

mbabadi (Member):

Can you also separately take obs_total_mrna_umis_n (to be concatenated to values_nc) and out_total_mrna_umis_n to generate a token specifying the library size we need the readout to be at? This construct can be used for model training with downsampling. We could use Brice's logic: duplicate the counts, downsample the first copy to use for observation, and use the second copy for readout.
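A minimal sketch of the duplicate-and-downsample construct described above; the downsampling fraction and the function name are assumptions:

    import torch

    def make_observation_and_readout(x_ng: torch.Tensor, downsample_p: float = 0.5):
        # Duplicate the counts: binomially downsample one copy for the observation,
        # keep the other copy intact for the readout target.
        obs_x_ng = torch.distributions.Binomial(total_count=x_ng, probs=torch.tensor(downsample_p)).sample()
        obs_total_mrna_umis_n = obs_x_ng.sum(dim=-1)
        out_total_mrna_umis_n = x_ng.sum(dim=-1)
        return obs_x_ng, obs_total_mrna_umis_n, out_total_mrna_umis_n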

Another thought is to have multiple tokenize methods. For example:
-- generate_observation_tokens
-- generate_output_tokens
-- generate_register_tokens (later)
-- generate_metadata_tokens (later)

In this construct, generate_observation_tokens takes x_ng, obs_total_mrna_umis_n as input and generates a bunch of tokens. generate_output_tokens, for now, just takes out_total_mrna_umis_n and generates a single token.

All of the tokens are then concatenated and given to an embedding layer. The tokens should also be equipped with metadata to specify how they should be embedded ...
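A minimal sketch of this split; only the method names and argument names come from the comment above, while the class name, token-type codes, and dictionary layout are assumptions:

    import torch

    class GPTTokenizer:
        OBS, OUT = 0, 1  # hypothetical token-type codes serving as embedding metadata

        def generate_observation_tokens(
            self, x_ng: torch.Tensor, obs_total_mrna_umis_n: torch.Tensor
        ) -> dict[str, torch.Tensor]:
            # Concatenate the observed library-size token to the gene value tokens.
            values_nc = torch.cat([x_ng, obs_total_mrna_umis_n[:, None]], dim=1)
            return {"values": values_nc, "token_type": torch.full_like(values_nc, self.OBS)}

        def generate_output_tokens(self, out_total_mrna_umis_n: torch.Tensor) -> dict[str, torch.Tensor]:
            # For now, a single token carrying the library size the readout should be at.
            values_n1 = out_total_mrna_umis_n[:, None]
            return {"values": values_n1, "token_type": torch.full_like(values_n1, self.OUT)}

        def tokenize(self, x_ng, obs_total_mrna_umis_n, out_total_mrna_umis_n) -> dict[str, torch.Tensor]:
            obs = self.generate_observation_tokens(x_ng, obs_total_mrna_umis_n)
            out = self.generate_output_tokens(out_total_mrna_umis_n)
            # All tokens are concatenated before being handed to the embedding layer.
            return {key: torch.cat([obs[key], out[key]], dim=1) for key in obs}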

mbabadi (Member), Mar 13, 2024:

Made an issue #131 about this.

# - torch.lgamma(self.theta)
# - torch.lgamma(value + 1)
# )
delta = torch.where(
mbabadi (Member):

Add documentation:

"For large theta, we can use Stirling's asymptotic approximation (see https://en.wikipedia.org/wiki/Gamma_function#Log-gamma_function), which is numerically more stable than PyTorch's implementation of lgamma."

Actually, the condition value / theta < 1e-2 may not be needed. Let's make an issue for me to investigate.
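For reference, a minimal sketch of the approximation in question; the crossover threshold below is a placeholder, not the value used in the PR:

    import math
    import torch

    def lgamma_stirling(x: torch.Tensor) -> torch.Tensor:
        # Stirling's asymptotic series for log Gamma(x), keeping the first correction term:
        # lgamma(x) ~ (x - 1/2) * log(x) - x + 0.5 * log(2 * pi) + 1 / (12 * x)
        return (x - 0.5) * torch.log(x) - x + 0.5 * math.log(2 * math.pi) + 1.0 / (12.0 * x)

    # Use the asymptotic form only where it is accurate (threshold is a placeholder).
    x = torch.tensor([0.5, 5.0, 50.0, 5000.0])
    out = torch.where(x > 10.0, lgamma_stirling(x), torch.lgamma(x))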

if (trainer.global_step + 1) % (trainer.log_every_n_steps * 10) != 0: # type: ignore[attr-defined]
return

import matplotlib.pyplot as plt
mbabadi (Member):

I suggest refactoring these plotting functions out to keep on_batch_end decluttered.
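A minimal sketch of the suggested refactor; the helper name and the quantities being plotted are assumptions:

    import matplotlib.pyplot as plt
    import torch

    def plot_diagnostics(value_nc: torch.Tensor, mu_nc: torch.Tensor) -> plt.Figure:
        # Build the diagnostic figure outside the callback so on_batch_end stays small.
        fig, ax = plt.subplots()
        ax.scatter(value_nc.flatten().cpu(), mu_nc.flatten().cpu(), s=1)
        ax.set(xlabel="observed count", ylabel="predicted mean")
        return fig

    # on_batch_end then reduces to the periodicity guard plus a single call:
    #     fig = plot_diagnostics(value_nc, mu_nc)
    #     trainer.logger.experiment.add_figure("diagnostics", fig, trainer.global_step)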

@register_model
def cellarium_gpt(args: ArgsType = None) -> None:
r"""
CLI to run the :class:`cellarium.ml.models.CellariumGPT` model.
mbabadi (Member):

Add an example CLI command.
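For instance, following the docstring pattern of the other registered models (the subcommand and config path here are assumptions):

    r"""
    CLI to run the :class:`cellarium.ml.models.CellariumGPT` model.

    Example run::

        cellarium-ml cellarium_gpt fit --config examples/cellarium_gpt/config.yaml
    """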

mbabadi (Member) left a review comment:

Some small changes here and there.

return mu_nc, theta_nc


class CellariumGPT(CellariumModel):
Contributor:

Can you make it a PredictMixin and implement predict()?

Contributor:

Although I'm thinking that, analogous to Geneformer, predict() would return the gene embedding vectors. But I'm not sure that exactly makes sense here... you might also imagine that "predict" would return the negative binomial distributions.

My interest was in having a common interface I could use to extract gene embeddings, whether the model was a Geneformer model or a CellariumGPT model. I was previously using .predict() to do this from Geneformer, so I thought it would be nice if CellariumGPT worked the same way.

Contributor:

See #152 if you're interested in this idea.

ordabayevy (Contributor, Author):

I've been locally implementing different versions of the predict method depending on what I wanted to analyze. Since I was changing it a lot, I decided not to add it here. But it should be something similar to the Geneformer predict that returns a dictionary, with boolean flags to control what it returns.
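A minimal sketch of that dictionary-returning predict(); the flag names, the head call, and the exact imports are assumptions:

    import torch
    from cellarium.ml.models import CellariumModel, PredictMixin

    class CellariumGPT(CellariumModel, PredictMixin):
        def predict(
            self,
            x_ng: torch.Tensor,
            output_embeddings: bool = True,
            output_distributions: bool = False,
        ) -> dict[str, torch.Tensor]:
            output: dict[str, torch.Tensor] = {}
            hidden_ncd = self.gpt_model(x_ng)  # hypothetical transformer forward pass
            if output_embeddings:
                output["gene_embeddings_ncd"] = hidden_ncd
            if output_distributions:
                mu_nc, theta_nc = self.head(hidden_ncd)  # hypothetical negative binomial head
                output["mu_nc"], output["theta_nc"] = mu_nc, theta_nc
            return output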

@ordabayevy ordabayevy changed the base branch from main to val-dataloader May 7, 2024 17:41
Base automatically changed from val-dataloader to main May 8, 2024 17:24
@ordabayevy ordabayevy changed the base branch from main to timer May 8, 2024 22:45
@ordabayevy ordabayevy changed the base branch from timer to main May 8, 2024 22:45