Adjustments based on benchmark run #39

Merged
merged 8 commits into from
Jan 21, 2025
2 changes: 1 addition & 1 deletion common
26 changes: 15 additions & 11 deletions src/methods/geneformer/config.vsh.yaml
@@ -21,20 +21,24 @@ info:
   method_types: [embedding]
   variants:
     geneformer_12L_95M_i4096:
-      model: "gf-12L-95M-i4096"
+      model: gf-12L-95M-i4096
     geneformer_6L_30M_i2048:
-      model: "gf-6L-30M-i2048"
+      model: gf-6L-30M-i2048
     geneformer_12L_30M_i2048:
-      model: "gf-12L-30M-i2048"
+      model: gf-12L-30M-i2048
     geneformer_20L_95M_i4096:
-      model: "gf-20L-95M-i4096"
+      model: gf-20L-95M-i4096

 arguments:
-  - name: "--model"
-    type: "string"
+  - name: --model
+    type: string
     description: String representing the Geneformer model to use
-    choices: ["gf-6L-30M-i2048", "gf-12L-30M-i2048", "gf-12L-95M-i4096", "gf-20L-95M-i4096"]
-    default: "gf-12L-95M-i4096"
+    choices:
+      - gf-6L-30M-i2048
+      - gf-12L-30M-i2048
+      - gf-12L-95M-i4096
+      - gf-20L-95M-i4096
+    default: gf-12L-95M-i4096

 resources:
   - type: python_script
@@ -48,9 +52,9 @@ engines:
     setup:
       - type: python
         pip:
-        - pyarrow<15.0.0a0,>=14.0.1
-        - huggingface_hub
-        - git+https://huggingface.co/ctheodoris/Geneformer.git
+          - pyarrow<15.0.0a0,>=14.0.1
+          - huggingface_hub
+          - git+https://huggingface.co/ctheodoris/Geneformer.git

 runners:
   - type: executable
1 change: 1 addition & 0 deletions src/methods/scgpt_finetuned/config.vsh.yaml
@@ -44,6 +44,7 @@ resources:
     path: script.py
   - path: /src/utils/read_anndata_partial.py
   - path: scgpt_functions.py
+  - path: /src/utils/exit_codes.py

 engines:
   - type: docker
3 changes: 2 additions & 1 deletion src/methods/scgpt_finetuned/script.py
@@ -30,6 +30,7 @@

 sys.path.append(meta["resources_dir"])
 from read_anndata_partial import read_anndata
+from exit_codes import exit_non_applicable
 from scgpt_functions import evaluate, prepare_data, prepare_dataloader, train

 print(f"====== scGPT version {scgpt.__version__} ======", flush=True)
@@ -39,7 +40,7 @@
 adata = read_anndata(par["input"], X="layers/counts", obs="obs", var="var", uns="uns")

 if adata.uns["dataset_organism"] != "homo_sapiens":
-    raise ValueError(
+    exit_non_applicable(
         f"scGPT can only be used with human data "
         f"(dataset_organism == \"{adata.uns['dataset_organism']}\")"
     )
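The change above swaps a `ValueError` for `exit_non_applicable`, but the body of `exit_codes.py` is not part of this diff. The following is only a minimal sketch of how such a helper could work, assuming it prints the message and terminates with a dedicated status code so the benchmark runner can tell "method not applicable to this dataset" apart from a crash. The code value `99`, the message prefix, and the `check_organism` wrapper are assumptions for illustration, not taken from the repository.

```python
import sys

# Hypothetical status code: the value used by the real exit_codes.py
# is not shown in this diff.
NON_APPLICABLE_EXIT_CODE = 99


def exit_non_applicable(msg: str) -> None:
    """Report that the method does not apply to this dataset, then stop."""
    print(f"NON-APPLICABLE: {msg}", flush=True)
    sys.exit(NON_APPLICABLE_EXIT_CODE)


def check_organism(organism: str) -> None:
    """Illustrative wrapper around the organism guard from script.py."""
    if organism != "homo_sapiens":
        exit_non_applicable(
            f"scGPT can only be used with human data "
            f'(dataset_organism == "{organism}")'
        )
```

Under this sketch a non-human dataset ends the process through `SystemExit` with a known code instead of an uncaught exception and traceback, which a workflow runner could then treat as a skip.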
17 changes: 15 additions & 2 deletions src/methods/scprint/config.vsh.yaml
@@ -1,4 +1,4 @@
-__merge__: /src/api/base_method.yaml
+__merge__: /src/api/comp_method.yaml

 name: scprint
 label: scPRINT
@@ -38,6 +38,11 @@ info:
       model_name: "medium"
     scprint_small:
       model_name: "small"
+  test_setup:
+    run:
+      model_name: small
+      batch_size: 16
+      max_len: 100

 arguments:
   - name: "--model_name"
@@ -49,6 +54,14 @@ arguments:
     type: file
     description: Path to the scPRINT model.
     required: false
+  - name: --batch_size
+    type: integer
+    description: The size of the batches to be used in the DataLoader.
+    default: 64
+  - name: --max_len
+    type: integer
+    description: The maximum length of the gene sequence.
+    default: 4000

 resources:
   - type: python_script
@@ -79,4 +92,4 @@ runners:
   - type: executable
   - type: nextflow
     directives:
-      label: [hightime, midmem, midcpu, gpu]
+      label: [hightime, midmem, midcpu, gpu, midsharedmem]
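The new `--batch_size` and `--max_len` arguments carry production defaults (64 and 4000), while `test_setup.run` supplies much smaller values. The sketch below illustrates the intended effect, assuming the `test_setup` values are passed as explicit arguments during component tests and therefore take precedence over the defaults. `resolve_par` and `embedder_kwargs` are hypothetical helpers; the real resolution is done by Viash before `script.py` runs.

```python
# Defaults declared under `arguments` in config.vsh.yaml (from the diff).
ARGUMENT_DEFAULTS = {"batch_size": 64, "max_len": 4000}

# Values declared under `info.test_setup.run` (from the diff).
TEST_SETUP_RUN = {"model_name": "small", "batch_size": 16, "max_len": 100}


def resolve_par(overrides: dict) -> dict:
    """Overlay explicitly passed values on the declared defaults.

    Hypothetical simplification of argument resolution, for illustration only.
    """
    return {**ARGUMENT_DEFAULTS, **overrides}


def embedder_kwargs(par: dict) -> dict:
    """Pick out the two values that script.py forwards to Embedder."""
    return {"batch_size": par["batch_size"], "max_len": par["max_len"]}


if __name__ == "__main__":
    print(embedder_kwargs(resolve_par({})))              # benchmark run
    print(embedder_kwargs(resolve_par(TEST_SETUP_RUN)))  # component test
```

Keeping the small values in the config rather than in the script means the test configuration shrinks the workload without touching the code path used in the full benchmark.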
20 changes: 11 additions & 9 deletions src/methods/scprint/script.py
@@ -1,12 +1,13 @@
-import anndata as ad
-from scdataloader import Preprocessor
+import os
 import sys
-from huggingface_hub import hf_hub_download
-from scprint.tasks import Embedder
-from scprint import scPrint
+
+import anndata as ad
 import scprint
 import torch
-import os
+from huggingface_hub import hf_hub_download
+from scdataloader import Preprocessor
+from scprint import scPrint
+from scprint.tasks import Embedder

 ## VIASH START
 par = {
@@ -19,8 +20,8 @@
 ## VIASH END

 sys.path.append(meta["resources_dir"])
-from read_anndata_partial import read_anndata
 from exit_codes import exit_non_applicable
+from read_anndata_partial import read_anndata

 print(f"====== scPRINT version {scprint.__version__} ======", flush=True)

@@ -41,7 +42,7 @@

 print("\n>>> Preprocessing data...", flush=True)
 preprocessor = Preprocessor(
-    min_valid_genes_id=min(0.9 * adata.n_vars, 10000), # 90% of features up to 10,000
+    min_valid_genes_id=min(0.9 * adata.n_vars, 10000),  # 90% of features up to 10,000
     # Turn off cell filtering to return results for all cells
     filter_cell_by_counts=False,
     min_nnz_genes=False,
@@ -77,7 +78,8 @@
 print(f"Using {n_cores_available} worker cores")
 embedder = Embedder(
     how="random expr",
-    max_len=4000,
+    batch_size=par["batch_size"],
+    max_len=par["max_len"],
     add_zero_genes=0,
     num_workers=n_cores_available,
     doclass=False,
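The `min_valid_genes_id` expression in the preprocessing hunk scales the threshold with the dataset and caps it for large ones. A small sketch of that arithmetic follows, with the hypothetical helper name `min_valid_genes_threshold` standing in for the inline expression. How `Preprocessor` interprets the value is not shown in this diff and is not assumed here.

```python
def min_valid_genes_threshold(n_vars: int, fraction: float = 0.9, cap: int = 10000):
    """Same arithmetic as `min(0.9 * adata.n_vars, 10000)` in script.py."""
    return min(fraction * n_vars, cap)


if __name__ == "__main__":
    # A 2,000-feature dataset stays under the cap; a 20,000-feature one hits it.
    for n_vars in (2000, 20000):
        print(n_vars, min_valid_genes_threshold(n_vars))
```

Note that the result is a float whenever the 90% branch wins and the integer cap otherwise, since `min` returns whichever argument is smaller without converting it.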
2 changes: 1 addition & 1 deletion src/metrics/asw_label/config.vsh.yaml
@@ -38,4 +38,4 @@ runners:
   - type: executable
   - type: nextflow
     directives:
-      label: [midtime, midmem, lowcpu]
+      label: [hightime, midmem, lowcpu]
2 changes: 1 addition & 1 deletion src/metrics/isolated_label_asw/config.vsh.yaml
@@ -39,4 +39,4 @@ runners:
   - type: executable
   - type: nextflow
     directives:
-      label: [midtime, midmem, lowcpu]
+      label: [hightime, midmem, lowcpu]
2 changes: 1 addition & 1 deletion src/metrics/kbet/config.vsh.yaml
@@ -58,4 +58,4 @@ runners:
   - type: executable
   - type: nextflow
     directives:
-      label: [hightime, highmem, lowcpu]
+      label: [hightime, veryhighmem, lowcpu]
2 changes: 1 addition & 1 deletion src/metrics/kbet/script.py
@@ -30,7 +30,7 @@
     type_="embed",
     embed="X_emb",
     scaled=True,
-    verbose=False,
+    verbose=True,
 )
 print(score, flush=True)
