Indefinite Epoch Loop - Model Stuck in Training Phase #4

Open
jennbparker opened this issue Jan 9, 2025 · 3 comments

@jennbparker

Hi Braingeneers team,

I've tried training a model on a single-nucleus RNA-seq dataset and, using the default settings for sims.train(), it seems to be stuck in the training phase (currently on epoch 390 and counting).

Thanks so much for your help with this query!

Please see details below:

Code used to train the model:

import anndata as an
import scanpy as sc
import numpy as np
adata = an.read_h5ad('/oak/stanford/groups/longaker/GriffPark/Adipo/Lineage/forSIMSRef.h5ad')

thing = adata.X
thing = thing.astype(np.float32)
thing
<6237x24149 sparse matrix of type '<class 'numpy.float32'>'
with 11718092 stored elements in Compressed Sparse Column format>
adata.X = thing
adata.X
<6237x24149 sparse matrix of type '<class 'numpy.float32'>'
with 11718092 stored elements in Compressed Sparse Column format>
data = adata
data.layers["logcounts"] = data.X

sims = SIMS(data=data, class_label='label')
Numerically encoding class labels
Calculating weights
sims.setup_trainer(accelerator="cpu", devices=1, logger=logger)
Setting up trainer ...
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
sims.train(numworkers = 48)

Please note that I did not have a GPU available, so I set accelerator to "cpu" instead.

SIMS was installed via the following line of code: pip install --use-pep517 git+https://github.com/braingeneers/SIMS.git

Environment manager: conda
Python version: 3.9.17

Package information:

Package Version
aiohttp 3.8.4
aiosignal 1.3.1
anndata 0.9.1
anyio 3.6.2
appdirs 1.4.4
arrow 1.2.3
async-timeout 4.0.2
attrs 23.1.0
beautifulsoup4 4.12.2
blessed 1.20.0
boto3 1.26.130
botocore 1.29.130
certifi 2023.5.7
charset-normalizer 3.1.0
click 8.1.3
contourpy 1.0.7
croniter 1.3.14
cycler 0.11.0
dateutils 0.6.12
deepdiff 6.3.0
docker-pycreds 0.4.0
fastapi 0.88.0
fonttools 4.39.3
fortran-language-server 1.12.0
frozenlist 1.3.3
fsspec 2023.5.0
gitdb 4.0.10
GitPython 3.1.31
h11 0.14.0
h5py 3.8.0
idna 3.4
importlib-resources 5.12.0
inquirer 3.1.3
itsdangerous 2.1.2
Jinja2 3.1.2
jmespath 1.0.1
joblib 1.2.0
kiwisolver 1.4.4
lightning 2.0.2
lightning-cloud 0.5.34
lightning-utilities 0.8.0
llvmlite 0.40.0
markdown-it-py 2.2.0
MarkupSafe 2.1.2
matplotlib 3.7.1
mdurl 0.1.2
multidict 6.0.4
natsort 8.3.1
networkx 3.1
numba 0.57.0
numpy 1.24.3
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
ordered-set 4.1.0
packaging 23.1
pandas 2.0.1
pathtools 0.1.2
patsy 0.5.3
Pillow 9.5.0
pip 24.2
protobuf 4.23.0
psutil 5.9.5
pydantic 1.10.7
Pygments 2.15.1
PyJWT 2.6.0
pynndescent 0.5.10
pyparsing 3.0.9
python-dateutil 2.8.2
python-editor 1.0.4
python-multipart 0.0.6
pytorch-lightning 2.0.2
pytorch-tabnet 4.0
pytz 2023.3
PyYAML 6.0
readchar 4.0.5
requests 2.30.0
rich 13.3.5
s3transfer 0.6.1
scanpy 1.9.3
scikit-learn 1.2.2
scipy 1.10.1
scsims 3.0.6
seaborn 0.12.2
sentry-sdk 1.22.2
session_info 1.0.0
setproctitle 1.3.2
setuptools 75.1.0
six 1.16.0
smmap 5.0.0
sniffio 1.3.0
soupsieve 2.4.1
starlette 0.22.0
starsessions 1.3.0
statsmodels 0.14.0
stdlib-list 0.8.0
threadpoolctl 3.1.0
torch 1.13.1
torchmetrics 0.11.4
tqdm 4.65.0
traitlets 5.9.0
typing_extensions 4.5.0
tzdata 2023.3
umap-learn 0.5.3
urllib3 1.26.15
uvicorn 0.22.0
wandb 0.15.2
wcwidth 0.2.6
websocket-client 1.5.1
websockets 11.0.3
wheel 0.44.0
yarl 1.9.2
zipp 3.15.0

@jlehrer1
Collaborator

jlehrer1 commented Jan 9, 2025

Hi Jennifer! This is a weird edge case of one of the packages we're using (lightning) that I think sets the number of epochs to infinity by default. Can you please update the line

sims.setup_trainer(accelerator="cpu", devices=1, logger=logger)
to
sims.setup_trainer(accelerator="cpu", devices=1, logger=logger, max_epochs=1)

and confirm it trains for only one epoch?

I'll also change the default so this doesn't happen in the future.
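
As a rough illustration of what changing the default could look like, here is a minimal sketch of a wrapper that fills in an epoch cap only when the caller has not supplied one. The helper name `with_epoch_cap` and the value of `DEFAULT_MAX_EPOCHS` are assumptions made up for this example; they are not taken from the SIMS codebase and may not match the fix that was actually shipped.

```python
from typing import Any, Dict

# Hypothetical cap chosen for illustration only, not the value SIMS adopted.
DEFAULT_MAX_EPOCHS = 100


def with_epoch_cap(trainer_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the trainer kwargs with max_epochs filled in if missing."""
    merged = dict(trainer_kwargs)  # copy so the caller's dict is not mutated
    merged.setdefault("max_epochs", DEFAULT_MAX_EPOCHS)  # explicit values win
    return merged


if __name__ == "__main__":
    print(with_epoch_cap({"accelerator": "cpu", "devices": 1}))
    print(with_epoch_cap({"accelerator": "cpu", "max_epochs": 1}))
```

With a default like this, a call that omits max_epochs would stop at the cap, while an explicit max_epochs=1 would still be respected.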

@jlehrer1
Collaborator

jlehrer1 commented Jan 9, 2025

Yeah, I confirmed (https://lightning.ai/docs/pytorch/stable/common/trainer.html#max-epochs) that when neither max_epochs nor max_steps is set, lightning defaults to max_epochs = 1000 rather than infinity, so that's the issue.
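
For reference, the rule described in the linked Lightning docs can be sketched in plain Python. This is not Lightning's own code: the function name `effective_max_epochs` is hypothetical, and the sketch assumes the documented conventions that max_epochs defaults to None, max_steps defaults to -1, and -1 means "no limit".

```python
from typing import Optional


def effective_max_epochs(max_epochs: Optional[int] = None,
                         max_steps: int = -1) -> Optional[int]:
    """Resolve the epoch limit the way the Lightning docs describe it.

    Returns None when the number of epochs is unbounded.
    """
    if max_epochs is None:
        if max_steps == -1:
            # Neither limit was given: the documented fallback is 1000 epochs.
            return 1000
        # Only a step limit was given: epochs are unbounded, steps end training.
        return None
    if max_epochs == -1:
        # Explicit request for infinite training.
        return None
    return max_epochs


if __name__ == "__main__":
    print(effective_max_epochs())              # trainer set up without max_epochs
    print(effective_max_epochs(max_epochs=1))  # the workaround suggested above
```

Under these assumptions, a trainer set up without max_epochs would keep going until epoch 1000, which is consistent with a run that is still training at epoch 390.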

@jennbparker
Author

Confirmed - running with max_epochs = 1 did complete correctly! Thank you very much :)
