[Bug]: EfficientAd - CUDA out of memory. #2531
Comments
I am running into the exact same issue while using Fastflow. The jumps in allocated memory happen at the end of every validation epoch. I think I was able to track it down to the `compute` call of one of the metrics.
I think I found it: adjusting the `compute` call solved it for me.
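One way such an adjustment can look is sketched below. This is only an illustration under assumptions — that the metric accumulates its per-batch states on the GPU and that the end-of-epoch memory jump comes from concatenating them inside `compute` — and the class name `ImageScoreMetric` and the returned score are placeholders, not the actual class from this thread.

```python
import torch
from torchmetrics import Metric


class ImageScoreMetric(Metric):
    """Placeholder metric used only to illustrate the idea."""

    def __init__(self) -> None:
        super().__init__()
        # Per-batch predictions/targets accumulated during validation.
        self.add_state("preds", default=[], dist_reduce_fx="cat")
        self.add_state("target", default=[], dist_reduce_fx="cat")

    def update(self, preds: torch.Tensor, target: torch.Tensor) -> None:
        self.preds.append(preds)
        self.target.append(target)

    def compute(self) -> torch.Tensor:
        # Concatenate the accumulated states on the CPU instead of the GPU,
        # so the large end-of-epoch allocation does not land on the device.
        preds = torch.cat([p.detach().cpu() for p in self.preds])
        target = torch.cat([t.detach().cpu() for t in self.target])
        return torch.mean(torch.abs(preds - target))  # placeholder score
```

For curve-based torchmetrics, passing an explicit `thresholds` value can also bound the stored state instead of keeping every prediction on the device.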
I just saw that this is a class from torchmetrics and not anomalib, but I'll create an issue there and link it to this one.
Thanks for sharing @suahelen |
Describe the bug
Training an EfficientAd (small) model with all other parameters at their defaults produces a CUDA out-of-memory error.
Dataset
N/A
Model
N/A
Steps to reproduce the behavior
41 Good Training Images
30 Good Validation Images and 170 Bad Validation Images
EfficientAd(small) model with all other parameters at default
At around epoch 30 of 300 (no callbacks), roughly 8 minutes in on an RTX A5000 (24 GB), I hit this issue. The hardware should be sufficient for this workload. Watching the memory usage via the nvidia-smi CLI, it swings back and forth but still climbs throughout training.
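For reference, a minimal reproduction along these lines might look like the sketch below; the dataset name, paths, and folder layout are assumptions, using anomalib's Folder datamodule and Engine.

```python
from anomalib.data import Folder
from anomalib.engine import Engine
from anomalib.models import EfficientAd

# Hypothetical dataset layout: "good" images as normal, "bad" as abnormal.
datamodule = Folder(
    name="my_parts",
    root="datasets/my_parts",
    normal_dir="good",
    abnormal_dir="bad",
    train_batch_size=1,  # EfficientAd is trained with a batch size of 1
)

model = EfficientAd()            # model size defaults to "small"
engine = Engine(max_epochs=300)  # no callbacks beyond the defaults
engine.fit(model=model, datamodule=datamodule)
```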
Adopting the suggested environment variable
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
also did not help.
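For completeness, the allocator setting has to be in the environment before CUDA is initialized; one way to apply it (a generic sketch, not specific to anomalib) is:

```python
import os

# Must be set before torch initializes CUDA for the allocator option to apply.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # noqa: E402
```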
OS information:
Expected behavior
Training to complete
Screenshots
Pip/GitHub
pip
What version/branch did you use?
2.0.0b2
Configuration YAML
Logs
Code of Conduct