Can't resume training my dataset #997
Comments
👋 Hello @bluesky93128, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:
As this appears to be a 🐛 Bug Report, could you please provide additional details, such as the specific steps to reproduce the issue? This includes:
This information will help us diagnose the issue faster 🔍! An Ultralytics engineer will review and assist with this shortly. Thank you for your patience and collaboration! 😊
Please refer to this ticket. Some GPUs were working, but now all of them give an error.
@bluesky93128 thank you for bringing this to our attention and referencing the related issue. If you are experiencing GPU-related errors across all options while trying to resume training, here are a few steps to help troubleshoot and resolve the issue:
If the above steps do not resolve the issue, we recommend testing on different hardware or environments to rule out compatibility issues. Additionally, feel free to share more details about your setup (e.g., dataset, model, and environment specifics). We’ll do our best to assist further. 😊 Let us know how it goes!
@pderrenger I've tried all of these, but nothing works.
Hi @bluesky93128, Thank you for the update and clarification. I’ve reported this issue to the development team for further investigation. They are looking into it, and I’ll keep you updated as soon as we have a resolution. We appreciate your patience!
Hi @yogendrasinghx
Hi @bluesky93128, We’ve investigated the issue, and the development team is actively working on a fix related to low GPU availability. Most likely, the GPU you selected wasn’t available at the time of your request, but it may become available a few minutes later. We also noticed that the previous error message wasn’t clear, so we’ve released a new version that provides a more informative message when the selected GPU is unavailable. Please try again and let us know if you continue experiencing issues. Thanks for your patience!
Related issue: #998
I'm still not able to resume my training
Thank you for reaching out. To help us investigate this issue further, could you please share the Model ID? You can find it in the URL when you access your model on the platform. Providing this information will allow us to locate your account and identify the issue. Looking forward to your response so we can resolve this for you.
xt4bDTXOQFHs3t67z16w
@bluesky93128 We’ve reverted the resume checkpoint to epoch 53 (the last successfully uploaded checkpoint), which should allow you to resume training immediately. Please try resuming the training again.
Search before asking
HUB Component
Training, Datasets
Bug
None of the GPU options are working at the moment.
I've tried every option in the list, but I still can't resume my training.
Environment
No response
Minimal Reproducible Example
No response
Additional
No response