Multi GPU Training Fails - dist._broadcast_coalesced( - RuntimeError: Invalid scalar type #1047
Comments
This error seems to be the same as this issue: NVIDIA/NeMo#5485. Please verify that the GPU version of PyTorch is installed.
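One way to run the check suggested above is a short script executed inside the same venv used for training. This is a minimal sketch, not something posted in the thread: the helper name `describe_torch_install` is hypothetical, and it only reports what the installed PyTorch build says about itself.

```python
import importlib.util
import platform


def describe_torch_install():
    """Summarise the local PyTorch build (hypothetical helper for illustration)."""
    info = {"platform": platform.system(), "torch_installed": False}
    # Return early if PyTorch is not importable in this environment.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # torch.version.cuda is None on CPU-only builds.
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    info["gpu_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_install().items():
        print(f"{key}: {value}")
```

If `cuda_build` is `None` or `cuda_available` is `False`, the venv likely holds a CPU-only build. On the dual-GPU machine described in this issue, `gpu_count` would be expected to report 2 when both cards are visible.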
Thank you. Single-GPU training is working, so I think it is installed correctly. Here is the venv:
The user seems to have modified the script to use
Thanks. It looks like Linux is mandatory at the moment.
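A quick way to see which distributed backends the local PyTorch build reports is sketched below. The helper name `distributed_backends` is hypothetical. Official Windows builds of PyTorch do not ship the NCCL backend, but whether that is the cause of the error in this issue is an assumption, not something confirmed in the thread.

```python
import importlib.util
import platform


def distributed_backends():
    """Report which torch.distributed backends this build exposes (hypothetical helper)."""
    result = {"platform": platform.system(), "nccl": False, "gloo": False}
    # Without PyTorch there is nothing to query.
    if importlib.util.find_spec("torch") is None:
        return result

    import torch.distributed as dist

    # The backend queries are only meaningful when the distributed package was built in.
    if dist.is_available():
        result["nccl"] = dist.is_nccl_available()
        result["gloo"] = dist.is_gloo_available()
    return result


if __name__ == "__main__":
    print(distributed_backends())
```

If `nccl` comes back `False` on Windows, a multi-GPU launch configured for NCCL would not be able to use it on that machine.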
I have a subscriber who has dual RTX 4060 Ti (16 GB) GPUs.
He is on Windows 10 with Python 3.10.9, on a fresh install.
When we set the huggingface default_config.yaml like below
train util.py like below
We are getting the below error. How can we fix it?