fsVid2Vid--ZeroDivisionError: float division by zero #84

Open
Gopalsvs opened this issue Jun 13, 2021 · 0 comments

Hello all,

We ran into an issue while training the fs-Vid2Vid model on a dataset similar to the YouTube Dancing dataset. We created the three additional folders (poses-openpose, pose_maps-densepose, human_instance_maps) for all 3000 sequences. Training failed with a ZeroDivisionError after the model completed 5 epochs. We confirmed that the dataset contains no None images in the images, pose_maps-densepose, or human_instance_maps folders, and that there are no empty JSON files in poses-openpose. We used a batch size of 2 and trained on a single GPU. We also reduced the dataset to 500 sequences and tried again; the same error occurred after the 7th epoch.
Is there a fix for this error?

This is the exact error we got:
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1e-323
Gradient overflow. Skipping step, loss scaler 1 reducing loss scale to 2.53e-321
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5e-324
Gradient overflow. Skipping step, loss scaler 1 reducing loss scale to 1.265e-321
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.0
Gradient overflow. Skipping step, loss scaler 1 reducing loss scale to 6.3e-322
Traceback (most recent call last):
  File "train.py", line 93, in <module>
    main()
  File "train.py", line 78, in main
    trainer.gen_update(data)
  File "/mnt/fs/imaginaire/imaginaire/trainers/vid2vid.py", line 283, in gen_update
    self.get_gen_losses(data_t, net_G_output, net_D_output)
  File "/mnt/fs/imaginaire/imaginaire/trainers/vid2vid.py", line 537, in get_gen_losses
    scaled_loss.backward()
  File "/home/ubuntu/anaconda3/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/apex/amp/handle.py", line 123, in scale_loss
    optimizer._post_amp_backward(loss_scaler)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/apex/amp/_process_optimizer.py", line 249, in post_backward_no_master_weights
    post_backward_models_are_masters(scaler, params, stashed_grads)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/apex/amp/_process_optimizer.py", line 123, in post_backward_models_are_masters
    scaler.unscale(
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/apex/amp/scaler.py", line 117, in unscale
    1./scale)
ZeroDivisionError: float division by zero
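
Reading the log, the loss scale appears to be halved on every skipped step until it underflows to 0.0, and the `1./scale` expression in the traceback then divides by zero. Below is a minimal pure-Python sketch of that arithmetic. It is not apex's actual code: the helper names are hypothetical, and the starting scale of 2**16 is an assumption rather than something taken from our config.

```python
def halvings_until_zero(scale, max_steps=5000):
    """Count how many halvings it takes for a float64 scale to underflow to 0.0."""
    for step in range(1, max_steps + 1):
        scale /= 2.0
        if scale == 0.0:
            return step
    return None  # did not reach zero within max_steps


def halve_with_floor(scale, min_scale):
    """Hypothetical guard: halve the scale but never let it drop below min_scale."""
    return max(scale / 2.0, min_scale)


def unscale_factor(scale):
    """Mirror the `1./scale` expression shown in the traceback."""
    return 1.0 / scale


if __name__ == "__main__":
    # Assumed starting scale of 2**16; every step is treated as an overflow.
    print("halvings until 0.0:", halvings_until_zero(2.0 ** 16))

    # 5e-324 is the smallest positive float64; one more halving gives 0.0,
    # matching the "5e-324" then "0.0" lines for loss scaler 0 in the log.
    print("5e-324 / 2 =", 5e-324 / 2.0)

    try:
        unscale_factor(0.0)
    except ZeroDivisionError as exc:
        print("ZeroDivisionError:", exc)

    # With a floor in place the scale stays positive, so 1./scale is defined.
    print("floored scale:", halve_with_floor(1.0, min_scale=1.0))
```

If this reading is right, the scale only reaches zero after roughly a thousand consecutive overflowing steps, which would point to the loss or gradients having become NaN/Inf rather than to an occasional fp16 overflow. A floor on the scale (apex's `amp.initialize` accepts a `min_loss_scale` argument) would presumably avoid the division error, but it would not address whatever is producing the non-finite gradients.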

Thanks for your time.
