GPU memory usage during training #8

Open
wren93 opened this issue Aug 25, 2023 · 1 comment

wren93 commented Aug 25, 2023

Hi, thank you for sharing this great work. I'm training LVDM on UCF-101 for unconditional generation and observed some odd GPU memory behavior. With batch size 2, nvidia-smi reports roughly 73,000 MB of GPU memory in use during training; when I increase the batch size to 32, usage drops to about 35,000 MB. Debugging suggests the UNet is the culprit at small batch sizes: memory jumps from ~8 GB to ~73 GB across lines 626–634 of lvdm/models/modules/openaimodel3d.py. Do you have any insight into this issue? Thanks!
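For reference, this is a minimal sketch of how I measured the spike. The helper name and the dummy 3D block are illustrative stand-ins only, not the repo's actual modules; in practice I wrapped the suspect calls in openaimodel3d.py the same way.

```python
import torch
import torch.nn as nn

def report_peak(label, fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and print the peak CUDA memory allocated during the call."""
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    out = fn(*args, **kwargs)
    torch.cuda.synchronize()
    peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"{label}: peak allocated {peak_gb:.2f} GB")
    return out

if torch.cuda.is_available():
    # Dummy stand-in for one of the UNet's 3D blocks (not the real LVDM module).
    block = nn.Conv3d(4, 4, kernel_size=3, padding=1).cuda()
    x = torch.randn(2, 4, 16, 32, 32, device="cuda")  # (batch, channels, frames, height, width)
    report_peak("dummy 3D block, batch=2", block, x)
```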

wren93 commented Aug 25, 2023

Also, what are the batch_size and num_workers parameters under the "trainer" section of the config file used for (as shown in the figure)? The parameters that actually control the dataloader appear to be the ones under "data"; see the sketch below for how I currently read the layout.
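Here is a small self-contained sketch of my current understanding. The keys and values are made up to mirror the usual PyTorch Lightning layout (the exact nesting in the repo's config may differ), so this is an assumption, not the repo's actual code:

```python
import torch
from omegaconf import OmegaConf
from torch.utils.data import DataLoader, TensorDataset

# Illustrative config mirroring the layout in question (values are made up).
config = OmegaConf.create({
    "data": {"params": {"batch_size": 32, "num_workers": 8}},  # seem to control the DataLoader
    "lightning": {"trainer": {"max_epochs": 10}},              # forwarded to pl.Trainer(**...)
})

# Dummy stand-in for UCF-101 clips: (N, channels, frames, height, width).
dataset = TensorDataset(torch.randn(64, 3, 16, 64, 64))

loader = DataLoader(
    dataset,
    batch_size=config.data.params.batch_size,
    num_workers=config.data.params.num_workers,
    shuffle=True,
)

# pytorch_lightning.Trainer has no batch_size/num_workers arguments of its own, so a
# batch_size key under "trainer" would not affect the dataloader (and could even raise
# a TypeError when the dict is unpacked into Trainer(**kwargs)).
```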
