raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly #11

Open
FLAWLESSJade opened this issue May 23, 2021 · 2 comments

Comments

@FLAWLESSJade

Hi, thanks for your code, it is good work. I have run into some problems when training it and hope you can give me some advice.
Thanks!
My specific problem is as follows.
My training environment is Python 3.7.9 with PyTorch 1.7.1 and cudnn 11.0.
I understand your training environment is Python 2 with PyTorch 0.4.0, so I revised some code, such as converting the print statements to Python 3 syntax.
To fix the shape mismatch between MNIST and MNIST-M, I also revised the transform code so that the shapes align.

After these small revisions the code runs, but after one epoch it raises the following error:

File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 885, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 17440, 2128, 16900, 13076, 17672, 15916) exited unexpectedly

[Screenshots: 1621784346(1), 1621784428(1)]

To solve this problem, I set num_workers to 0 or 1 and reduced the batch size from 64 to 32, 16, 8, and even 1, but the error is still raised after the first epoch. I don't know how to solve it right now. Could you give me some help? Thanks!
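
A minimal sketch of that kind of check, assuming a stand-in `TensorDataset` rather than the repository's real MNIST / MNIST-M datasets (the names `make_loader` and `main` are hypothetical). With `num_workers=0` the batches are loaded in the main process, so an exception raised while building a sample shows up with its own traceback instead of being reported only as a worker exit. The `__main__` guard matters on Windows, where worker processes are started by re-importing the script.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers: int = 0) -> DataLoader:
    # Hypothetical stand-in data: 8 grayscale 28x28 images with integer labels.
    images = torch.zeros(8, 1, 28, 28)
    labels = torch.zeros(8, dtype=torch.long)
    dataset = TensorDataset(images, labels)
    # num_workers=0 loads batches in the main process, so errors raised while
    # loading a sample surface with their own traceback.
    return DataLoader(dataset, batch_size=4, shuffle=False, num_workers=num_workers)


def main() -> None:
    for images, labels in make_loader(num_workers=0):
        print(images.shape, labels.shape)


# On Windows, worker processes are started with "spawn", which re-imports this
# module; the guard keeps the loop from running again inside each worker.
if __name__ == "__main__":
    main()
```

If the loop runs cleanly with `num_workers=0` but fails with a larger value, the problem is more likely in worker start-up or memory than in the dataset code itself.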

@FLAWLESSJade
Author

I have solved this problem on another server with this environment: PyTorch 1.6.0, torchvision 0.7.0, and Ubuntu 16.04.

Here is how I dealt with it:
When I moved my training to this server, I hit the same situation others have asked about before: the shapes of MNIST and MNIST-M do not align, even though the same code worked on my previous server.
So I deduced that the problem may come from the data-loading code. I revised it as follows, which solved the problem above.

from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    # Added: repeat the single grayscale channel 3 times (1 -> 3 channels)
    transforms.Lambda(lambda x: x.repeat(3, 1, 1)),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])
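
To illustrate what the added `Lambda` does to tensor shapes, here is a small sketch using plain `torch` tensors in place of real images. The helper `to_three_channels` is hypothetical and not part of the repository; it adds a guard because calling `repeat(3, 1, 1)` on an input that already has 3 channels would produce 9 channels.

```python
import torch


def to_three_channels(x: torch.Tensor) -> torch.Tensor:
    """Repeat a 1xHxW tensor along the channel axis; leave other inputs alone."""
    if x.shape[0] == 1:
        return x.repeat(3, 1, 1)
    return x


gray = torch.rand(1, 28, 28)  # MNIST-like sample after ToTensor()
rgb = torch.rand(3, 28, 28)   # MNIST-M-like sample after ToTensor()

print(to_three_channels(gray).shape)  # torch.Size([3, 28, 28])
print(to_three_channels(rgb).shape)   # torch.Size([3, 28, 28])
print(rgb.repeat(3, 1, 1).shape)      # torch.Size([9, 28, 28]) without the guard
```

A helper like this could be passed to `transforms.Lambda` in place of the inline lambda if the same transform is ever applied to both datasets.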

A new problem has now come up: the target dif and source dif losses always stay at 0, which is abnormal.
It looks as if the model is not doing any backpropagation.
[Screenshot: 2021-05-24 13-46-08]
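
One way to test the suspicion that no gradients are flowing is to look at each parameter's `.grad` after calling `backward()`. The sketch below uses a hypothetical tiny `nn.Linear` model and a made-up loss as stand-ins for the real network and its dif losses; `grad_norms` is an illustrative helper, not something from the repository.

```python
import torch
from torch import nn


def grad_norms(model: nn.Module) -> dict:
    """Return the gradient norm per parameter; None means no gradient reached it."""
    return {
        name: (None if p.grad is None else p.grad.norm().item())
        for name, p in model.named_parameters()
    }


# Hypothetical tiny model and loss, standing in for the real network.
model = nn.Linear(4, 2)
loss = model(torch.ones(3, 4)).pow(2).mean()
loss.backward()

for name, norm in grad_norms(model).items():
    print(name, norm)
```

A value of `None` means the parameter is not connected to the loss at all, while a norm of exactly 0 means a gradient was computed but is zero, which is what a loss stuck at a constant 0 would produce.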

@elmaghba

@FLAWLESSJade I got the same issue. Did you solve it?
