raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly #11

FLAWLESSJade · 2021-05-23T15:41:14Z

Hi, thanks for your code, it is great work. I ran into some problems when I started training it and hope you can give me some advice.
Thanks!
My specific problems are as follows:
My training environment is Python 3.7.9 with PyTorch 1.7.1 and cudnn 11.0.
I understand your training environment is Python 2 with PyTorch 0.4.0, so I revised some code, such as the print statements, for Python 3.
To fix the shape mismatch between MNIST and MNIST-M, I also revised the transform code so that the shapes align.

After these small revisions, the code runs. But after one epoch, it raises the following error:

File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 885, in _try_get_data
raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 17440, 2128, 16900, 13076, 17672, 15916) exited unexpectedly

To solve this, I set num_workers to 0 or 1 and reduced the batch size from 64 to 32, 16, 8, and even 1, but the error is still raised after the first epoch. I don't know how to solve it right now. Could you give me some help? Thanks!
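For anyone debugging the same message: the "exited unexpectedly" text usually hides the real exception raised inside a worker. Below is a minimal sketch of loading in the main process so the underlying error surfaces directly. The `make_loader` helper and the random `TensorDataset` are hypothetical stand-ins, not code from this repository, and the setting only helps if it reaches every `DataLoader` the training script builds (source and target). On Windows, worker processes are started by re-importing the script, so the training entry point should sit under an `if __name__ == "__main__":` guard.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size=64, num_workers=0):
    # num_workers=0 loads batches in the main process, so an exception in
    # the dataset or transform shows its own traceback instead of the
    # generic "DataLoader worker exited unexpectedly" message.
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      num_workers=num_workers)


def main():
    # Hypothetical stand-in data shaped like MNIST; replace with the real datasets.
    dataset = TensorDataset(torch.randn(256, 1, 28, 28),
                            torch.randint(0, 10, (256,)))
    loader = make_loader(dataset)
    for epoch in range(2):  # run past the first epoch, where the error appeared
        for images, labels in loader:
            pass
    return images.shape


if __name__ == "__main__":
    # The guard matters on Windows once num_workers is raised above 0.
    main()
```

Once the loop passes the first epoch with `num_workers=0`, the worker count can be raised again to see whether the failure is specific to multiprocessing.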

FLAWLESSJade · 2021-05-24T05:47:10Z

I have solved this problem on my other server, with this environment: PyTorch 1.6.0, torchvision 0.7.0, and Ubuntu 16.04.

Here is how I dealt with it:
When I moved my training to this server, I hit the same situation someone asked about before: the shapes of MNIST and MNIST-M are not aligned, even though the same code worked on my previous server.
So I deduced that the problem may come from the dataloader code. I revised it as follows, which solved the problem above.

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.repeat(3, 1, 1)),  # Add this line to expand the channel dimension from 1 to 3
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])
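As a hedged variant of the same idea, the channel expansion can be written as a named function that only repeats single-channel input, so it is safe to reuse on images that already have 3 channels. The name `to_three_channels` is hypothetical and not part of this repository. A module-level function can also be pickled, which matters on Windows when `num_workers` is above 0, since lambdas cannot be sent to worker processes there.

```python
import torch


def to_three_channels(x: torch.Tensor) -> torch.Tensor:
    """Expand a (1, H, W) image tensor to (3, H, W); leave other inputs unchanged."""
    if x.dim() != 3:
        raise ValueError(f"expected a (C, H, W) tensor, got shape {tuple(x.shape)}")
    if x.shape[0] == 1:
        # Copy the single grayscale channel into three identical channels.
        return x.repeat(3, 1, 1)
    return x


gray = torch.rand(1, 28, 28)   # MNIST-like grayscale image
rgb = torch.rand(3, 28, 28)    # MNIST-M-like color image
print(to_three_channels(gray).shape)
print(to_three_channels(rgb).shape)
```

It can be dropped into the pipeline as `transforms.Lambda(to_three_channels)` in place of the inline lambda.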

However, a new problem appeared: the target dif and source dif losses always stay at 0, which is abnormal.
It looks like the model is not doing any backpropagation.
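One way to check whether backpropagation is happening at all is to look at the gradients right after `loss.backward()`. This is only a sketch: `grad_report` is a hypothetical helper, and the small `nn.Linear` model stands in for the real network. A parameter that reports `False` either received no gradient or received one that is exactly zero.

```python
import torch
from torch import nn


def grad_report(model: nn.Module) -> dict:
    """Map each parameter name to True if it holds a non-zero gradient."""
    report = {}
    for name, param in model.named_parameters():
        report[name] = param.grad is not None and bool(param.grad.abs().sum() > 0)
    return report


torch.manual_seed(0)
model = nn.Linear(4, 2)          # hypothetical stand-in for the real model
x = torch.randn(8, 4)
loss = model(x).pow(2).mean()    # any scalar loss that depends on the parameters
loss.backward()
print(grad_report(model))
```

Calling the helper after the backward pass of the dif losses would show whether those terms reach any parameters.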

elmaghba · 2022-06-26T00:06:33Z

@FLAWLESSJade I got the same issue, did you solve it?
