2023-08-22 17:46:14 | INFO | mmdet.core.trainer:493 - ---> start train epoch1
2023-08-22 17:46:16 | ERROR | mmdet.core.trainer:98 - one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor []] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
2023-08-22 17:46:16 | INFO | mmdet.core.trainer:343 - Training of experiment is done and the best AP is 0.00
2023-08-22 17:46:16 | ERROR | mmdet.core.launch:147 - An error has been caught in function '_distributed_worker', process 'SpawnProcess-1' (478), thread 'MainThread' (139673154561728):
Traceback (most recent call last):
File "", line 1, in
File "/data/anaconda3/envs/miemie_det/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
│ │ └ 5
│ └ 8
└ <function _main at 0x7f083058cc10>
File "/data/anaconda3/envs/miemie_det/lib/python3.8/multiprocessing/spawn.py", line 129, in _main
return self._bootstrap(parent_sentinel)
│ │ └ 5
│ └ <function BaseProcess._bootstrap at 0x7f083073dee0>
└
File "/data/anaconda3/envs/miemie_det/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
│ └ <function BaseProcess.run at 0x7f083073d550>
└
File "/data/anaconda3/envs/miemie_det/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
│ │ │ │ │ └ {}
│ │ │ │ └
│ │ │ └ (<function _distributed_worker at 0x7f07b1e38160>, 0, (<function main at 0x7f077b30d940>, 2, 2, 0, 'nccl', 'tcp://127.0.0.1:5...
│ │ └
│ └ <function _wrap at 0x7f07b1956310>
└
File "/data/anaconda3/envs/miemie_det/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
│ │ └ (<function main at 0x7f077b30d940>, 2, 2, 0, 'nccl', 'tcp://127.0.0.1:56017', (╒═══════════════════════╤═════════════════════...
│ └ 0
└ <function _distributed_worker at 0x7f07b1e38160>
File "/home/a-bamboo/repositories/miemiedetection/mmdet/core/launch.py", line 147, in _distributed_worker
main_func(*args)
│ └ (╒═══════════════════════╤═══════════════════════════════════════════════════════════════════════════════════════════════════...
└ <function main at 0x7f077b30d940>
File "/home/a-bamboo/repositories/miemiedetection/tools/train.py", line 126, in main
trainer.train()
│ └ <function Trainer.train at 0x7f077a68bc10>
└ <mmdet.core.trainer.Trainer object at 0x7f077a65ce20>
File "/home/a-bamboo/repositories/miemiedetection/mmdet/core/trainer.py", line 96, in train
self.train_in_epoch()
│ └ <function Trainer.train_in_epoch at 0x7f077a68bd30>
└ <mmdet.core.trainer.Trainer object at 0x7f077a65ce20>
File "/home/a-bamboo/repositories/miemiedetection/mmdet/core/trainer.py", line 336, in train_in_epoch
self.train_in_iter()
│ └ <function Trainer.train_in_iter at 0x7f077a68be50>
└ <mmdet.core.trainer.Trainer object at 0x7f077a65ce20>
File "/home/a-bamboo/repositories/miemiedetection/mmdet/core/trainer.py", line 350, in train_in_iter
self.train_one_iter()
│ └ <function Trainer.train_one_iter at 0x7f077a68bee0>
└ <mmdet.core.trainer.Trainer object at 0x7f077a65ce20>
File "/home/a-bamboo/repositories/miemiedetection/mmdet/core/trainer.py", line 462, in train_one_iter
self.scaler.scale(loss).backward()
│ │ │ └ tensor(13467.0713, device='cuda:0', grad_fn=<...>)
│ │ └ <function GradScaler.scale at 0x7f07b2133790>
│ └ <torch.cuda.amp.grad_scaler.GradScaler object at 0x7f077a65ce50>
└ <mmdet.core.trainer.Trainer object at 0x7f077a65ce20>
File "/data/anaconda3/envs/miemie_det/lib/python3.8/site-packages/torch/_tensor.py", line 363, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
│ │ │ │ │ │ │ └ None
│ │ │ │ │ │ └ False
│ │ │ │ │ └ None
│ │ │ │ └ None
│ │ │ └ tensor(13467.0713, device='cuda:0', grad_fn=<...>)
│ │ └ <function backward at 0x7f07b1d6cee0>
│ └ <module 'torch.autograd' from '/data/anaconda3/envs/miemie_det/lib/python3.8/site-packages/torch/autograd/__init__.py'>
└ <module 'torch' from '/data/anaconda3/envs/miemie_det/lib/python3.8/site-packages/torch/__init__.py'>
File "/data/anaconda3/envs/miemie_det/lib/python3.8/site-packages/torch/autograd/init.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
│ │ └ <method 'run_backward' of 'torch._C._EngineBase' objects>
│ └ <torch._C._EngineBase object at 0x7f07be7d8d80>
└ <class 'torch.autograd.variable.Variable'>
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor []] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
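As the hint at the end of the RuntimeError suggests, enabling anomaly detection makes autograd attach a second traceback that points at the forward-pass operation whose output was later modified in place. A minimal sketch; the placement at the top of main() in tools/train.py is only a suggestion, not existing miemiedetection code:

```python
import torch

# Hypothetical placement: call once before training starts (e.g. at the
# top of main() in tools/train.py). Anomaly detection slows training, so
# enable it only while debugging; the backward-pass RuntimeError will then
# include a second traceback locating the offending forward operation.
torch.autograd.set_detect_anomaly(True)
```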
Command used:
python tools/train.py -f exps/rtdetr/rtdetr_r18vd_6x_coco.py -d 2 -b 20 -eb 24 -w 4 -ew 4 -lrs 0.1
The error reported is shown in the log above.
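For context, this class of error means an in-place operation (add_, mul_, +=, slice assignment, etc.) mutated a tensor that autograd had saved for the backward pass. The snippet below is not taken from miemiedetection; it is a generic minimal reproduction of the same RuntimeError alongside the out-of-place fix:

```python
import torch

# exp() saves its *output* for the backward pass (d/dx exp(x) = exp(x)),
# so an in-place edit of y bumps its version counter and breaks backward.
x = torch.ones(3, requires_grad=True)
y = x.exp()
y.add_(1.0)                # y is now at version 1
try:
    y.sum().backward()     # autograd expected version 0 -> RuntimeError
except RuntimeError as e:
    print(e)               # "... modified by an inplace operation ..."

# Out-of-place fix: build a new tensor instead of mutating the saved one.
x = torch.ones(3, requires_grad=True)
y = x.exp()
z = y + 1.0                # no mutation; backward succeeds
z.sum().backward()
print(x.grad)              # equals exp(x)
```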