
Loss not going down #96

Open
vijayg78 opened this issue Jul 5, 2017 · 20 comments

@vijayg78

vijayg78 commented Jul 5, 2017

Hi,
I started training from scratch with train.py on the VOC2012 dataset. I downloaded the augmented ground truths (GTs) and plugged them into the dataset, so the GTs are now the augmented GTs and the images are the original JPEG files from the dataset.
The loss is not going down; it is oscillating. Any clue on how to get it working?
Regards, Vijay

@DrSleep
Owner

DrSleep commented Jul 6, 2017

from scratch

Do you mean with a randomly initialised model?

@vijayg78
Author

vijayg78 commented Jul 6, 2017

I used deeplab_resnet_init.ckpt and ran train.py. The loss was oscillating and not coming down at all. I also tried deeplab_resnet.ckpt; same behaviour.

@vijayg78
Author

vijayg78 commented Jul 6, 2017

I used the JPEGImages from VOCdevkit, and the GTs point to the augmented images I downloaded from this GitHub. That's correct, right?
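(For reference, a quick way to sanity-check that pairing, assuming the usual dataset/train.txt list with one "image_path mask_path" pair per line; the paths below are illustrative, not the repo's exact ones:)

import os
from PIL import Image

DATA_DIR = "/path/to/VOCdevkit/VOC2012"   # illustrative
LIST_PATH = "dataset/train.txt"           # illustrative

with open(LIST_PATH) as f:
    for line in f:
        parts = line.split()
        if len(parts) != 2:
            continue
        img_path, mask_path = DATA_DIR + parts[0], DATA_DIR + parts[1]
        if not (os.path.exists(img_path) and os.path.exists(mask_path)):
            print("missing pair:", line.strip())
            continue
        # masks should be single-channel label images, not RGB colour maps
        if Image.open(mask_path).mode not in ("L", "P"):
            print("unexpected mask mode for", mask_path)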

@akshittyagi

Same problem for a model which doesn't use the deeplab_resnet.ckpt file to init

@DrSleep
Owner

DrSleep commented Jul 7, 2017

What do the images in your TensorBoard look like after a few iterations?

@Hjy20255

I have the same problem. I use my own dataset (3 classes) to train. The loss value was oscillating and not coming down at all; it stays around 1.2–1.3.

@akshittyagi

@DrSleep there are no images being produced in TensorBoard.

@DrSleep
Owner

DrSleep commented Jul 12, 2017

To all: the hyperparameters (learning rate, batch size, momentum, etc.) were chosen on Pascal VOC (for the procedure behind these choices, please refer to the original paper).
The same hyperparameters will not necessarily suit other datasets, so it is your task to find an appropriate set for your own data.

This repository is a replication of an academic paper. Anything beyond that is a bonus (like the ability to train on your own datasets).
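For context, the default schedule follows the paper's "poly" learning-rate decay, lr = base_lr * (1 - step / max_steps)^0.9, which is one of the first things to re-tune on a new dataset. A minimal sketch of that schedule in TF 1.x (the base_lr, power and num_steps values below are assumptions for illustration, not necessarily the repo's exact defaults, and the optimiser wiring is simplified):

import tensorflow as tf

base_lr = 2.5e-4     # assumed starting point; tune for your dataset
power = 0.9          # "poly" decay exponent from the paper
num_steps = 20000    # total training steps, illustrative

step = tf.train.get_or_create_global_step()
learning_rate = base_lr * tf.pow(
    1.0 - tf.cast(step, tf.float32) / num_steps, power)
optimiser = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
# train_op = optimiser.minimize(loss, global_step=step)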

@akshittyagi

Okay. But the model is also not working on the VOC dataset when not using the pretrained .ckpt file.

@DrSleep
Owner

DrSleep commented Jul 12, 2017

It works (proof, proof) on VOC with either pre-trained or not pre-trained files.
Make sure that your setup is correct.
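One frequent setup problem is ground-truth masks that are RGB colour maps, or that contain values other than the class indices 0–20 plus 255 for "ignore". A quick check sketch (the file path is illustrative):

import numpy as np
from PIL import Image

mask = np.array(Image.open("SegmentationClassAug/2007_000032.png"))  # illustrative path
print(mask.shape, mask.dtype)   # expect (H, W) uint8, not (H, W, 3)
print(np.unique(mask))          # expect values in 0..20, plus 255 for ignored pixels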

@wangruixing

I also met this problem. I used VOC2012 and the pretrained model.

@dongzhuoyao

same here

@chenyuZha

In my case I used my own dataset for training. At first I used train.py and the loss went down very slowly (from 10 to 8 over 60,000 steps). Then I switched to train_msc.py and the loss began to drop quickly; the second script trained much better, since the final loss was far smaller (about 3 instead of 8 in my case).

@zhengyang-wang

May I ask what the final loss is after running train.py for 20K iterations with deeplab_resnet_init.ckpt as a start? I used the PASCAL dataset and the final loss was about 1.3.
It would be even better if you could provide a graph of your training curve.

@ChuanWang90

Same here. With the default configuration and Pascal VOC the loss oscillates between 1.2 and 1.3. Could someone plot the training curve, or share the loss values after 20K iterations, for example? Thanks!

@FeiWard

FeiWard commented Dec 27, 2017

Has anyone shared the loss after 20K iterations? It is about 1.18 on my machine. Does anyone know the reason?
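As a rough reference point (a back-of-envelope sketch, not a claim about any particular run): a 21-class softmax that predicts uniformly gives a per-pixel cross-entropy of ln(21) ≈ 3.04, and a plateau around 1.2–1.3 corresponds to an average probability of about e^-1.3 ≈ 0.27 on the true class, i.e. better than random, but the foreground classes are not yet being separated.

import numpy as np

print(np.log(21))     # ~3.04, loss of a uniform 21-class predictor
print(np.exp(-1.3))   # ~0.27, average true-class probability implied by a 1.3 plateau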

@EternityZY

My loss is always about 1.3 and the predicted images are all black, with no visible result. I use the default hyperparameters and the VOC2012 dataset with deeplab_resnet.ckpt as a start. Why doesn't it work?
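An all-black output usually means every pixel is being assigned class 0 (background). A quick diagnostic sketch (the prediction file name is illustrative):

import numpy as np
from PIL import Image

pred = np.array(Image.open("output/mask.png"))   # illustrative prediction file
print(np.unique(pred, return_counts=True))       # only zeros => nothing but background predicted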

@PallawiSinghal

> My loss is always about 1.3 and the predicted images are all black, with no visible result. I use the default hyperparameters and the VOC2012 dataset with deeplab_resnet.ckpt as a start. Why doesn't it work?

Hi, were you able to solve the issue?

@PallawiSinghal

Hi, my loss does not change; it has become stagnant. I have tried everything related to DeepLabv3+ mentioned on every blog.
I am training to detect roads. My images are 2000x2000.
My training data has 45k images.
I have created my dataset in the PASCAL VOC format. I have three kinds of pixels:
background = [0,0,0]
void class = [255,255,255]
road = [1,1,1]
so the number of classes = 3.
I am using PASCAL VOC pre-trained weights.

The changes in train_util.py are:
1.
ignore_weight = 0
label0_weight = 10
label1_weight = 15
# per-pixel loss weights: label-1 pixels weighted by 10, label-2 pixels by 15,
# and ignore_label pixels by 0
not_ignore_mask = tf.to_float(tf.equal(scaled_labels, 1)) * label0_weight \
    + tf.to_float(tf.equal(scaled_labels, 2)) * label1_weight \
    + tf.to_float(tf.equal(scaled_labels, ignore_label)) * ignore_weight

2. Variables that will not be restored:

exclude_list = ['global_step', 'logits']
if not initialize_last_layer:
    exclude_list.extend(last_layers)

My train.py command:

nohup python deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=65000 \
    --train_split="train" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --train_batch_size=2 \
    --initialize_last_layer=False \
    --last_layers_contain_logits_only=True \
    --dataset="pascal_voc_seg" \
    --tf_initial_checkpoint="/data/old_model/models/research/deeplabv3_pascal_trainval/model.ckpt" \
    --train_logdir="/data/old_model/models/research/deeplab/mycheckpoints" \
    --dataset_dir="/data/models/research/deeplab/datasets/tfrecord" > my_output.log &

Please help 👍
INFO:tensorflow:global step 700: loss = 0.1759 (0.449 sec/step)
INFO:tensorflow:global step 710: loss = 0.1695 (0.655 sec/step)
INFO:tensorflow:global step 720: loss = 0.1742 (0.689 sec/step)
INFO:tensorflow:global step 730: loss = 0.1710 (0.505 sec/step)
INFO:tensorflow:global step 740: loss = 0.1708 (0.868 sec/step)
INFO:tensorflow:global step 750: loss = 0.1683 (0.632 sec/step)
INFO:tensorflow:global step 760: loss = 0.1692 (0.442 sec/step)
INFO:tensorflow:global step 770: loss = 0.1693 (0.597 sec/step)
INFO:tensorflow:global step 780: loss = 0.1665 (0.441 sec/step)
INFO:tensorflow:global step 790: loss = 0.1680 (0.548 sec/step)
INFO:tensorflow:global step 800: loss = 0.1708 (0.372 sec/step)
INFO:tensorflow:global step 810: loss = 0.1674 (0.327 sec/step)
INFO:tensorflow:global step 820: loss = 0.1666 (0.951 sec/step)
INFO:tensorflow:global step 830: loss = 0.1651 (0.557 sec/step)
INFO:tensorflow:global step 840: loss = 0.1663 (0.506 sec/step)
INFO:tensorflow:global step 850: loss = 0.1646 (0.446 sec/step)
INFO:tensorflow:global step 860: loss = 0.1666 (0.424 sec/step)
INFO:tensorflow:global step 870: loss = 0.1654 (0.520 sec/step)
INFO:tensorflow:global step 880: loss = 0.1662 (0.675 sec/step)
INFO:tensorflow:global step 890: loss = 0.1673 (0.325 sec/step)
INFO:tensorflow:global step 900: loss = 0.1633 (0.548 sec/step)
INFO:tensorflow:global step 910: loss = 0.1659 (0.374 sec/step)
INFO:tensorflow:global step 920: loss = 0.1639 (0.663 sec/step)
INFO:tensorflow:global step 930: loss = 0.1658 (0.442 sec/step)
INFO:tensorflow:global step 940: loss = 0.1654 (0.568 sec/step)

@subbulakshmisubha

@PallawiSinghal Did you find a solution to your problem?
