Hi, I am training on a Wikipedia corpus with 1B tokens, using sigmoid/gru with hidden count 1/2/3. The initial learning rate of 0.01 gave me pretty good results when I was working with the 100M Wikipedia corpus, but on the 1B corpus, after a couple of epochs, both sigmoid and gru start giving me NaN entropy. Just curious, what learning rate did you use for the 1B benchmark corpus? I am now setting it to 0.001 and hoping the gradients won't explode.
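In case it's useful, this is a minimal sketch of the two mitigations I'm trying: a lower learning rate plus gradient-norm clipping and a NaN guard. It is not this repo's actual training loop; the model shape, vocabulary size, and hyperparameters below are just illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sizes only -- not the actual benchmark configuration.
vocab_size, embed_dim, hidden_dim = 10000, 256, 512

class TinyGRULM(nn.Module):
    """A small GRU language model used only to make the example runnable."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        out, _ = self.gru(self.embed(x))
        return self.proj(out)

model = TinyGRULM()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # lowered from 0.01
criterion = nn.CrossEntropyLoss()

def train_step(inputs, targets, max_grad_norm=5.0):
    """One update with a NaN guard and gradient clipping."""
    optimizer.zero_grad()
    logits = model(inputs)
    loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
    if torch.isnan(loss):
        return float("nan")  # skip this step instead of corrupting the weights
    loss.backward()
    # Clip the global gradient norm so one bad batch cannot blow up training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```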