Hi, I am training on a Wikipedia corpus with 1B tokens, using sigmoid/gru with hidden count 1/2/3. The initial learning rate of 0.01 gave me pretty good results when I was working with the 100M Wikipedia corpus, but on the 1B corpus, after a couple of epochs, both sigmoid and gru start giving me NaN entropy. Just curious, what learning rate did you use for the 1B benchmark corpus? I am now setting it to 0.001 and hoping the gradients won't explode.
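In case it's useful, this is a minimal sketch of the two mitigations I'm trying: a lower learning rate plus gradient-norm clipping and a NaN guard. It is not this repo's actual training loop; the model shape, vocabulary size, and hyperparameters below are just illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sizes only -- not the actual benchmark configuration.
vocab_size, embed_dim, hidden_dim = 10000, 256, 512

class TinyGRULM(nn.Module):
    """A small GRU language model used only to make the example runnable."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        out, _ = self.gru(self.embed(x))
        return self.proj(out)

model = TinyGRULM()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # lowered from 0.01
criterion = nn.CrossEntropyLoss()

def train_step(inputs, targets, max_grad_norm=5.0):
    """One update with a NaN guard and gradient clipping."""
    optimizer.zero_grad()
    logits = model(inputs)
    loss = criterion(logits.reshape(-1, vocab_size), targets.reshape(-1))
    if torch.isnan(loss):
        return float("nan")  # skip this step instead of corrupting the weights
    loss.backward()
    # Clip the global gradient norm so one bad batch cannot blow up training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```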