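In the REINFORCE code implementation, `loss = -log_prob * G` should be the gradient of the loss function. Why does the code treat it directly as the loss function, call `backward()` on it to compute the gradient, and then update the parameters?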
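That passage in the text is wrong: the gradient symbol should be removed, which leaves the loss function itself. Backpropagating through it then gives the gradient used to update the parameters. The code implementation, however, is correct.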
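To make the point above concrete, here is a minimal sketch that checks, for a single step, that differentiating the scalar `-log_prob * G` yields the REINFORCE gradient term `-G * ∇ log π(a|s)`. It assumes a softmax policy over three actions, and uses central finite differences in place of `backward()` so that it runs without PyTorch. The logits, action, and return values are made up for illustration, and the helper names (`surrogate_loss`, `analytic_gradient`, `numeric_gradient`) are hypothetical, not taken from the project's code.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def surrogate_loss(logits, action, G):
    # The scalar the REINFORCE code calls backward() on: -log pi(a|s) * G
    return -math.log(softmax(logits)[action]) * G

def analytic_gradient(logits, action, G):
    # Policy-gradient term for a softmax policy:
    # d/d logit_k of (-G * log pi_action) = -G * (1[k == action] - pi_k)
    probs = softmax(logits)
    return [-G * ((1.0 if k == action else 0.0) - p) for k, p in enumerate(probs)]

def numeric_gradient(logits, action, G, eps=1e-6):
    # Central finite differences of the surrogate loss,
    # standing in for what autograd would compute
    grads = []
    for k in range(len(logits)):
        up = list(logits)
        up[k] += eps
        down = list(logits)
        down[k] -= eps
        diff = surrogate_loss(up, action, G) - surrogate_loss(down, action, G)
        grads.append(diff / (2 * eps))
    return grads

logits = [0.2, -0.5, 1.0]  # assumed policy parameters for one state
action, G = 2, 3.0         # assumed sampled action and return

analytic = analytic_gradient(logits, action, G)
numeric = numeric_gradient(logits, action, G)

# The two gradients should agree up to finite-difference error
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
print("max difference:", max_gap)
```

In this sketch, `-log_prob * G` is only a surrogate objective: its value is not meaningful on its own, but its gradient matches the policy-gradient estimate, which is why calling `backward()` on it and taking an optimizer step is the expected pattern.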