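On the derivation of the temporal-difference incremental update in Section 5.2: $V_\pi(s) = E[G_t|S_t=s]=...=E[R_t+\gamma V_\pi(S_{t+1}) |S_t]$. Shouldn't the last line be without the expectation, i.e. simply $R_t+\gamma V_\pi(S_{t+1})$?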
Since the transition from $S_t$ to $S_{t+1}$ follows a probability distribution, I think it should be an expectation.
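To spell out the point above, here is a sketch of the expansion. The symbols $\pi(a|s)$, $P(s'|s,a)$ and $r(s,a)$ are assumed notation for the policy, the transition probability and the reward function; they are not taken from the issue text.

$$
\begin{aligned}
V_\pi(s) &= E[G_t \mid S_t = s] \\
&= E[R_t + \gamma G_{t+1} \mid S_t = s] \\
&= E[R_t + \gamma V_\pi(S_{t+1}) \mid S_t = s] \\
&= \sum_{a} \pi(a|s) \sum_{s'} P(s'|s,a) \left[ r(s,a) + \gamma V_\pi(s') \right]
\end{aligned}
$$

The third line replaces $G_{t+1}$ with its conditional expectation $V_\pi(S_{t+1})$. Because $S_{t+1}$ is still random given $S_t = s$, the outer expectation cannot be dropped.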
But $R_t+\gamma V_\pi(S_{t+1})$ is $G_t$, isn't it?
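A small numeric check may help separate the two readings. The sketch below uses a hypothetical one-step setup (the transition table, the successor values `V_NEXT` and the discount `GAMMA` are all made-up assumptions, not values from the book) to compare a single sampled target $R_t+\gamma V_\pi(S_{t+1})$ with the expectation over next states.

```python
import math

GAMMA = 0.9  # discount factor (assumed value for illustration)

# Hypothetical one-step dynamics from state "s" under a fixed policy:
# each entry is (probability, reward R_t, next state S_{t+1}).
TRANSITIONS = [
    (0.5, 1.0, "A"),
    (0.5, 0.0, "B"),
]

# Assumed values of the successor states under the same policy.
V_NEXT = {"A": 0.0, "B": 2.0}


def td_target(reward, next_state):
    """One sampled TD target: R_t + gamma * V(S_{t+1})."""
    return reward + GAMMA * V_NEXT[next_state]


def state_value():
    """V(s) as the probability-weighted average of all TD targets."""
    return sum(p * td_target(r, s_next) for p, r, s_next in TRANSITIONS)


# One target per possible transition, and their expectation.
targets = [td_target(r, s_next) for _, r, s_next in TRANSITIONS]
v_s = state_value()

print("sampled TD targets:", targets)
print("V(s) =", v_s)
```

With these assumed numbers, the two possible targets are 1.0 and 1.8, while their expectation is 1.4. A single sample $R_t+\gamma V_\pi(S_{t+1})$ is therefore only one realization of the target, and $V_\pi(s)$ equals its average over the transition distribution.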