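I see that the agent loaded for PPO here is trained on-policy, but training that way does not use an experience pool (replay buffer). Shouldn't PPO have an experience pool when it does its N-step update, i.e. the corresponding off-policy part? Where does that show up in the code?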
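With off-policy training, the experience pool stores the experience from all interactions. With on-policy training, only the N steps of experience just collected are used for the update, and they are cleared before the next epoch.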
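To make the difference concrete, here is a minimal sketch of the two storage patterns. It is not this repository's code: `ReplayBuffer`, `RolloutBuffer`, `run_demo`, and the dummy transitions are hypothetical names introduced only for illustration, and the actual policy update is left out.

```python
import random
from collections import deque


class ReplayBuffer:
    """Off-policy style (hypothetical): keeps transitions across updates."""

    def __init__(self, capacity):
        # Oldest transitions are dropped only once capacity is exceeded.
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Updates draw random minibatches from old and new experience alike.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


class RolloutBuffer:
    """On-policy style (hypothetical): holds only the latest N steps."""

    def __init__(self, n_steps):
        self.n_steps = n_steps
        self.storage = []

    def add(self, transition):
        self.storage.append(transition)

    def is_full(self):
        return len(self.storage) >= self.n_steps

    def get_and_clear(self):
        # Hand the N collected steps to the update, then start empty again.
        batch = self.storage
        self.storage = []
        return batch


def run_demo(total_steps=12, n_steps=4):
    replay = ReplayBuffer(capacity=100)
    rollout = RolloutBuffer(n_steps)
    update_batch_sizes = []
    for step in range(total_steps):
        # Dummy (state, action, reward, next_state) tuple, an assumption.
        transition = (step, 0, 0.0, step + 1)
        replay.add(transition)
        rollout.add(transition)
        if rollout.is_full():
            batch = rollout.get_and_clear()  # a PPO-style update would use this
            update_batch_sizes.append(len(batch))
    return len(replay), len(rollout.storage), update_batch_sizes
```

In this sketch the replay buffer keeps growing with every interaction, while the rollout buffer never holds more than `n_steps` transitions and is empty again after each update. That temporary N-step storage is the only "experience pool" an on-policy method such as PPO needs.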