For @geekyutao's question: next_ob will never be the start observation of the next episode. At the last timestep of an episode, next_ob is the terminal state and done is true (note that at the max step done_bool is always false, whereas done is true). Only after that transition is stored is the env reset and ob set to the start observation.
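The ordering described above can be checked with a small sketch. It is not the repository's code: `ToyEnv` and `collect` are hypothetical names, the observations are plain strings, and the `done_bool` line assumes the step-limit handling the reply describes (zero at the max step, otherwise `float(done)`).

```python
class ToyEnv:
    """Hypothetical environment whose episodes end only by a step limit."""
    _max_episode_steps = 3

    def reset(self):
        self.t = 0
        return "start"

    def step(self, action):
        self.t += 1
        done = self.t == self._max_episode_steps
        return f"s{self.t}", 1.0, done  # (next_obs, reward, done)


def collect(env, num_steps):
    """Collect transitions; the reset happens only after the last one is stored."""
    buffer = []
    obs, done, episode_step = env.reset(), False, 0
    for _ in range(num_steps):
        if done:
            # The previous transition already stored the terminal next_obs.
            obs, done, episode_step = env.reset(), False, 0
        next_obs, reward, done = env.step(action=0)
        # Assumed rule: a step-limit ending is not a true terminal state,
        # so done_bool stays 0 and the target keeps bootstrapping.
        done_bool = 0.0 if episode_step + 1 == env._max_episode_steps else float(done)
        buffer.append((obs, reward, next_obs, done_bool))
        obs = next_obs
        episode_step += 1
    return buffer


buffer = collect(ToyEnv(), num_steps=7)
for transition in buffer:
    print(transition)
```

In this sketch the start observation only ever appears in the `obs` slot of a transition, never in the `next_obs` slot, so a sampled `next_obs` at an episode boundary is the last state of that same episode.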
Hi, thank you for your code. I'm a little confused about the infinite bootstrap in
curl/train.py
Line 269 in 8416d6e
Will it be wrong when sampling at the end of an episode (where the next_obs is the start observation of the next episode)? It seems you simply ignore this.