Questions about infinite bootstrap #22

geekyutao · 2021-09-18T05:15:32Z

Hi, thank you for your code. I'm a little confused about the infinite bootstrap in

Line 269 in 8416d6e

done_bool = 0 if episode_step + 1 == env._max_episode_steps else float(

.
Won't this be wrong when sampling a transition from the end of an episode (where next_obs is the start observation of the next episode)? It seems you simply ignore this case.

yueyang130 · 2022-04-26T14:15:55Z

It seems that in DMControl there is no true terminal state, so infinite bootstrapping is allowed.
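
To make "bootstrap" concrete, here is the usual one-step TD target in its textbook form (an assumption for illustration, not taken from the repository, and omitting any algorithm-specific terms), where $d$ plays the role of `done_bool`, $\gamma$ is the discount factor, and $Q$ is the critic:

$$
y = r + \gamma \,(1 - d)\, Q(s', a')
$$

With $d = 0$ at the time limit, the target still includes $\gamma\, Q(s', a')$, so a timeout is treated as a truncation of an ongoing task rather than as a real terminal state.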

yueyang130 · 2022-08-06T03:09:31Z

For @geekyutao's question: the point is that next_ob will never be the start observation of the next episode. At the last timestep of an episode, next_ob is the terminal state and done is true (note that done_bool is always false, whereas done is true at the max step). Only after that transition is stored is the env reset and ob set to the start observation.
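
The ordering described above can be checked with a minimal, self-contained sketch. Everything here is hypothetical: `ToyEnv`, `collect`, and the tuple-shaped observations are stand-ins, not the repository's code, and `float(done)` is only an assumed completion of the truncated line quoted in the question.

```python
class ToyEnv:
    """Hypothetical time-limited env with no true terminal state."""

    def __init__(self, max_episode_steps=5):
        self._max_episode_steps = max_episode_steps
        self._t = 0
        self._episode = 0

    def reset(self):
        self._t = 0
        self._episode += 1
        # Observation is (episode id, step index) so episodes are easy to tell apart.
        return (self._episode, 0)

    def step(self, action):
        self._t += 1
        obs = (self._episode, self._t)
        # done is triggered only by the time limit.
        done = self._t >= self._max_episode_steps
        return obs, 1.0, done, {}


def collect(env, num_steps):
    """Store transitions; reset only after the last transition is stored."""
    buffer = []
    obs = env.reset()
    episode_step = 0
    for _ in range(num_steps):
        action = 0
        next_obs, reward, done, _ = env.step(action)
        # Assumed completion of the truncated line: a timeout keeps bootstrapping.
        done_bool = 0.0 if episode_step + 1 == env._max_episode_steps else float(done)
        buffer.append((obs, action, reward, next_obs, done_bool))
        obs = next_obs
        episode_step += 1
        if done:
            # The start observation only ever enters the buffer as `obs`.
            obs = env.reset()
            episode_step = 0
    return buffer


buffer = collect(ToyEnv(max_episode_steps=5), num_steps=12)
```

In this sketch every stored pair `(obs, next_obs)` belongs to the same episode, because the reset happens after the final transition has been appended, and `done_bool` stays `0.0` throughout since the only source of `done` is the time limit.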
