Settings for Pong #138

Open

George614 opened this issue Jun 5, 2024 · 10 comments

@George614

Hi Danijar,

Thanks for sharing this amazing repo and creating a robust model-based RL algorithm! I've been playing with the replay buffer and trying to reproduce some of the results. I ran the code on Pong with the command python dreamerv3/main.py --logdir ./logdir/uniform_pong --configs atari --task atari_pong --run.train_ratio 32 and otherwise default configurations, on Ubuntu 22.04 LTS with an RTX 3090 GPU. Somehow, the agent does not learn Pong within 400K env steps (the budget used in the first version of the paper). I'm not sure what went wrong. I've tried the default uniform replay (cyan curve in the figure), a mixed replay (gray curve) with a ratio of (0.5, 0.3, 0.2), and uniform replay with compute_dtype: float16 (magenta curve), since I've seen some warnings from CUDA and XLA.
[Screenshot: training curves for the three replay settings]

Here are the package versions that I installed:

python 3.11.9
jax 0.4.28
jax-cuda12-pjrt 0.4.28
jax-cuda12-plugin 0.4.28
jaxlib 0.4.28+cuda12.cudnn89
ale-py 0.8.1
gymnasium 0.29.1
tensorflow-cpu 2.16.1
tensorflow-probability 0.24.0

Please let me know if anything was not set up properly. Thank you!

@IcarusWizard

As far as I know, the run.train_ratio should be 1024 for Atari100k.
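
For example, building on the command at the top of the thread, the override would look roughly like this (just a sketch; whether 1024 or the config default is the right value is discussed further down the thread):

python dreamerv3/main.py --logdir ./logdir/uniform_pong --configs atari --task atari_pong --run.train_ratio 1024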

@NonsansWD

Hey,
First of all, I think the first comment on this is right: you should increase the train_ratio. That was confusing for me too at first, but it should solve the issue. Quick off-topic question, though: I see you are running pretty recent versions of tensorflow-cpu as well as jax. Did you run into any issues where the pip installation stated that jax requires ml_dtypes >= 4.0 and tensorflow requires that library to be version 3.2?

@George614 (Author)

As far as I know, the run.train_ratio should be 1024 for Atari100k.

Thanks @IcarusWizard and @NonsansWD, I'll try your suggestion!

@George614 (Author)

Hey, first of all, I think the first comment on this is right: you should increase the train_ratio. That was confusing for me too at first, but it should solve the issue. Quick off-topic question, though: I see you are running pretty recent versions of tensorflow-cpu as well as jax. Did you run into any issues where the pip installation stated that jax requires ml_dtypes >= 4.0 and tensorflow requires that library to be version 3.2?

I have not run into that particular issue. I'd suggest that you install tensorflow-cpu first (maybe a less recent version) then install JAX.

@NonsansWD

Hey, first of all, I think the first comment on this is right: you should increase the train_ratio. That was confusing for me too at first, but it should solve the issue. Quick off-topic question, though: I see you are running pretty recent versions of tensorflow-cpu as well as jax. Did you run into any issues where the pip installation stated that jax requires ml_dtypes >= 4.0 and tensorflow requires that library to be version 3.2?

I have not run into that particular issue. I'd suggest that you install tensorflow-cpu first (maybe a less recent version) then install JAX.

Alright, good to know. In the end I was able to fix my issue and everything works fine. The only problem I'm left with is that the resulting folder called "replay" does not contain raw frames, but instead a lot of data like rewards and so on. Do you by any chance know a way of obtaining a video of the agent's steps, so I can watch it do its stuff without too much effort? I feel like I'm missing something, because I also don't know where to get these wonderful score plots. Or do I have to construct that plot myself with matplotlib? Sorry for going off topic.
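
In case it helps, here is a minimal sketch of building such a plot yourself. The metrics filename and key names are assumptions based on the logdir used in the command above; check your own logdir for the exact names:

import json
import matplotlib.pyplot as plt

steps, scores = [], []
# Each line of the metrics file is assumed to be one JSON record.
with open("logdir/uniform_pong/metrics.jsonl") as f:
    for line in f:
        record = json.loads(line)
        if "episode/score" in record:  # hypothetical key name; inspect the file to confirm
            steps.append(record["step"])
            scores.append(record["episode/score"])

plt.plot(steps, scores)
plt.xlabel("environment steps")
plt.ylabel("episode score")
plt.savefig("pong_scores.png")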

@rsun0 commented Jan 16, 2025

@IcarusWizard @NonsansWD How did you know that the train_ratio should be 1024 for atari100k, instead of 256 as set by configs.yaml for "atari100k"?

atari100k:
  task: atari_pong
  run:
    steps: 1.1e5
    num_envs: 1
    train_ratio: 256

I wonder if the old version of DreamerV3 (in 2023) used train_ratio 1024, as specified in the old version of the paper (https://arxiv.org/pdf/2301.04104v1), whereas the new version of DreamerV3 (2024) uses a train_ratio of 256. However, when I attempted to reproduce Pong just now with the default config value of 256, my agent was stuck at a score of -21, as in #154.

Old version of DreamerV3:

atari100k:
  task: atari_pong
  envs: {amount: 1}
  env.atari: {gray: False, repeat: 4, sticky: False, noops: 30, actions: needed}
  run:
    script: train_eval
    steps: 1.5e5
    eval_every: 1e5
    eval_initial: False
    eval_eps: 100
    train_ratio: 1024

@rsun0 commented Jan 21, 2025

@George614 I think I figured out the issue. The task config should be "atari100k_pong" instead of "atari_pong" (see #173).

I am leaving the train_ratio as 256 as set by the atari100k config for now.
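
For reference, a sketch of what the adjusted command could look like, based on the original command in this thread (the logdir name is arbitrary and the exact flags depend on the code version you have checked out):

python dreamerv3/main.py --logdir ./logdir/atari100k_pong --configs atari100k --task atari100k_pong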

@danijar (Owner) commented Jan 29, 2025

The issue could also be that you're using --script=train_eval, whereas DreamerV3 reports training scores for everything. So you can try using --script=train or --script=parallel instead.
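
Combined with the earlier suggestions, that would give something along these lines (again only a sketch; the flag spelling may differ between code versions):

python dreamerv3/main.py --logdir ./logdir/atari100k_pong --configs atari100k --task atari100k_pong --script train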

@rsun0 commented Jan 30, 2025

@danijar Thanks so much for replying. I see, so the results reported in the paper are the final training scores after 400,000 steps? I noticed that script=train does not report a score exactly at step 400,000, since episodes are of varying length, so did you take the training score from the first step above 400,000? I also noticed that the default configs run script=train for 440,000 steps, so do you take the final training score after 440,000 steps instead?

Also, does this mean that the scores reported in the paper are from a single training episode? Or is the reported score an average over multiple training episodes?


@danijar (Owner) commented Jan 30, 2025

The reported scores are the average episode returns within the last 10k steps, that is, all episodes that finished between 390k and 400k environment frames. The scores after that are not included in the results.
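
As a concrete illustration of that aggregation (hypothetical numbers, not the actual evaluation code):

# Hypothetical (episode end step, episode return) pairs from a training run.
episodes = [(388_000, -21.0), (392_500, -18.0), (396_000, -15.0), (399_800, -12.0), (402_000, -10.0)]

# Reported score: mean return over episodes that finished between 390k and 400k env steps.
window = [ret for end_step, ret in episodes if 390_000 <= end_step <= 400_000]
print(sum(window) / len(window))  # (-18.0 - 15.0 - 12.0) / 3 = -15.0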

@danijar changed the title from "Pong results do not match paper" to "Settings for Pong" on Jan 30, 2025.