
TensorFlow implementation for our paper "Learning Long-Term Reward Redistribution via Randomized Return Decomposition"


Stilwell-Git/Randomized-Return-Decomposition


Randomized Return Decomposition (RRD)

This is a TensorFlow implementation of our paper "Learning Long-Term Reward Redistribution via Randomized Return Decomposition", accepted at ICLR 2022.
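For orientation, the core objective can be sketched in a few lines. A learned proxy reward model predicts a reward for every step of a trajectory. A random subset of time steps is then used to form a scaled Monte Carlo estimate of the episodic return, and that estimate is regressed onto the return actually observed at the end of the episode. The sketch below is illustrative only and is not the TensorFlow code in this repository. It assumes the proxy rewards are already computed and only evaluates the loss, with no gradients. The function name rrd_loss and the parameter subset_size are hypothetical.

```python
import numpy as np


def rrd_loss(proxy_rewards, episodic_return, subset_size, rng):
    """Randomized return decomposition loss for one trajectory (illustrative sketch).

    proxy_rewards: shape (T,), predictions R_hat(s_t, a_t) of a proxy reward model.
    episodic_return: scalar return R_ep observed at the end of the episode.
    subset_size: number K of time steps sampled uniformly without replacement.
    rng: a numpy.random.Generator.
    """
    proxy_rewards = np.asarray(proxy_rewards, dtype=np.float64)
    T = proxy_rewards.shape[0]
    idx = rng.choice(T, size=subset_size, replace=False)
    # Scaling the subset sum by T / K makes it an unbiased estimate of sum_t R_hat.
    estimate = T / subset_size * proxy_rewards[idx].sum()
    return (episodic_return - estimate) ** 2


rng = np.random.default_rng(0)
rewards = np.array([1.0, 2.0, 3.0, 4.0])
# With K = T the estimate is the exact sum (10.0), so the loss is 0.0.
print(rrd_loss(rewards, episodic_return=10.0, subset_size=4, rng=rng))
```

When subset_size equals the trajectory length, the estimate is exact and the loss reduces to an ordinary return-decomposition regression. Smaller subsets make each update cheaper on long horizons, at the cost of extra variance.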

Requirements

  1. Python 3.6.13
  2. gym == 0.18.3
  3. TensorFlow == 1.12.0
  4. BeautifulTable == 0.8.0
  5. opencv-python == 4.5.3.56
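A small check script can confirm that an environment matches these pins. This script is a sketch and is not part of the repository. The helper names installed_version and find_mismatches are hypothetical, and the script assumes the PyPI distribution names listed in REQUIRED. Python 3.6 lacks importlib.metadata, so on that version the script falls back to setuptools' pkg_resources.

```python
import sys

# Pinned versions from the Requirements list (PyPI distribution names).
REQUIRED = {
    "gym": "0.18.3",
    "tensorflow": "1.12.0",
    "beautifultable": "0.8.0",
    "opencv-python": "4.5.3.56",
}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:  # Python 3.6/3.7: fall back to setuptools' pkg_resources
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def find_mismatches(required, lookup=installed_version):
    """List (name, wanted, found) for every package that is missing or differs from its pin."""
    mismatches = []
    for name, wanted in sorted(required.items()):
        found = lookup(name)
        if found != wanted:
            mismatches.append((name, wanted, found))
    return mismatches


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(pinned: 3.6.13)")
    for name, wanted, found in find_mismatches(REQUIRED):
        print("{}: expected {}, found {}".format(name, wanted, found))
```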

Running Commands

Run the following commands to reproduce the main results shown in Section 4.1 of the paper.

python train.py --tag='RRD Ant-v2' --alg=rrd --basis_alg=sac --env=Ant-v2
python train.py --tag='RRD-L(RD) Ant-v2' --alg=rrd --basis_alg=sac --rrd_bias_correction=True --env=Ant-v2
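The second command sets --rrd_bias_correction=True. The scaled subset sum is an unbiased estimate of the full proxy return. By the bias-variance decomposition, the expected randomized loss therefore equals the full return-decomposition loss (R_ep minus the sum of all proxy rewards, squared) plus the variance of the subset estimate. This means the randomized loss is biased upward, and subtracting an unbiased estimate of that variance removes the bias. The sketch below checks this numerically by averaging over every subset. The variance estimator is the textbook one for sampling without replacement, chosen here for illustration. It is an assumption and is not necessarily what the repository's flag computes. The helper names are hypothetical.

```python
import itertools

import numpy as np


def subset_estimate(proxy_rewards, idx):
    """Scaled subset sum T / K * sum_{t in I} R_hat_t for an index subset I of size K."""
    proxy_rewards = np.asarray(proxy_rewards, dtype=np.float64)
    return len(proxy_rewards) / len(idx) * proxy_rewards[list(idx)].sum()


def corrected_loss(proxy_rewards, episodic_return, idx):
    """Squared error minus an unbiased estimate of the subset estimator's variance.

    For K of T steps drawn uniformly without replacement,
    Var[T/K * sum] = T**2 * (1 - K/T) * S2 / K, where S2 is the population
    variance with divisor T - 1. The subset's sample variance (ddof=1) is
    unbiased for S2. Requires K >= 2.
    """
    proxy_rewards = np.asarray(proxy_rewards, dtype=np.float64)
    T, K = len(proxy_rewards), len(idx)
    var_hat = T ** 2 * (1.0 - K / T) * proxy_rewards[list(idx)].var(ddof=1) / K
    return (episodic_return - subset_estimate(proxy_rewards, idx)) ** 2 - var_hat


rewards = np.array([1.0, 3.0, 0.0, 2.0, 4.0])
R_ep, K = 7.0, 2
subsets = list(itertools.combinations(range(len(rewards)), K))
plain = np.mean([(R_ep - subset_estimate(rewards, s)) ** 2 for s in subsets])
corrected = np.mean([corrected_loss(rewards, R_ep, s) for s in subsets])
exact = (R_ep - rewards.sum()) ** 2  # full return-decomposition loss: 9.0
# plain is about 27.75 (biased upward by the variance 18.75);
# corrected matches exact up to floating-point rounding.
print(plain, corrected, exact)
```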

Use the following commands to switch the back-end algorithm of RRD via --basis_alg.

python train.py --tag='RRD-TD3 Ant-v2' --alg=rrd --basis_alg=td3 --env=Ant-v2
python train.py --tag='RRD-DDPG Ant-v2' --alg=rrd --basis_alg=ddpg --env=Ant-v2
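The commands for a sweep over back-end algorithms can be generated programmatically instead of being typed by hand. This is a sketch that assumes train.py is run from the repository root. The helpers build_command and launch are hypothetical, and only flags that appear in this README are used. By default the script does a dry run and prints the shell-quoted commands.

```python
import shlex
import subprocess

# Back-end algorithms that appear in the commands above.
BASIS_ALGS = ["sac", "td3", "ddpg"]


def build_command(alg, basis_alg, env):
    """Assemble a train.py invocation using only flags shown in this README."""
    tag = "{}-{} {}".format(alg.upper(), basis_alg.upper(), env)
    return ["python", "train.py", "--tag=" + tag, "--alg=" + alg,
            "--basis_alg=" + basis_alg, "--env=" + env]


def launch(commands, dry_run=True):
    """Print each command (dry run) or execute the commands one after another."""
    for cmd in commands:
        if dry_run:
            print(" ".join(shlex.quote(part) for part in cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    launch([build_command("rrd", basis, "Ant-v2") for basis in BASIS_ALGS])
```

Passing an argument list to subprocess.run avoids the shell, so the tag containing a space needs no extra quoting when the commands are actually launched.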

We include an unofficial implementation of IRCR for ease of baseline comparison.
Please refer to tgangwani/GuidanceRewards for the official implementation of IRCR.

python train.py --tag='IRCR-SAC Ant-v2' --alg=ircr --basis_alg=sac --env=Ant-v2
python train.py --tag='IRCR-TD3 Ant-v2' --alg=ircr --basis_alg=td3 --env=Ant-v2
python train.py --tag='IRCR-DDPG Ant-v2' --alg=ircr --basis_alg=ddpg --env=Ant-v2
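As background for this baseline, IRCR replaces the sparse episodic reward with a guidance reward derived from trajectory returns. The sketch below is a simplified reading of that idea. It assumes each transition is credited with the min-max normalized episodic return of the trajectory it belongs to, with normalization taken over the episodes currently stored. It is not taken from this repository or from tgangwani/GuidanceRewards, and the function name guidance_rewards is hypothetical.

```python
import numpy as np


def guidance_rewards(episode_returns, episode_ids):
    """Relabel transitions with the min-max normalized return of their own episode.

    episode_returns: episodic return of each stored episode, shape (E,).
    episode_ids: for each transition in a batch, the index of its episode.
    """
    returns = np.asarray(episode_returns, dtype=np.float64)
    lo, hi = returns.min(), returns.max()
    scale = hi - lo if hi > lo else 1.0  # avoid division by zero when all returns are equal
    normalized = (returns - lo) / scale
    return normalized[np.asarray(episode_ids)]


# Three stored episodes; five transitions drawn from them.
print(guidance_rewards([10.0, 30.0, 20.0], [0, 0, 1, 2, 2]))  # values: 0, 0, 1, 0.5, 0.5
```

Under this reading every step of a trajectory receives the same credit. RRD instead learns a per-step proxy reward.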

The following commands support the experiments on Atari games with episodic rewards.

python train.py --tag='RRD-DQN Assault' --alg=rrd --basis_alg=dqn --env=Assault
python train.py --tag='IRCR-DQN Assault' --alg=ircr --basis_alg=dqn --env=Assault
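In the episodic-reward setting, the agent receives no per-step feedback, and the sum of rewards is revealed only when the episode ends. The sketch below implements that transformation as an environment wrapper. It assumes the gym 0.18 four-tuple step API from the requirements, but it is duck-typed so it runs without gym installed. The class names are hypothetical, and this repository's actual handling of the Atari setting (including any episode-length limits) may differ.

```python
class EpisodicRewardWrapper:
    """Delay all reward to the final step of an episode.

    Wraps any object exposing the gym 0.18-style API:
    reset() -> obs and step(action) -> (obs, reward, done, info).
    """

    def __init__(self, env):
        self.env = env
        self._accumulated = 0.0

    def reset(self):
        self._accumulated = 0.0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self._accumulated += reward
        if done:
            # Reveal the summed reward only at the terminal step.
            episodic, self._accumulated = self._accumulated, 0.0
            return obs, episodic, done, info
        return obs, 0.0, done, info


class CountdownEnv:
    """Toy environment: reward 1.0 per step; the episode ends after `length` steps."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.length, {}


env = EpisodicRewardWrapper(CountdownEnv(length=3))
env.reset()
rewards, done = [], False
while not done:
    _, r, done, _ = env.step(action=None)
    rewards.append(r)
print(rewards)  # [0.0, 0.0, 3.0]
```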

Note: The implementation of RRD on top of DQN for the Atari benchmark has not been well tuned. We release this interface only to facilitate future studies.
