No policy for actions that are tuples of discretes #2184

Open
sdfzz opened this issue Nov 25, 2020 · 3 comments

sdfzz commented Nov 25, 2020

Hi,

I would like to use an array of integers as the action in my custom environment. For example, I want an action to look like [1 2 4].

I'm using PPO, and I guess I have to use a categorical policy to generate discrete actions. However, if I define the action space as gym.spaces.Discrete(), the action is limited to a single integer.

I've tried to use gym.spaces.Tuple((gym.spaces.Discrete(), ...)) as my action space, but garage says 'CategoricalMLPPolicy only works with akro.Discrete action space'.

Is there any way to use an array of integers as the action?

Any help would be greatly appreciated.

Sincerely,

Steve

sdfzz (Author) commented Nov 25, 2020

After searching Google, I've found a similar discussion here:

rlworkgroup/akro#3

It seems like the space I want is gym.spaces.MultiDiscrete, which is currently NOT supported by garage...

Is there any plan to support this? Or is there a workaround?

Thanks,

Steve

krzentner (Contributor) commented Nov 30, 2020

Hi Steve,

Sorry for the slow response. What you want to do should be possible using a Tuple space of Discrete spaces, as you've mentioned. However, the existing policies were not designed to output that shape. In particular, they're all coded to output a OneHotCategorical using a softmax over a single vector of logits. If you want to handle an environment with this shape of action, you would have to write a custom policy (which should be fairly easy, looking at the existing CategoricalMLPPolicy). Essentially you would need to change it to take the action space in the constructor, apply the softmax manually, then use that to construct a tfp.distributions.JointDistribution in _build.
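To make the idea of "one softmax per component" concrete, here is a minimal framework-free sketch. It is not garage or TFP code: the helper names (`split_logits`, `softmax`, `sample_action`) are hypothetical, and it assumes the components are sampled independently, which is the simplest joint distribution one could build. A real policy would do the same split on the network output and wrap the per-component categoricals in a joint distribution inside `_build`.

```python
import math
import random


def split_logits(logits, sizes):
    """Cut one flat logit vector into one chunk per Discrete component."""
    if len(logits) != sum(sizes):
        raise ValueError("expected %d logits, got %d" % (sum(sizes), len(logits)))
    chunks, start = [], 0
    for n in sizes:
        chunks.append(logits[start:start + n])
        start += n
    return chunks


def softmax(chunk):
    """Numerically stable softmax over a single chunk of logits."""
    peak = max(chunk)
    exps = [math.exp(x - peak) for x in chunk]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, sizes, rng=random):
    """Sample one integer per component; return (action, joint log-probability).

    Assumption: components are independent, so the joint log-probability
    is the sum of the per-component log-probabilities.
    """
    action, log_prob = [], 0.0
    for chunk in split_logits(logits, sizes):
        probs = softmax(chunk)
        choice = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        action.append(choice)
        log_prob += math.log(probs[choice])
    return action, log_prob
```

For an action space like Tuple((Discrete(2), Discrete(3), Discrete(5))), the network would emit 2 + 3 + 5 = 10 logits and `sample_action(logits, [2, 3, 5])` would return a three-integer action together with its log-probability, which is what PPO needs for the likelihood ratio.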

Alternatively, you could modify or wrap the environment so that each combination of discrete choices is treated as its own discrete choice. In other words, flatten the action from a Tuple down to a single Discrete whose number of elements equals the product of the sizes of the Discrete spaces in the Tuple.
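The flattening can be sketched as a mixed-radix encoding. This is only an illustration: the function names (`flat_size`, `flatten`, `unflatten`) are hypothetical and not part of garage, akro, or gym.

```python
def flat_size(sizes):
    """Number of elements in the flattened Discrete space."""
    total = 1
    for n in sizes:
        total *= n
    return total


def flatten(action, sizes):
    """Encode one integer per component as a single flat index."""
    index = 0
    for digit, n in zip(action, sizes):
        if not 0 <= digit < n:
            raise ValueError("component %d out of range [0, %d)" % (digit, n))
        index = index * n + digit
    return index


def unflatten(index, sizes):
    """Decode a flat index back into one integer per component."""
    if not 0 <= index < flat_size(sizes):
        raise ValueError("index %d out of range" % index)
    action = []
    # Peel off the last component first, then reverse at the end.
    for n in reversed(sizes):
        index, digit = divmod(index, n)
        action.append(digit)
    return action[::-1]
```

In a `gym.ActionWrapper` subclass, the wrapper would presumably expose a Discrete space with `flat_size(sizes)` elements and call `unflatten` in its `action` method before passing the result to the wrapped environment. Note that the flat space grows multiplicatively, so this is only practical when the product of the component sizes stays small.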

If we received a pull request implementing a policy like I described here (with some minimal tests), we would be happy to merge it.

Cheers,

K.R.

sdfzz (Author) commented Nov 30, 2020

Dear K.R.,

Thank you for the response.

I can see that using 'multidiscrete' as an action is a really unusual choice... too bad I need it for my project.

I guess I will modify my custom environment to get around this issue.

Cheers,

Steve

krzentner changed the title from "Using array of integers as action" to "No policy for actions that are tuples of discretes" on Nov 30, 2020