No policy for actions that are tuples of discretes #2184

sdfzz · 2020-11-25T02:13:49Z

Hi,

I would like to use an array of integers as the action in my custom environment. For example, I want my action to look like [1 2 4].

I'm using PPO, and I guess I have to use Categorical policy to generate discrete actions. However, by defining action space as gym.spaces.Discrete(), my action is limited to a single integer.

I've tried to use gym.spaces.Tuple((gym.spaces.Discrete(), ...)) as my action space, but Garage says 'CategoricalMLPPolicy only works with akro.Discrete action space'.

Is there any method to use an array of integers as the action?

Any help would be greatly appreciated.

Sincerely,

Steve

sdfzz · 2020-11-25T04:01:13Z

After searching Google, I've found a similar discussion here:

rlworkgroup/akro#3

It seems like the space I want is gym.spaces.MultiDiscrete(), which is currently NOT supported by garage...

Is there any plan to support this? Or is there a workaround?

Thanks,

Steve

krzentner · 2020-11-30T09:45:11Z

Hi Steve,

Sorry for the slow response. What you want to do should be possible using a Tuple space of Discrete spaces, as you've mentioned. However, the existing policies were not designed to output that shape. In particular, they're all coded to output a OneHotCategorical using a softmax over a single vector of logits. If you want to handle an environment with this shape of action, you would have to write a custom policy (which should be fairly easy, looking at the existing CategoricalMLPPolicy). Essentially you would need to change it to take the action space in the constructor, apply the softmax manually, then use that to construct a tfp.distributions.JointDistribution in _build.
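To make the idea above concrete, here is a minimal, framework-free sketch of a factorized categorical over a tuple of discretes. It is not garage's or TFP's API: the helper names (`split_logits`, `softmax`, `sample_action`, `log_prob`) and the `nvec` parameter (the size of each Discrete component) are hypothetical, and it assumes the components are treated as independent given the logits, which is only one way to build a joint distribution. A real policy would produce the logits with a network and do the same steps with tensors.

```python
import math
import random


def split_logits(logits, nvec):
    """Cut one flat logit vector into one slice per Discrete component."""
    if len(logits) != sum(nvec):
        raise ValueError("expected sum(nvec) logits")
    heads, start = [], 0
    for n in nvec:
        heads.append(logits[start:start + n])
        start += n
    return heads


def softmax(logits):
    """Numerically stable softmax over a single component's logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, nvec, rng=random):
    """Sample one integer per component (assumption: independent components)."""
    heads = split_logits(logits, nvec)
    return tuple(
        rng.choices(range(n), weights=softmax(head))[0]
        for head, n in zip(heads, nvec)
    )


def log_prob(action, logits, nvec):
    """Joint log-probability: the sum of the per-component log-probabilities."""
    heads = split_logits(logits, nvec)
    return sum(math.log(softmax(head)[a]) for a, head in zip(action, heads))
```

With `nvec = (3, 4, 5)`, a network head would emit 12 logits, and `sample_action` would return a tuple such as `(1, 2, 4)`. The sum in `log_prob` is the quantity a policy-gradient method like PPO would need from the distribution.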

Alternatively, you could modify or wrap the environment so that each combination of discrete choices is treated as a single, distinct discrete choice. In other words, flatten the action from a Tuple down to a single Discrete whose number of elements equals the product of the sizes of the Discrete spaces in the Tuple.
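The flattening workaround comes down to a mixed-radix encoding between a flat index and a tuple of choices. The sketch below uses only the standard library; the function names and the `nvec` parameter (the size of each Discrete in the Tuple) are hypothetical and not part of garage, gym, or akro.

```python
from math import prod


def flat_size(nvec):
    """Number of actions in the flattened Discrete space."""
    return prod(nvec)


def encode(choices, nvec):
    """Map a tuple of per-component choices to a single flat index."""
    index = 0
    for choice, n in zip(choices, nvec):
        if not 0 <= choice < n:
            raise ValueError("choice out of range for its component")
        index = index * n + choice
    return index


def decode(index, nvec):
    """Map a flat index back to the tuple of per-component choices."""
    if not 0 <= index < prod(nvec):
        raise ValueError("flat index out of range")
    choices = []
    # Peel off the last component first, mirroring the order used in encode.
    for n in reversed(nvec):
        index, choice = divmod(index, n)
        choices.append(choice)
    return tuple(reversed(choices))
```

One way to use this would be a `gym.ActionWrapper` subclass that declares `gym.spaces.Discrete(flat_size(nvec))` as its action space and returns `decode(action, nvec)` from its `action` method, so the policy sees a plain Discrete space while the environment still receives the tuple. Note that the flat space grows multiplicatively: three components of sizes 3, 4, and 5 already give 60 actions.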

If we received a pull request implementing a policy like I described here (with some minimal tests), we would be happy to merge it.

Cheers,

K.R.

sdfzz · 2020-11-30T12:00:42Z

Dear K.R.,

Thank you for the response.

I can see that using 'multidiscrete' as an action is a really unusual choice...too bad I need it for my project.

I guess I will modify my custom environment to get around this issue.

Cheers,

Steve

krzentner changed the title ~~Using array of integers as action~~ No policy for actions that are tuples of discretes Nov 30, 2020
