No policy for actions that are tuples of discretes #2184
Comments
After searching Google, I found a similar discussion. It seems the function I want is gym.spaces.MultiDiscrete(), which is currently NOT supported by garage. Is there any plan to add support for it, or is there a workaround? Thanks, Steve
Hi Steve,
Sorry for the slow response.
What you want to do should be possible using a Tuple space of Discrete spaces, as you've mentioned. However, the existing policies were not designed to output that shape. In particular, they're all coded to output a OneHotCategorical using a softmax over a single vector of logits.
If you want to handle an environment with this shape of action, you would have to write a custom policy (which should be fairly easy, looking at the existing CategoricalMLPPolicy). Essentially, you would need to change it to take the action space in the constructor, apply the softmax manually, then use that to construct a
Alternatively, you could modify or wrap the environment so that each combination of discrete choices is treated as an independent discrete choice. In other words, flatten the action from a Tuple down to a single Discrete whose number of elements equals the product of the sizes of the Tuple's component spaces.
If we received a pull request implementing a policy like the one I described here (with some minimal tests), we would be happy to merge it.
Cheers,
K.R.
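As a rough illustration of the custom-policy idea in the reply above, the sketch below splits one flat vector of logits into a slice per Discrete component, applies a softmax to each slice, and samples each component independently. It is plain Python and is not garage code: the helper names (`split_logits`, `softmax`, `sample_action`), the component sizes `[3, 4, 5]`, and the all-zero logits standing in for a network output are all assumptions made for the example.

```python
import math
import random


def split_logits(logits, sizes):
    """Split one flat logit vector into one slice per Discrete component."""
    if len(logits) != sum(sizes):
        raise ValueError("expected exactly sum(sizes) logits")
    slices, start = [], 0
    for n in sizes:
        slices.append(logits[start:start + n])
        start += n
    return slices


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, sizes, rng=random):
    """Sample one integer per component; return (action, joint log-prob)."""
    action, log_prob = [], 0.0
    for chunk in split_logits(logits, sizes):
        probs = softmax(chunk)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        action.append(choice)
        # Components are sampled independently, so log-probs add up.
        log_prob += math.log(probs[choice])
    return action, log_prob


# Assumed action space: Tuple((Discrete(3), Discrete(4), Discrete(5))).
sizes = [3, 4, 5]
logits = [0.0] * sum(sizes)  # stand-in for the output of a policy network
action, log_prob = sample_action(logits, sizes, random.Random(0))
print(action, log_prob)
```

Treating the components as independent keeps the network output at the sum of the component sizes (12 here) rather than their product (60 here), at the cost of not modelling correlations between components.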
Dear K.R.,
Thank you for the response. I can see that using 'multidiscrete' as an action is a really unusual choice. Too bad I need it for my project.
I guess I will modify my custom environment to get around this issue.
Cheers,
Steve
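The environment-side workaround discussed in this thread, flattening a Tuple of Discrete spaces into a single Discrete, comes down to a mixed-radix encoding. Here is a minimal sketch in plain Python; the function names `flatten_action` and `unflatten_action` and the component sizes `(3, 4, 5)` are hypothetical and chosen only so that the example action [1 2 4] is valid.

```python
import math


def flatten_action(action, sizes):
    """Encode a tuple of discrete choices as one integer (mixed radix)."""
    if len(action) != len(sizes):
        raise ValueError("action and sizes must have the same length")
    index = 0
    for a, n in zip(action, sizes):
        if not 0 <= a < n:
            raise ValueError(f"component {a} is outside [0, {n})")
        index = index * n + a
    return index


def unflatten_action(index, sizes):
    """Decode an integer back into the tuple of discrete choices."""
    if not 0 <= index < math.prod(sizes):
        raise ValueError("index is outside the flattened action space")
    action = []
    # Peel off the last component first, mirroring flatten_action.
    for n in reversed(sizes):
        index, a = divmod(index, n)
        action.append(a)
    return tuple(reversed(action))


sizes = (3, 4, 5)                         # assumed component sizes
print(math.prod(sizes))                   # 60
print(flatten_action((1, 2, 4), sizes))   # 34
print(unflatten_action(34, sizes))        # (1, 2, 4)
```

Under these assumptions, the modified environment would advertise a single Discrete space with 60 elements, which a categorical policy can handle, and would call something like `unflatten_action` inside its step function to recover the original array of integers. The flattened space grows with the product of the component sizes, so this is only practical when that product stays small.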
Hi,
I would like to use an array of integers as the action of my custom environment. For example, I want my action to look like [1 2 4].
I'm using PPO, and I guess I have to use a categorical policy to generate discrete actions. However, if I define the action space as gym.spaces.Discrete(), my action is limited to a single integer.
I've tried using gym.spaces.Tuple((gym.spaces.Discrete(), ...)) as my action space, but garage says 'CategoricalMLPPolicy only works with akro.Discrete action space'.
Is there any way to use an array of integers as the action?
Any help would be greatly appreciated.
Sincerely,
Steve