[Feature] Different number of actions for each action dimension (discrete actions) #119
Conversation
Changes:
- add `action_nvec` property to the `Dynamics` ABC (defaults to 3s as before when not overridden)
- add `Agent.action_nvec`
- in `Environment`: update `get_agent_action_space`, `get_random_action`, `_set_action` to support `Agent.action_nvec`

Changes:
- add `simulator.dynamics.composite` with `Composite` class
- add `Rotation` to `simulator.dynamics.holonomic_with_rot`
Wow, thanks for this, it is really well executed!
It is something I have also been thinking about a lot.
I think we definitely need to allow different discretizations other than 3, but we have to be careful about backward compatibility here.
Also, another point is that I would like to keep the core and the dynamics agnostic of whether the input was discrete or continuous. This might not be possible for the agent (as we need to somehow tell how we want to discretize actions), but it is possible with dynamics. In particular:
- I do not think the dynamics need to know how many discrete actions the action is divided into (and even if they do, they can access it via the agent). So I would have the nvec property just as part of the agent and not also of the dynamics
- The way the dynamics are designed is related to motion, so it does not make sense to have more than one. This is why HolonomicWithRotation exists and it is not Holonomic+Rotation. If you need to preprocess certain actions (e.g., for a gripper), I would create a custom component that you call in `process_action`, which consumes the actions it needs, leaving the ones that will later go to the dynamics

To recap, I would:
- remove the changes to the dynamics (we can keep the rotation-only dynamics in its own file)
- think a bit more about the backward-compatibility implications of the changes in the env; probably `interactive_render` has to be adjusted
We will also need some tests for the env changes
vmas/simulator/core.py
Outdated
```python
if action_nvec is not None:
    action_nvec = list(action_nvec)
if action_nvec is not None and action_size is None:
    action_nvec = len(action_nvec)
```

Suggested change:

```diff
-    action_nvec = len(action_nvec)
+    action_size = len(action_nvec)
```
vmas/simulator/core.py
Outdated
```python
# if action_size is not None and action_nvec is not None:
#     raise ValueError(f'Cannot specify both action_size and action_nvec')
```
vmas/simulator/core.py
Outdated
```python
if action_nvec is not None:
    action_nvec = list(action_nvec)
```
Why is this needed? We are expecting a list anyway
vmas/simulator/core.py
Outdated
```python
render_action: bool = False,
dynamics: Dynamics = None,  # Defaults to holonomic
action_size: int = None,  # Defaults to what required by the dynamics
action_nvec: list = None,  # Defaults to what required by the dynamics
```

Suggested change:

```diff
-action_nvec: list = None,  # Defaults to what required by the dynamics
+action_nvec: List = None,  # Defaults to what required by the dynamics
```
```diff
 )
 elif self.multidiscrete_actions:
-    actions = [3] * agent.action_size + (
+    nvec = list(agent.action_nvec) + (
```
Isn't it guaranteed to be a list?
```diff
 else:
     return spaces.Discrete(
-        3**agent.action_size
+        reduce(lambda x, y: x * y, agent.action_nvec, 1)
```

Suggested change:

```diff
-reduce(lambda x, y: x * y, agent.action_nvec, 1)
+math.prod(agent.action_nvec)
```
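As a side note on why the product is the right size: a flat `Discrete` space has to enumerate every combination of the per-dimension choices, so its size is the product of the nvec entries. A tiny sketch with hypothetical nvec values, showing that `math.prod` and the `reduce` form agree:

```python
import math
from functools import reduce

# Hypothetical per-dimension action counts (nvec); not taken from the PR.
nvec = [3, 2, 4]

# The flat Discrete space enumerates every combination, so its size is the product.
flat_size = math.prod(nvec)                             # idiomatic since Python 3.8
flat_size_reduce = reduce(lambda x, y: x * y, nvec, 1)  # equivalent, more verbose

assert flat_size == flat_size_reduce == 24
```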
```python
elif self.multidiscrete_actions:
    # from https://github.com/pytorch/pytorch/issues/89438#issuecomment-1862363360
    def randint(low, high, size, device):
        return (
            torch.randint(2**63 - 1, size=size, device=device) % (high - low)
            + low
        )

    nvec_tensor = torch.tensor(self.get_agent_action_space(agent).nvec)
    if torch.any(nvec_tensor > 2**63 - 1):
        raise ValueError(
            "Multidiscrete action spaces with nvec > 2**63 - 1 are not supported"
        )
    action = randint(
        low=0,
        high=nvec_tensor,
        size=(agent.batch_dim, len(nvec_tensor)),
        device=agent.device,
    )
```
This has been updated in main, merge that
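For readers unfamiliar with the linked workaround: `torch.randint` does not accept tensor-valued bounds, so the snippet draws one large uniform integer and maps it into each dimension's range with a modulo. A minimal pure-Python sketch of the same trick (names are illustrative, not the VMAS code; note the modulo introduces a tiny bias, negligible for small ranges):

```python
import random

def randint_per_dim(low, highs, rng=random):
    # Draw one large uniform integer per dimension, then fold it into
    # [low, high) via modulo -- mirrors the torch workaround above,
    # where `high` is a per-dimension tensor (here a plain list).
    big = [rng.randrange(2**63 - 1) for _ in highs]
    return [b % (h - low) + low for b, h in zip(big, highs)]

sample = randint_per_dim(0, [3, 2, 4])
assert all(0 <= s < h for s, h in zip(sample, [3, 2, 4]))
```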
```python
if not isinstance(high, Tensor):
    high = torch.tensor(high, device=self.device)
```
If we do it for high we should also do it for low. Btw, why is this needed?
```python
# This is done by iteratively taking the modulo of the action and dividing by the
# number of actions in the current action space, which treats the action as if
# it was the "flat index" of the multi-discrete actions. E.g. if we have
# action spaces [2, 3, 4], action 0 corresponds to the actions [0, 0, 0],
# action 1 corresponds to the action [1, 0, 0], action 2 corresponds
# to the action [0, 1, 0], action 3 corresponds to the action [1, 1, 0], etc.
flat_action = action.squeeze(-1)
actions = []
nvec = list(agent.action_nvec) + (
    [self.world.dim_c]
    if not agent.silent and self.world.dim_c != 0
    else []
)
action = action_range.nonzero()[:, 1:]
for n in nvec:
    actions.append(flat_action % n)
    flat_action = flat_action // n
action = torch.stack(actions, dim=-1)
```
This changes the logic: before we had 0-> [0,0,0], 1-> [0,0,1], 2-> [0,0,2], 3-> [0,1,0], 4-> [0,1,1] and so on
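The two conventions being contrasted can be sketched in plain Python (illustrative helpers, not the actual env code): the PR's modulo-then-divide loop makes the first dimension vary fastest, while main's convention makes the last dimension vary fastest.

```python
def decode_first_fastest(flat, nvec):
    # PR version: repeatedly take modulo, then integer-divide,
    # so the FIRST dimension varies fastest.
    out = []
    for n in nvec:
        out.append(flat % n)
        flat //= n
    return out

def decode_last_fastest(flat, nvec):
    # main's convention: the LAST dimension varies fastest.
    return list(reversed(decode_first_fastest(flat, list(reversed(nvec)))))

nvec = [3, 3, 3]
assert decode_first_fastest(1, nvec) == [1, 0, 0]
assert decode_last_fastest(1, nvec) == [0, 0, 1]  # matches 1 -> [0,0,1] above
assert decode_last_fastest(3, nvec) == [0, 1, 0]  # matches 3 -> [0,1,0] above
assert decode_last_fastest(4, nvec) == [0, 1, 1]  # matches 4 -> [0,1,1] above
```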
```python
u_max = agent.action.u_range_tensor[action_index]
# We know u must be in [-u_max, u_max], and we know action is
# in [0, n-1]. Conversion steps: [0, n-1] -> [0, 1] -> [0, 2*u_max] -> [-u_max, u_max]
# E.g. action 0 -> -u_max, action n-1 -> u_max, action 1 -> -u_max + 2*u_max/(n-1)
agent.action.u[:, action_index] = (physical_action / (n - 1)) * (
    2 * u_max
) - u_max
```
This also changes the logic as before we had action 0 always being no-op followed by a -u_range action and a +u_range action.
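The mapping in the snippet can be sketched as a scalar function (hypothetical n and u_max values). It also makes the reviewer's point concrete: the no-op is now the midpoint of the range, which only exists when n is odd, whereas before action 0 was always the no-op.

```python
def discrete_to_continuous(a, n, u_max):
    # [0, n-1] -> [0, 1] -> [0, 2*u_max] -> [-u_max, u_max]
    return (a / (n - 1)) * (2 * u_max) - u_max

n, u_max = 5, 1.0
assert discrete_to_continuous(0, n, u_max) == -u_max          # lowest action -> -u_max
assert discrete_to_continuous(n - 1, n, u_max) == u_max       # highest action -> +u_max
assert discrete_to_continuous((n - 1) // 2, n, u_max) == 0.0  # midpoint no-op (n odd)
```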
Thank you for the comments and clarifications! I will address them and continue working on this as soon as I can. It makes a lot of sense to keep the dynamics agnostic of the action space. To keep compatibility with the current action spaces, it should be possible to rewrite the multidiscrete -> continuous conversion. At this point, the overall changes would be:
I will also look into `interactive_render` (I initially left it untouched because it uses continuous actions). Let me know if there is other functionality that you think these changes might break.
Yes, you basically have already done this. I would not make them mutually exclusive; they will default to each other. Also, maybe it is worth naming this with something that contains "discrete", like `discrete_action_nvec`.
Yes, merge main and this should come almost for free with what you have already done.
So here is where I am still a bit undecided. For example:
Yeah, thanks. The main tests I would like to see are regarding the logic that will be added to
Yep!
Oh yes, sorry, I forgot interactive_render uses continuous actions, then we should be good there.
I added 2 commits with the suggested changes (and merged main):
Overall, this results in the following "multidiscrete" behavior:
I am now working on adding tests that check the above logic and the discrete-multidiscrete conversion, and I was wondering if it is possible to build a test like the following:

```python
from vmas import scenarios
...

@pytest.mark.parametrize("scenario", scenario_names())
def test_discrete_action_nvec_multidiscrete(scenario, num_envs=10, n_steps=10):
    scenario = scenarios.load(scenario).Scenario()
    random.seed(0)
    for agent in scenario.agents:
        agent.discrete_action_nvec = [random.randint(2, 6) for _ in range(agent.action_size)]
    env = make_env(
        scenario=scenario,
        num_envs=num_envs,
        seed=0,
        multidiscrete_actions=True,
        continuous_actions=False,
    )
    for _ in range(n_steps):
        ...
```

However, I get an error on the first line of the function, saying that
Fantastic work! It seems to be exactly what we want.

If you do it as above it will not work. What you want is the module:

```python
@pytest.mark.parametrize("scenario", scenario_names())
def test_prova(scenario, num_envs=10, n_steps=10):
    from vmas.scenarios import load

    scenario_class = load(f"{scenario}.py").Scenario()
    env = make_env(
        scenario=scenario_class,
        num_envs=num_envs,
        seed=0,
        multidiscrete_actions=True,
        continuous_actions=False,
    )
    for _ in range(n_steps):
        env.step(env.get_random_actions())
```

this will work
matteobettini
left a comment
Amazing!
For the tests let's make sure we test:
- `discrete` and `multidiscrete` action spaces
- different nvecs with all the edge cases
- the sampling logic

But judging from the work so far I am confident the tests are gonna be strong.
Should this even be allowed? I would introduce a check that reads the nvec and checks that all values are >= 2.
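A minimal sketch of the suggested check (the helper name is hypothetical, not the one the PR ended up using): every entry of nvec must allow at least two choices, otherwise the dimension is degenerate.

```python
def validate_nvec(nvec):
    # Reject degenerate dimensions: with n < 2 there is at most one
    # possible action, so that dimension carries no information.
    if any(n < 2 for n in nvec):
        raise ValueError(f"Each entry of discrete_action_nvec must be >= 2, got {nvec}")

validate_nvec([3, 2, 4])  # ok, silently passes
try:
    validate_nvec([3, 1])
    raised = False
except ValueError:
    raised = True
assert raised
```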
```python
# This is done by iteratively taking the modulo of the action and dividing by the
# number of actions in the current action space, which treats the action as if
# it was the "flat index" of the multi-discrete actions. E.g. if we have
# nvec = [3,2], action 0 corresponds to the actions [0,0],
# action 1 corresponds to the action [0,1], action 2 corresponds
# to the action [1,0], action 3 corresponds to the action [1,1], etc.
flat_action = action.squeeze(-1)
actions = []
nvec = list(agent.discrete_action_nvec) + (
    [self.world.dim_c]
    if not agent.silent and self.world.dim_c != 0
    else []
)
action = action_range.nonzero()[:, 1:]
for n in nvec:
    actions.append(flat_action % n)
    flat_action = flat_action // n
action = torch.stack(actions, dim=-1)
```
There also seems to be a little inconsistency with this bit.
With nvec=[3,3] and action=tensor([7, 6, 3, 0, 1, 7, 5, 6, 3, 7, 4, 4, 7, 7, 1, 3, 2, 3, 6, 3, 0, 6, 8, 0, 3, 2, 2, 3, 4, 5, 8, 8])
I am getting:
PR:

```
tensor([[1, 2],
        [0, 2],
        [0, 1],
        [0, 0],
        [1, 0],
        [1, 2],
        [2, 1],
        [0, 2],
        [0, 1],
        [1, 2],
        [1, 1],
        [1, 1],
        [1, 2],
        [1, 2],
        [1, 0],
        [0, 1],
        [2, 0],
        [0, 1],
        [0, 2],
        [0, 1],
        [0, 0],
        [0, 2],
        [2, 2],
        [0, 0],
        [0, 1],
        [2, 0],
        [2, 0],
        [0, 1],
        [1, 1],
        [2, 1],
        [2, 2],
        [2, 2]])
```

vmas main:

```
tensor([[2, 1],
        [2, 0],
        [1, 0],
        [0, 0],
        [0, 1],
        [2, 1],
        [1, 2],
        [2, 0],
        [1, 0],
        [2, 1],
        [1, 1],
        [1, 1],
        [2, 1],
        [2, 1],
        [0, 1],
        [1, 0],
        [0, 2],
        [1, 0],
        [2, 0],
        [1, 0],
        [0, 0],
        [2, 0],
        [2, 2],
        [0, 0],
        [1, 0],
        [0, 2],
        [0, 2],
        [1, 0],
        [1, 1],
        [1, 2],
        [2, 2],
        [2, 2]])
```
Thanks for catching this! I fixed it and added relevant tests
Co-authored-by: Matteo Bettini <55539777+matteobettini@users.noreply.github.com>
Thanks for the comments! I've committed the suggested changes and fixed the bug in the discrete -> multi-discrete conversion.
Yes, this should not happen; I added an input sanity check. To make sure all of this works as intended, I've added some tests that check:
These are run for all scenarios that don't explicitly override
matteobettini
left a comment
Stellar!
Love the tests and can't wait to merge this.
Last few comments on tests
tests/test_vmas.py
Outdated
```python
scenario=scenario,
num_envs=num_envs,
seed=0,
multidiscrete_actions=True,
```
Let's test both True and False here
tests/test_vmas.py
Outdated
```python
for agent, agent_multi in zip(
    env.world.policy_agents, env_multi.world.policy_agents
):
    nvec = [random.randint(2, 6) for _ in range(agent.action_size)]
```
Should we test with different action sizes? Because most scenarios have action_size=2. For the same reason, testing this on all scenarios is not really needed, as the logic is in the environment. One scenario with a parametric action_size and nvec would be enough.
Makes perfect sense. I've restricted this to the transport scenario with action_sizes in [2,6] and nvecs with elements in [2,6].
Co-authored-by: Matteo Bettini <55539777+matteobettini@users.noreply.github.com>
This PR supports specifying a different number of actions for each action dimension when the action space is discrete. The motivation is to support action dimensions with different semantics that require different discretizations (e.g., combining navigation and "boolean" actions). More generally, this should allow adding any kind of discrete action with arbitrary semantics (also thanks to `process_action`).
To implement this, I added an `action_nvec` property to `Dynamics`, which is "mutually exclusive" with `needed_action_size`, meaning that if one is overridden the other is computed automatically, and vice versa (they can't both be overridden). By default, `action_nvec` will be a list of 3s of length `action_size` when not specified, to keep compatibility with existing scenarios. I also added `action_nvec` to the `Agent` class, which is then used in `Environment` to build the discrete action space(s) and in `Environment._set_action` to: 1) convert discrete actions in the interval [0, n) to [-u_max, u_max]; and 2) convert discrete actions to multi-discrete ones when needed.

I also added a `Composite` subclass of `Dynamics` that allows composing multiple `Dynamics` into one by concatenating their actions, which should make it more convenient (modular) to define complex actions/dynamics. Let me know if there is anything else that can improve the PR.
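The composition idea described above can be sketched in a few lines. This is an illustrative mock, not the actual VMAS `Composite` implementation: class names, `needed_action_size`, and `process_action` stand in for the real API, and each sub-dynamics just records the slice of the joint action it receives.

```python
class DynamicsSketch:
    """Stand-in for a VMAS Dynamics; just records the action slice it gets."""
    def __init__(self, needed_action_size):
        self.needed_action_size = needed_action_size
        self.received = None

    def process_action(self, u):
        self.received = u  # a real dynamics would turn this into forces/torques

class CompositeSketch:
    """Concatenates sub-dynamics: the joint action is split by action size."""
    def __init__(self, *dynamics):
        self.dynamics = dynamics
        self.needed_action_size = sum(d.needed_action_size for d in dynamics)

    def process_action(self, u):
        # Hand each sub-dynamics its own contiguous slice of the joint action.
        start = 0
        for d in self.dynamics:
            d.process_action(u[start:start + d.needed_action_size])
            start += d.needed_action_size

# E.g. a holonomic part (2 action dims) composed with a rotation part (1 dim).
holo, rot = DynamicsSketch(2), DynamicsSketch(1)
comp = CompositeSketch(holo, rot)
comp.process_action([0.1, -0.2, 0.5])
assert comp.needed_action_size == 3
assert holo.received == [0.1, -0.2] and rot.received == [0.5]
```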