[Feature] different number of actions for each action dimension (discrete actions)#119

Merged
matteobettini merged 8 commits into proroklab:main from rikifunt:custom-discretization on Jul 26, 2024
Conversation

@rikifunt
Contributor

This PR supports specifying a different number of actions for each action dimension when the action space is discrete. The motivation is to support action dimensions with different semantics that require different discretizations for each dimension (e.g. combining navigation and "boolean" actions). More generally, this should allow adding any kind of discrete action with arbitrary semantics (also thanks to process_action).

To implement this, I added an action_nvec property to Dynamics, which is "mutually exclusive" with needed_action_size, meaning that if one is overridden the other is computed automatically, and vice versa (they can't both be overridden). By default, action_nvec will be a list of 3s of length action_size when not specified, to keep compatibility with existing scenarios. I also added action_nvec to the Agent class, which is then used in Environment to build the discrete action space(s) and in Environment._set_action to: 1) convert discrete actions in the interval [0, n) to [-u_max, u_max]; and 2) convert discrete actions to multi-discrete ones when needed.
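A minimal sketch of the "mutually exclusive" defaulting described above (the standalone function `resolve_action_shape` is hypothetical; the actual logic lives in the Dynamics/Agent classes):

```python
def resolve_action_shape(action_size=None, action_nvec=None):
    """Derive whichever of action_size / action_nvec is missing.

    Overriding both is an error; the default discretization is a 3-way
    split per action dimension, to stay compatible with existing scenarios.
    """
    if action_size is not None and action_nvec is not None:
        raise ValueError("Cannot specify both action_size and action_nvec")
    if action_nvec is None:
        action_nvec = [3] * action_size  # backward-compatible default
    return len(action_nvec), list(action_nvec)
```

For example, `resolve_action_shape(action_size=2)` yields `(2, [3, 3])`, while `resolve_action_shape(action_nvec=[2, 5, 4])` yields `(3, [2, 5, 4])`.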

I also added a Composite subclass of Dynamics that allows composing multiple Dynamics into one by concatenating their actions, which should make it more convenient (modular) to define complex actions/dynamics.

Let me know if there is anything else that can improve the PR.

rikifunt added 2 commits June 26, 2024 13:53
Changes:
- add action_nvec property to the Dynamics ABC (defaults to 3s as before when not overridden)
- add Agent.action_nvec
- In Environment: update get_agent_action_space, get_random_action, _set_action to support Agent.action_nvec
Changes:
- add simulator.dynamics.composite with Composite class
- add Rotation to simulator.dynamics.holonomic_with_rot
Member

@matteobettini matteobettini left a comment

Wow, thanks for this, it is really well executed!

It is something I have also been thinking about a lot.

I think we definitely need to allow discretizations other than 3, but we have to be careful about backward compatibility here.

Another point is that I would like to keep the core and the dynamics agnostic of whether the input was discrete or continuous. This might not be possible for the agent (as we need to somehow tell it how we want to discretize actions), but it is possible for the dynamics. In particular:

  • I do not think the dynamics need to know how many discrete actions each action dimension is divided into (and even if they do, they can access it via the agent). So I would have the nvec property just as part of the agent and not also of the dynamics.
  • The way the dynamics are designed is related to motion, so it does not make sense to have more than one. This is why HolonomicWithRotation exists and it is not Holonomic+Rotation. If you need to preprocess certain actions (e.g., for a gripper), I would create a custom component that you call in process_action, which consumes the actions it needs, leaving the ones that will later go to the dynamics.

To recap, I would:

  • remove the changes to the dynamics (we can keep the rotation-only dynamics in its own file)
  • think a bit more about the backward-compatibility implications of the changes in the env; probably interactive_render has to be adjusted

We will also need some tests for the env changes

```python
if action_nvec is not None:
    action_nvec = list(action_nvec)
if action_nvec is not None and action_size is None:
    action_nvec = len(action_nvec)
```
Member

Suggested change:

```diff
- action_nvec = len(action_nvec)
+ action_size = len(action_nvec)
```

Comment on lines 889 to 890
```python
# if action_size is not None and action_nvec is not None:
#     raise ValueError(f'Cannot specify both action_size and action_nvec')
```
Member

Uncomment?

Comment on lines 891 to 892
```python
if action_nvec is not None:
    action_nvec = list(action_nvec)
```
Member

Why is this needed? We are expecting a list anyway

```python
render_action: bool = False,
dynamics: Dynamics = None,  # Defaults to holonomic
action_size: int = None,  # Defaults to what required by the dynamics
action_nvec: list = None,  # Defaults to what required by the dynamics
```
Member

Suggested change:

```diff
- action_nvec: list = None,  # Defaults to what required by the dynamics
+ action_nvec: List = None,  # Defaults to what required by the dynamics
```

```diff
 )
 elif self.multidiscrete_actions:
-    actions = [3] * agent.action_size + (
+    nvec = list(agent.action_nvec) + (
```
Member

Isn't it guaranteed to be a list?

```diff
 else:
     return spaces.Discrete(
-        3**agent.action_size
+        reduce(lambda x, y: x * y, agent.action_nvec, 1)
```
Member

Suggested change:

```diff
- reduce(lambda x, y: x * y, agent.action_nvec, 1)
+ math.prod(agent.action_nvec)
```

Comment on lines 409 to 427
```python
elif self.multidiscrete_actions:
    # from https://github.com/pytorch/pytorch/issues/89438#issuecomment-1862363360
    def randint(low, high, size, device):
        return (
            torch.randint(2**63 - 1, size=size, device=device) % (high - low)
            + low
        )

    nvec_tensor = torch.tensor(self.get_agent_action_space(agent).nvec)
    if torch.any(nvec_tensor > 2**63 - 1):
        raise ValueError(
            "Multidiscrete action spaces with nvec > 2**63 - 1 are not supported"
        )
    action = randint(
        low=0,
        high=nvec_tensor,
        size=(agent.batch_dim, len(nvec_tensor)),
        device=agent.device,
    )
```
Member

This has been updated in main, merge that

Comment on lines 464 to 465
```python
if not isinstance(high, Tensor):
    high = torch.tensor(high, device=self.device)
```
Member

If we do it for high, we should do it for low. Btw, why is this needed?

Comment on lines 516 to 532
```python
# This is done by iteratively taking the modulo of the action and dividing by the
# number of actions in the current action space, which treats the action as if
# it was the "flat index" of the multi-discrete actions. E.g. if we have
# action spaces [2, 3, 4], action 0 corresponds to the actions [0, 0, 0],
# action 1 corresponds to the action [1, 0, 0], action 2 corresponds
# to the action [0, 1, 0], action 3 corresponds to the action [1, 1, 0], etc.
flat_action = action.squeeze(-1)
actions = []
nvec = list(agent.action_nvec) + (
    [self.world.dim_c]
    if not agent.silent and self.world.dim_c != 0
    else []
)
action = action_range.nonzero()[:, 1:]
for n in nvec:
    actions.append(flat_action % n)
    flat_action = flat_action // n
action = torch.stack(actions, dim=-1)
```
Member

This changes the logic: before, we had 0 -> [0,0,0], 1 -> [0,0,1], 2 -> [0,0,2], 3 -> [0,1,0], 4 -> [0,1,1], and so on.
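To illustrate the difference with a hypothetical standalone sketch (not the PR code): the modulo loop above decodes the flat index with the first dimension varying fastest, while main's convention has the last dimension varying fastest:

```python
def unflatten_first_fastest(flat, nvec):
    # PR's modulo loop: dimension 0 varies fastest
    out = []
    for n in nvec:
        out.append(flat % n)
        flat //= n
    return out

def unflatten_last_fastest(flat, nvec):
    # main's convention: the last dimension varies fastest
    out = []
    for n in reversed(nvec):
        out.append(flat % n)
        flat //= n
    return out[::-1]
```

With nvec=[3,3], flat action 7 decodes to [1, 2] under the first convention and [2, 1] under the second, matching the two tensors compared later in this thread.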

Comment on lines 543 to 549
```python
u_max = agent.action.u_range_tensor[action_index]
# We know u must be in [-u_max, u_max], and we know action is
# in [0, n-1]. Conversion steps: [0, n-1] -> [0, 1] -> [0, 2*u_max] -> [-u_max, u_max]
# E.g. action 0 -> -u_max, action n-1 -> u_max, action 1 -> -u_max + 2*u_max/(n-1)
agent.action.u[:, action_index] = (physical_action / (n - 1)) * (
    2 * u_max
) - u_max
```
Member

This also changes the logic, as before we had action 0 always being a no-op, followed by a -u_range action and a +u_range action.

@matteobettini added the bc-breaking (something that will change the behaviour wrt previous versions) and enhancement (new feature or request) labels on Jun 26, 2024
@rikifunt
Contributor Author

rikifunt commented Jul 9, 2024

Thank you for the comments and clarifications! I will address them and continue working on this as soon as I can.

It makes a lot of sense to keep the dynamics agnostic of the action space. To keep compatibility with the current action spaces, it should be possible to rewrite the multidiscrete -> continuous conversion in Environment._set_action so that the discretization logic stays the same for n=3, I'll look into it.

At this point, the overall changes would be:

  • adding the action_nvec parameter to Agent.__init__ (optional and mutually exclusive with action_size)
  • building discrete spaces in Environment and sampling random actions according to action_nvec
  • using action_nvec in Environment._set_action, keeping the current logic when n=3
  • adding tests where needed

Then, as you said, we can delegate any extra action semantics to process_action. Are these ok?

I will also look into interactive_render (I initially left it untouched because it uses continuous actions). Let me know if there is other functionality that you think these changes might break.

@matteobettini
Member

> Thank you for the comments and clarifications! I will address them and continue working on this as soon as I can.
>
> It makes a lot of sense to keep the dynamics agnostic of the action space. To keep compatibility with the current action spaces, it should be possible to rewrite the multidiscrete -> continuous conversion in Environment._set_action so that the discretization logic stays the same for n=3, I'll look into it.
>
> At this point, the overall changes would be:
>
> • adding the action_nvec parameter to Agent.__init__ (optional and mutually exclusive with action_size)

Yes, you have basically already done this. I would not make them mutually exclusive. They will default to None. If one is specified, we create the other based on it. If both are specified, we check that they are coherent. If neither is specified, we default to the dynamics (and the old 3-way split).

Also, maybe it is worth naming this something that contains "discrete", like discrete_action_nvec, to highlight that it is not something that interests users who have continuous actions (which I think is the majority).

> • building discrete spaces in Environment and sampling random actions according to action_nvec

Yes, merge main and this should come almost for free with what you have already done.

> • using action_nvec in Environment._set_action, keeping the current logic when n=3

So here is where I am still a bit undecided.
What I suggest is that we split the range among the n actions.
When n is odd, we make the first action 0 for backward compatibility.
This would make it so that n=3 is not a special case but the way we handle odd ns.

For example:

  • n=2 leads to [-range,range]
  • n=3 leads to [0,-range,range]
  • n=4 leads to [-range,-range/3,range/3,range]
  • n=5 leads to [0,-range,-range/2,range/2,range]
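The proposal above could be sketched as follows (a hypothetical scalar helper, assuming the odd-n mapping keeps action 0 as a no-op and splits the remaining actions evenly on each side):

```python
def discrete_to_continuous(a: int, n: int, u_range: float) -> float:
    """Map a discrete action a in [0, n) to a control in [-u_range, u_range].

    Even n splits the range evenly; odd n keeps action 0 as a no-op
    (backward compatible with the old 3-way split) and splits the rest
    symmetrically.
    """
    if n % 2 == 0:
        # n=2 -> [-range, range]; n=4 -> [-range, -range/3, range/3, range]
        return (a / (n - 1)) * (2 * u_range) - u_range
    if a == 0:
        return 0.0  # no-op action
    h = (n - 1) // 2  # actions per side
    if a <= h:
        return -u_range * (h + 1 - a) / h  # "negative" actions
    return u_range * (a - h) / h  # "positive" actions
```

For example, n=5 with u_range=1 yields [0, -1, -0.5, 0.5, 1], matching the list above.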
> • adding tests where needed

Yeah, thanks. The main tests I would like to see are regarding the logic that will be added to Environment._set_action in different settings.

> Then, as you said, we can delegate any extra action semantics to process_action. Are these ok?

Yep!

> I will also look into interactive_render (I initially left it untouched because it uses continuous actions). Let me know if there is other functionality that you think these changes might break.

Oh yes, sorry, I forgot interactive_render uses continuous actions, then we should be good there.

@rikifunt
Contributor Author

rikifunt commented Jul 16, 2024

I added 2 commits with the suggested changes (and merged main):

  • action_nvec is now only present in Agent and is called discrete_action_nvec (changes to dynamics are reverted)
  • discrete_action_nvec is not mutually exclusive with action_size, but we check that they are consistent in Agent.__init__
  • the logic in _set_action handles odd ns by converting the first discrete action to u = 0, and shifting the remaining "negative" actions by -1 so that the previous logic applies
  • the comment in the discrete to multidiscrete conversion in _set_action is now accurate (turns out the logic was already the same as before the generalization)

Overall, this results in the following "multidiscrete a" to "continuous u" conversions (where U = u_max):

  • when n is odd (and let h = (n-1)/2, i.e. the middle point of the discrete possibilities):
    a == 0 ==> u == 0
    a in [1, h] ==> u in [-U, -U/h]
    a in [h+1, n) ==> u in [+U/h, U]
    
  • when n is even:
    a in [0, n) ==> u in [-U, U]
    

I am now working on adding tests that check the above logic and the discrete-multidiscrete conversion, and I was wondering if it is possible to build a Scenario object directly instead of passing a string to make_env? This would make it possible to set the nvecs to values other than 3 before constructing the Environment object, so that various values can be tested. Currently I am trying to do something like this in tests/test_vmas.py (where the load(scenario) line is taken from make_env):

```python
from vmas import scenarios

...

@pytest.mark.parametrize("scenario", scenario_names())
def test_discrete_action_nvec_multidiscrete(scenario, num_envs=10, n_steps=10):
    scenario = scenarios.load(scenario).Scenario()
    random.seed(0)
    for agent in scenario.agents:
        agent.discrete_action_nvec = [random.randint(2, 6) for _ in range(agent.action_size)]
    env = make_env(
        scenario=scenario,
        num_envs=num_envs,
        seed=0,
        multidiscrete_actions=True,
        continuous_actions=False,
    )
    for _ in range(n_steps):
        ...
```

However, I get an error on the first line of the function, saying that scenarios is a list and does not have a load method. I suspect this has something to do with the Python import machinery. I also tried using vmas.scenarios directly instead of importing it at the top, with the same result. From what I can see, vmas.scenarios is both a subpackage in the vmas directory and a list defined in vmas/__init__.py, so maybe this is the problem? Is there any other way to accomplish this?

@matteobettini
Member

matteobettini commented Jul 16, 2024

Fantastic work!

It seems to be exactly what we want

> I am now working on adding tests that check the above logic and the discrete-multidiscrete conversion, and I was wondering if it is possible to build a Scenario object directly instead of passing a string to make_env? [...] Is there any other way to accomplish this?

If you do from vmas import scenarios, it will load the scenario list.

What you want is the module:

```python
@pytest.mark.parametrize("scenario", scenario_names())
def test_prova(scenario, num_envs=10, n_steps=10):
    from vmas.scenarios import load

    scenario_class = load(f"{scenario}.py").Scenario()
    env = make_env(
        scenario=scenario_class,
        num_envs=num_envs,
        seed=0,
        multidiscrete_actions=True,
        continuous_actions=False,
    )
    for _ in range(n_steps):
        env.step(env.get_random_actions())
```

This will work.

Member

@matteobettini matteobettini left a comment

Amazing!

For the tests let's make sure we test:

  • discrete and multidiscrete action spaces
  • different nvecs with all the edge cases
  • the sampling logic

But judging from the work so far, I am confident the tests are gonna be strong.

@matteobettini removed the bc-breaking (something that will change the behaviour wrt previous versions) label on Jul 16, 2024
@matteobettini
Member

matteobettini commented Jul 16, 2024

n=1 is not currently handled properly and results in NaNs

Should this even be allowed? I would introduce a check that reads the nvec and checks all values are >= 2.

Comment on lines 507 to 523
```python
# This is done by iteratively taking the modulo of the action and dividing by the
# number of actions in the current action space, which treats the action as if
# it was the "flat index" of the multi-discrete actions. E.g. if we have
# nvec = [3,2], action 0 corresponds to the actions [0,0],
# action 1 corresponds to the action [0,1], action 2 corresponds
# to the action [1,0], action 3 corresponds to the action [1,1], etc.
flat_action = action.squeeze(-1)
actions = []
nvec = list(agent.discrete_action_nvec) + (
    [self.world.dim_c]
    if not agent.silent and self.world.dim_c != 0
    else []
)
action = action_range.nonzero()[:, 1:]
for n in nvec:
    actions.append(flat_action % n)
    flat_action = flat_action // n
action = torch.stack(actions, dim=-1)
```
Member

There also seems to be a little inconsistency with this bit.

With nvec=[3,3] and action=tensor([7, 6, 3, 0, 1, 7, 5, 6, 3, 7, 4, 4, 7, 7, 1, 3, 2, 3, 6, 3, 0, 6, 8, 0, 3, 2, 2, 3, 4, 5, 8, 8]),

I am getting:

PR:

```
tensor([[1, 2],
        [0, 2],
        [0, 1],
        [0, 0],
        [1, 0],
        [1, 2],
        [2, 1],
        [0, 2],
        [0, 1],
        [1, 2],
        [1, 1],
        [1, 1],
        [1, 2],
        [1, 2],
        [1, 0],
        [0, 1],
        [2, 0],
        [0, 1],
        [0, 2],
        [0, 1],
        [0, 0],
        [0, 2],
        [2, 2],
        [0, 0],
        [0, 1],
        [2, 0],
        [2, 0],
        [0, 1],
        [1, 1],
        [2, 1],
        [2, 2],
        [2, 2]])
```

vmas main:

```
tensor([[2, 1],
        [2, 0],
        [1, 0],
        [0, 0],
        [0, 1],
        [2, 1],
        [1, 2],
        [2, 0],
        [1, 0],
        [2, 1],
        [1, 1],
        [1, 1],
        [2, 1],
        [2, 1],
        [0, 1],
        [1, 0],
        [0, 2],
        [1, 0],
        [2, 0],
        [1, 0],
        [0, 0],
        [2, 0],
        [2, 2],
        [0, 0],
        [1, 0],
        [0, 2],
        [0, 2],
        [1, 0],
        [1, 1],
        [1, 2],
        [2, 2],
        [2, 2]])
```

Contributor Author

Thanks for catching this! I fixed it and added relevant tests.

rikifunt and others added 2 commits July 17, 2024 08:38
Co-authored-by: Matteo Bettini <55539777+matteobettini@users.noreply.github.com>
@rikifunt
Contributor Author

Thanks for the comments! I've committed the suggested changes and fixed the bug in the discrete -> multi-discrete conversion.

> n=1 is not currently handled properly and results in NaNs
>
> Should this even be allowed? I would introduce a check that reads the nvec and checks all values are >= 2.

Yes, this should not happen, I added an input sanity check in Agent.__init__.
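A minimal sketch of such a check (a hypothetical standalone helper; the actual check sits in Agent.__init__):

```python
def check_discrete_action_nvec(nvec):
    # Every action dimension needs at least 2 discrete actions: with
    # n = 1 the [0, n) -> [-u_range, u_range] mapping divides by
    # n - 1 = 0 and produces NaNs.
    if any(n <= 1 for n in nvec):
        raise ValueError(
            f"Each discrete action dimension must have at least 2 actions, got {nvec}"
        )
```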

To make sure all of this works as intended, I've added some tests that check:

  • if randomly sampled actions are in the computed action space
  • if multi-discrete actions map to the expected continuous control after Environment._set_action
  • if discrete actions that come from the product of multi-discrete spaces are converted properly to multi-discrete actions in Environment._set_action

These are run for all scenarios that don't explicitly override process_action (since it breaks the action <-> control mapping). nvecs are randomly sampled (with a fixed seed) in the interval [2,6] to properly test mixed odd-even nvecs and odd n != 3. Let me know if there is anything else that needs testing.

Member

@matteobettini matteobettini left a comment

Stellar!

Love the tests and can't wait to merge this.

Last few comments on tests

```python
scenario=scenario,
num_envs=num_envs,
seed=0,
multidiscrete_actions=True,
```
Member

Let's test both True and False here

```python
for agent, agent_multi in zip(
    env.world.policy_agents, env_multi.world.policy_agents
):
    nvec = [random.randint(2, 6) for _ in range(agent.action_size)]
```
Member

Should we test with different action sizes? Because most scenarios have action_size=2. For the same reason, testing this on all scenarios is not really needed, as the logic is in the environment. One scenario with a parametric action_size and nvec would be enough.

Contributor Author

Makes perfect sense. I've restricted this to the transport scenario with action_sizes in [2, 6] and nvecs with elements in [2, 6].

rikifunt and others added 2 commits July 23, 2024 17:54
Co-authored-by: Matteo Bettini <55539777+matteobettini@users.noreply.github.com>
@matteobettini matteobettini merged commit 21a3199 into proroklab:main Jul 26, 2024

Labels

enhancement New feature or request
