Fix total_reward calculation in technical_env_design by MaiXiming · Pull Request #5664 · isaac-sim/IsaacLab

MaiXiming · 2026-05-17T17:48:58Z

Removed 'keepdim=True' from total_reward calculation.

Reason:

The training crashes immediately with:

RuntimeError: output with shape [100, 1] doesn't match the broadcast shape [100, 100]

Location: rsl_rl/algorithms/ppo.py, inside process_env_step at the line:
self.transition.rewards += self.gamma * torch.squeeze(...)

Root Cause

_get_rewards() returns a tensor of shape [num_envs, 1] due to keepdim=True:

def _get_rewards(self) -> torch.Tensor:
total_reward = torch.linalg.norm(self.velocity, dim=-1, keepdim=True) # shape: [N, 1]
return total_reward

rsl_rl expects rewards of shape [num_envs] (1-D). The extra dimension causes a broadcast failure when the runner tries to accumulate rewards.

Fix

Remove keepdim=True:

total_reward = torch.linalg.norm(self.velocity, dim=-1) # shape: [N]

Environment

Isaacsim 5.1.0 + IsaacLab 2.3.2+ rsl_rl
num_envs = 100

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (existing functionality will not work without user modification)
Documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have updated the changelog and the corresponding version in the extension's config/extension.toml file
I have added my name to the CONTRIBUTORS.md or my name already exists there

Removed 'keepdim=True' from total_reward calculation. Reason: The training crashes immediately with: RuntimeError: output with shape [100, 1] doesn't match the broadcast shape [100, 100] Location: rsl_rl/algorithms/ppo.py, inside process_env_step at the line: self.transition.rewards += self.gamma * torch.squeeze(...) Root Cause _get_rewards() returns a tensor of shape [num_envs, 1] due to keepdim=True: def _get_rewards(self) -> torch.Tensor: total_reward = torch.linalg.norm(self.velocity, dim=-1, keepdim=True) # shape: [N, 1] return total_reward rsl_rl expects rewards of shape [num_envs] (1-D). The extra dimension causes a broadcast failure when the runner tries to accumulate rewards. Fix Remove keepdim=True: total_reward = torch.linalg.norm(self.velocity, dim=-1) # shape: [N] Environment - Isaacsim 5.1.0 + IsaacLab 2.3.2+ rsl_rl - num_envs = 100 Signed-off-by: MaiXiming <84124453+MaiXiming@users.noreply.github.com>

isaaclab-review-bot

Documentation Bug Fix Review

Summary: This PR correctly fixes a bug in the tutorial code example within technical_env_design.rst.

Analysis

The fix addresses a tensor shape mismatch that causes training to crash when following the tutorial:

Before: torch.linalg.norm(self.velocity, dim=-1, keepdim=True) → shape [num_envs, 1]
After: torch.linalg.norm(self.velocity, dim=-1) → shape [num_envs]

The rsl_rl PPO implementation expects rewards as a 1-D tensor of shape [num_envs]. The keepdim=True parameter preserves an extra dimension, causing the broadcast error described in the PR:

RuntimeError: output with shape [100, 1] doesn't match the broadcast shape [100, 100]

Verdict

✅ LGTM — This is a correct and minimal fix. The change aligns the tutorial code with the expected reward tensor shape for rsl_rl integration.

The PR author has provided excellent documentation of the issue, including:

The exact error message
Root cause analysis
Environment details

Minor note: The checklist items are unchecked, but for a documentation-only fix to an RST file, most items (tests, changelog, etc.) are not applicable.

greptile-apps · 2026-05-17T17:50:20Z

Greptile Summary

This PR fixes a RuntimeError crash in the technical_env_design walkthrough tutorial by removing keepdim=True from torch.linalg.norm, correcting the reward tensor shape from [num_envs, 1] to [num_envs] as expected by rsl_rl.

Code fix is correct: removing keepdim=True makes the returned tensor 1-D ([num_envs]), matching the shape contract expected by rsl_rl's PPO runner.
Stale prose not updated: line 163 in the same file still describes the reward shape as [num_envs, 1], which is now contradicted by the fixed code snippet directly above it.

Confidence Score: 4/5

The code fix itself is correct and resolves the crash, but the accompanying prose description of the reward tensor shape was not updated to match.

The keepdim=True removal is the right fix and matches the shape contract for rsl_rl. However, line 163 in the same file still tells readers the reward shape is [num_envs, 1] — directly contradicting the now-corrected code snippet. A tutorial reader following the walkthrough would see conflicting information about the tensor shape.

docs/source/setup/walkthrough/technical_env_design.rst — line 163 prose still references the old [num_envs, 1] shape.

Important Files Changed

Filename	Overview
docs/source/setup/walkthrough/technical_env_design.rst	Removes `keepdim=True` from the `torch.linalg.norm` code snippet to fix a shape mismatch crash, but leaves a stale `[num_envs, 1]` shape description in the surrounding prose on line 163.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_get_rewards()"] --> B["torch.linalg.norm(self.velocity, dim=-1)"]
    B --> C["Shape: [num_envs]  ✅"]
    C --> D["rsl_rl PPO runner\nprocess_env_step()"]
    D --> E["transition.rewards += gamma * squeeze(...)  ✅"]

    F["OLD: keepdim=True"] --> G["Shape: [num_envs, 1]  ❌"]
    G --> H["broadcast shape mismatch  ❌"]
    H --> I["RuntimeError crash"]

Comments Outside Diff (1)

docs/source/setup/walkthrough/technical_env_design.rst, line 163 (link)

Stale shape description left behind by the fix

The prose here still says the reward tensor has shape [num_envs, 1], but removing keepdim=True from torch.linalg.norm changes the returned shape to [num_envs]. A reader following the tutorial will see contradictory information — the code snippet produces a 1-D tensor while this sentence claims it is 2-D. The text should be updated to read [num_envs].

_{Reviews (1): Last reviewed commit: "Fix total_reward calculation in technica..." | Re-trigger Greptile}

MaiXiming requested review from Mayankm96, jtigue-bdai and kellyguo11 as code owners May 17, 2026 17:48

github-actions Bot added the documentation Improvements or additions to documentation label May 17, 2026

isaaclab-review-bot Bot reviewed May 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix total_reward calculation in technical_env_design#5664

Fix total_reward calculation in technical_env_design#5664
MaiXiming wants to merge 1 commit into
isaac-sim:mainfrom
MaiXiming:patch-1

MaiXiming commented May 17, 2026 •

edited

Loading

Uh oh!

isaaclab-review-bot Bot left a comment

Uh oh!

greptile-apps Bot commented May 17, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MaiXiming commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Screenshots

Checklist

Uh oh!

isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

Documentation Bug Fix Review

Analysis

Verdict

Uh oh!

greptile-apps Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Comments Outside Diff (1)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

MaiXiming commented May 17, 2026 •

edited

Loading

greptile-apps Bot commented May 17, 2026 •

edited

Loading