Skip to content

Fix total_reward calculation in technical_env_design#5664

Open
MaiXiming wants to merge 1 commit into
isaac-sim:mainfrom
MaiXiming:patch-1
Open

Fix total_reward calculation in technical_env_design#5664
MaiXiming wants to merge 1 commit into
isaac-sim:mainfrom
MaiXiming:patch-1

Conversation

@MaiXiming
Copy link
Copy Markdown

@MaiXiming MaiXiming commented May 17, 2026

Removed 'keepdim=True' from total_reward calculation.

Reason:

The training crashes immediately with:

RuntimeError: output with shape [100, 1] doesn't match the broadcast shape [100, 100]

Location: rsl_rl/algorithms/ppo.py, inside process_env_step at the line:
self.transition.rewards += self.gamma * torch.squeeze(...)

Root Cause

_get_rewards() returns a tensor of shape [num_envs, 1] due to keepdim=True:

def _get_rewards(self) -> torch.Tensor:
total_reward = torch.linalg.norm(self.velocity, dim=-1, keepdim=True) # shape: [N, 1]
return total_reward

rsl_rl expects rewards of shape [num_envs] (1-D). The extra dimension causes a broadcast failure when the runner tries to accumulate rewards.

Fix

Remove keepdim=True:

total_reward = torch.linalg.norm(self.velocity, dim=-1) # shape: [N]

Environment

  • Isaacsim 5.1.0 + IsaacLab 2.3.2+ rsl_rl
  • num_envs = 100

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (existing functionality will not work without user modification)
  • Documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

Removed 'keepdim=True' from total_reward calculation.

Reason:

The training crashes immediately with:                                                                                                                                                           
                                                                                                                                                                                                                                                        
  RuntimeError: output with shape [100, 1] doesn't match the broadcast shape [100, 100]
                                                                                                                                                                                                                                                        
  Location: rsl_rl/algorithms/ppo.py, inside process_env_step at the line:                                                                                                                                                                              
  self.transition.rewards += self.gamma * torch.squeeze(...)                                                                                                                                                                                            
                                                                                                                                                                                                                                                        
  Root Cause                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                        
  _get_rewards() returns a tensor of shape [num_envs, 1] due to keepdim=True:                                                                                                                                                                           
                                                                                                                                                                                                                                                        
  def _get_rewards(self) -> torch.Tensor:                                                                                                                                                                                                               
      total_reward = torch.linalg.norm(self.velocity, dim=-1, keepdim=True)  # shape: [N, 1]                                                                                                                                                            
      return total_reward                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                        
  rsl_rl expects rewards of shape [num_envs] (1-D). The extra dimension causes a broadcast failure when the runner tries to accumulate rewards.                                                                                                         
                                                                                                                                                                                                                                                        
  Fix                                                                                                                                                                                                                                                   
                  
  Remove keepdim=True:                                                                                                                                                                                                                                  
   
  total_reward = torch.linalg.norm(self.velocity, dim=-1)  # shape: [N]                                                                                                                                                                                 
                                                                                                                                                                                                                                                        
  Environment                                                                                                                                                                                                                                           
  - Isaacsim 5.1.0 + IsaacLab 2.3.2+ rsl_rl                                                                                                                                                                                                                                   
  - num_envs = 100

Signed-off-by: MaiXiming <84124453+MaiXiming@users.noreply.github.com>
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label May 17, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation Bug Fix Review

Summary: This PR correctly fixes a bug in the tutorial code example within technical_env_design.rst.

Analysis

The fix addresses a tensor shape mismatch that causes training to crash when following the tutorial:

  • Before: torch.linalg.norm(self.velocity, dim=-1, keepdim=True) → shape [num_envs, 1]
  • After: torch.linalg.norm(self.velocity, dim=-1) → shape [num_envs]

The rsl_rl PPO implementation expects rewards as a 1-D tensor of shape [num_envs]. The keepdim=True parameter preserves an extra dimension, causing the broadcast error described in the PR:

RuntimeError: output with shape [100, 1] doesn't match the broadcast shape [100, 100]

Verdict

LGTM — This is a correct and minimal fix. The change aligns the tutorial code with the expected reward tensor shape for rsl_rl integration.

The PR author has provided excellent documentation of the issue, including:

  • The exact error message
  • Root cause analysis
  • Environment details

Minor note: The checklist items are unchecked, but for a documentation-only fix to an RST file, most items (tests, changelog, etc.) are not applicable.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 17, 2026

Greptile Summary

This PR fixes a RuntimeError crash in the technical_env_design walkthrough tutorial by removing keepdim=True from torch.linalg.norm, correcting the reward tensor shape from [num_envs, 1] to [num_envs] as expected by rsl_rl.

  • Code fix is correct: removing keepdim=True makes the returned tensor 1-D ([num_envs]), matching the shape contract expected by rsl_rl's PPO runner.
  • Stale prose not updated: line 163 in the same file still describes the reward shape as [num_envs, 1], which is now contradicted by the fixed code snippet directly above it.

Confidence Score: 4/5

The code fix itself is correct and resolves the crash, but the accompanying prose description of the reward tensor shape was not updated to match.

The keepdim=True removal is the right fix and matches the shape contract for rsl_rl. However, line 163 in the same file still tells readers the reward shape is [num_envs, 1] — directly contradicting the now-corrected code snippet. A tutorial reader following the walkthrough would see conflicting information about the tensor shape.

docs/source/setup/walkthrough/technical_env_design.rst — line 163 prose still references the old [num_envs, 1] shape.

Important Files Changed

Filename Overview
docs/source/setup/walkthrough/technical_env_design.rst Removes keepdim=True from the torch.linalg.norm code snippet to fix a shape mismatch crash, but leaves a stale [num_envs, 1] shape description in the surrounding prose on line 163.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["_get_rewards()"] --> B["torch.linalg.norm(self.velocity, dim=-1)"]
    B --> C["Shape: [num_envs]  ✅"]
    C --> D["rsl_rl PPO runner\nprocess_env_step()"]
    D --> E["transition.rewards += gamma * squeeze(...)  ✅"]

    F["OLD: keepdim=True"] --> G["Shape: [num_envs, 1]  ❌"]
    G --> H["broadcast shape mismatch  ❌"]
    H --> I["RuntimeError crash"]
Loading

Comments Outside Diff (1)

  1. docs/source/setup/walkthrough/technical_env_design.rst, line 163 (link)

    P1 Stale shape description left behind by the fix

    The prose here still says the reward tensor has shape [num_envs, 1], but removing keepdim=True from torch.linalg.norm changes the returned shape to [num_envs]. A reader following the tutorial will see contradictory information — the code snippet produces a 1-D tensor while this sentence claims it is 2-D. The text should be updated to read [num_envs].

Reviews (1): Last reviewed commit: "Fix total_reward calculation in technica..." | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant