Fix total_reward calculation in technical_env_design #5664
Removed 'keepdim=True' from total_reward calculation.
Reason:
The training crashes immediately with:
RuntimeError: output with shape [100, 1] doesn't match the broadcast shape [100, 100]
Location: rsl_rl/algorithms/ppo.py, inside process_env_step at the line:
self.transition.rewards += self.gamma * torch.squeeze(...)
Root Cause
_get_rewards() returns a tensor of shape [num_envs, 1] due to keepdim=True:
def _get_rewards(self) -> torch.Tensor:
    total_reward = torch.linalg.norm(self.velocity, dim=-1, keepdim=True)  # shape: [N, 1]
    return total_reward
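rsl_rl expects rewards of shape [num_envs] (1-D). With the extra dimension, the in-place += in process_env_step adds what the error message implies is a [100] tensor to the [100, 1] rewards. The sum broadcasts to [100, 100], which cannot be written back into the [100, 1] output, so the accumulation fails.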
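The broadcast rule behind the error can be sketched in plain Python. This is only an illustration of the standard NumPy/PyTorch shape rules, not code from rsl_rl; `broadcast_shape` and `can_add_in_place` are hypothetical helper names, and the `[num_envs]` shape of the right-hand term is an assumption inferred from the error message.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes (NumPy/PyTorch rules)."""
    result = []
    # Compare dimensions from the trailing end; a missing leading
    # dimension is treated as size 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        # A size-1 dimension stretches to match the other one.
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


def can_add_in_place(target, other):
    # An in-place op cannot resize its target, so the broadcast result
    # must have exactly the target's shape.
    return broadcast_shape(target, other) == target


num_envs = 100
rewards_keepdim = (num_envs, 1)  # shape returned with keepdim=True
rewards_flat = (num_envs,)       # shape returned without keepdim
rhs = (num_envs,)                # assumed shape of the term added to the rewards

print(broadcast_shape(rewards_keepdim, rhs))   # (100, 100)
print(can_add_in_place(rewards_keepdim, rhs))  # False
print(can_add_in_place(rewards_flat, rhs))     # True
```

The `(100, 100)` result matches the broadcast shape in the error message, and the in-place check fails only for the `keepdim=True` shape.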
Fix
Remove keepdim=True:
total_reward = torch.linalg.norm(self.velocity, dim=-1) # shape: [N]
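A minimal way to check the shapes outside the simulator, assuming only that PyTorch is installed. The random `velocity` tensor stands in for `self.velocity`, and the zero tensor `rhs` and the factor `0.99` are placeholders for the term that rsl_rl adds to the rewards; neither is taken from the rsl_rl source.

```python
import torch

num_envs = 100
velocity = torch.randn(num_envs, 3)  # stand-in for self.velocity

# Tutorial code before the fix: the reduced axis is kept as size 1.
with_keepdim = torch.linalg.norm(velocity, dim=-1, keepdim=True)
# Tutorial code after the fix: the reduced axis is dropped.
without_keepdim = torch.linalg.norm(velocity, dim=-1)

print(with_keepdim.shape)     # torch.Size([100, 1])
print(without_keepdim.shape)  # torch.Size([100])

# Placeholder for the 1-D term accumulated onto the rewards.
rhs = torch.zeros(num_envs)

try:
    with_keepdim += 0.99 * rhs  # [100, 1] += [100] cannot be done in place
    crashed = False
except RuntimeError as err:
    crashed = True
    print(f"in-place add failed: {err}")

without_keepdim += 0.99 * rhs  # [100] += [100] succeeds
print(crashed)  # True
```

An alternative would be to call `squeeze(-1)` on the reward before returning it, but dropping `keepdim=True` is the smaller change.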
Environment
- Isaacsim 5.1.0 + IsaacLab 2.3.2 + rsl_rl
- num_envs = 100
Signed-off-by: MaiXiming <84124453+MaiXiming@users.noreply.github.com>
Documentation Bug Fix Review
Summary: This PR correctly fixes a bug in the tutorial code example within technical_env_design.rst.
Analysis
The fix addresses a tensor shape mismatch that causes training to crash when following the tutorial:
- Before: torch.linalg.norm(self.velocity, dim=-1, keepdim=True) → shape [num_envs, 1]
- After: torch.linalg.norm(self.velocity, dim=-1) → shape [num_envs]
The rsl_rl PPO implementation expects rewards as a 1-D tensor of shape [num_envs]. The keepdim=True argument keeps the reduced dimension as a size-1 axis, which causes the broadcast error described in the PR:
RuntimeError: output with shape [100, 1] doesn't match the broadcast shape [100, 100]
Verdict
✅ LGTM — This is a correct and minimal fix. The change aligns the tutorial code with the expected reward tensor shape for rsl_rl integration.
The PR author has provided excellent documentation of the issue, including:
- The exact error message
- Root cause analysis
- Environment details
Minor note: The checklist items are unchecked, but for a documentation-only fix to an RST file, most items (tests, changelog, etc.) are not applicable.
Greptile Summary
This PR fixes a
Confidence Score: 4/5
The code fix itself is correct and resolves the crash, but the accompanying prose description of the reward tensor shape was not updated to match. The docs/source/setup/walkthrough/technical_env_design.rst, line 163 prose still references the old
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["_get_rewards()"] --> B["torch.linalg.norm(self.velocity, dim=-1)"]
B --> C["Shape: [num_envs] ✅"]
C --> D["rsl_rl PPO runner\nprocess_env_step()"]
D --> E["transition.rewards += gamma * squeeze(...) ✅"]
F["OLD: keepdim=True"] --> G["Shape: [num_envs, 1] ❌"]
G --> H["broadcast shape mismatch ❌"]
H --> I["RuntimeError crash"]