Add absolute_to_reative_idx for remapping indices to fix delta_timestamp bug #2490

Merged
michel-aractingi merged 1 commit into main from fix/relative_indices_bug
Nov 20, 2025

@michel-aractingi
Contributor

What this does

Bug reported by community in https://discord.com/channels/1216765309076115607/1438871716792373248/1441012729342197911

When delta_timestamps is used with filtered episodes (only a subset of the dataset is loaded), the code accessed the dataset with absolute indices from the full dataset, but the filtered dataset contains only a subset of the rows.

FIX: Add an index mapping from absolute indices to local dataset indices, applied ONLY when a subset of the dataset is requested.
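
A minimal sketch of the idea, using plain Python lists in place of the real dataset. The row layout and the names `full_rows`, `filtered_rows`, `absolute_to_relative`, and `get_row` are hypothetical and only illustrate the mismatch; they are not the code in this PR.

```python
# Hypothetical stand-in for a dataset with 3 episodes of 4 frames each.
# Each row stores its absolute frame index and its episode index.
full_rows = [{"index": i, "episode_index": i // 4} for i in range(12)]

# Loading only episode 2 keeps the rows whose absolute indices are 8..11.
episodes = [2]
filtered_rows = [row for row in full_rows if row["episode_index"] in episodes]

# Map each absolute index to its position in the filtered rows.
absolute_to_relative = {row["index"]: pos for pos, row in enumerate(filtered_rows)}


def get_row(absolute_idx):
    """Look up a row of the filtered subset by its absolute index."""
    return filtered_rows[absolute_to_relative[absolute_idx]]


# filtered_rows[9] would raise IndexError because only 4 rows are loaded;
# get_row(9) goes through the mapping and reads local position 1 instead.
```

When no episode filter is given, absolute and local positions coincide, so the mapping can be skipped entirely.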

Testing

Tested with the code snippet provided by the user. Before this PR, this test fails:

from lerobot.datasets.lerobot_dataset import LeRobotDataset
import torch
repo_id = "aractingi/droid_1.0.1"
episodes = [2, 3, 4, 5]
delta_timestamps = {
    "observation.images.exterior_1_left": [-1, 10/15, 5/15, 0]
}

dataset = LeRobotDataset(repo_id, episodes=episodes, delta_timestamps=delta_timestamps)
print(dataset)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)
for batch in dataloader:
    print(f"Received batch index {batch['index']}", end="\r")

Contributor

Copilot AI left a comment

Pull Request Overview

This PR fixes a bug where delta_timestamps failed when used with filtered episodes (loading only a subset of episodes). The issue occurred because query indices computed from episode metadata are absolute dataset indices, but they were being used to access a filtered dataset whose rows are addressed by relative positions.

  • Adds an _absolute_to_relative_idx mapping dictionary that translates absolute dataset indices to relative positions in the filtered dataset
  • Applies the index mapping in _get_query_timestamps and _query_hf_dataset methods when querying data
  • Ensures the mapping is only created when episodes are filtered (self.episodes is not None)
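
The points above can be sketched roughly as follows. This is a simplified illustration that assumes query indices are clamped to the episode bounds and that out-of-range frames are flagged as padding; `query_indices` and `to_relative` are hypothetical helpers, not the methods named in this PR.

```python
def query_indices(idx, ep_start, ep_end, deltas):
    """Return clamped absolute indices and pad flags for one frame.

    idx, ep_start and ep_end are absolute indices; ep_end is exclusive.
    """
    targets = [idx + d for d in deltas]
    # Clamp each target so it stays inside the episode.
    clamped = [max(ep_start, min(ep_end - 1, t)) for t in targets]
    # A frame counts as padding when its target fell outside the episode.
    is_pad = [t < ep_start or t >= ep_end for t in targets]
    return clamped, is_pad


def to_relative(absolute_indices, mapping):
    """Translate absolute indices; pass them through when nothing is filtered."""
    if mapping is None:
        return list(absolute_indices)
    return [mapping[i] for i in absolute_indices]


# Assumed setup: episode 1 spans absolute frames 4..7 and is the only one loaded.
mapping = {4: 0, 5: 1, 6: 2, 7: 3}
clamped, is_pad = query_indices(idx=4, ep_start=4, ep_end=8, deltas=[-1, 0, 1])
relative = to_relative(clamped, mapping)
```

Keeping the padding decision in absolute coordinates and translating only at the point of row access means the episode bounds from the metadata never need to be rewritten.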

Contributor

@jadechoghari jadechoghari left a comment

lgtm

@michel-aractingi michel-aractingi merged commit 0f551df into main Nov 20, 2025
16 checks passed
@michel-aractingi michel-aractingi deleted the fix/relative_indices_bug branch November 20, 2025 13:05
pzal pushed a commit to pzal/lerobot that referenced this pull request Dec 2, 2025
@atyshka
Contributor

atyshka commented Dec 9, 2025

@michel-aractingi @jadechoghari I don't think this fix is working. Here is a minimal test case:

from lerobot.datasets.lerobot_dataset import LeRobotDataset

delta_ts = {
    "observation.image":[0.0]
}
dataset = LeRobotDataset("lerobot/pusht", episodes=[1], delta_timestamps=delta_ts)

for frame in dataset:
    print(frame["observation.image_is_pad"])

Result: Every frame from episode 1 is marked as padding. You can also confirm that the frame itself is invalid.

If you instead set episodes to [0] or [0,1], the frames are correctly sampled.

XHAKA3456 pushed a commit to XHAKA3456/lerobot that referenced this pull request Dec 12, 2025
sandhya-cb pushed a commit to sandhya-cb/lerobot-clutterbot that referenced this pull request Jan 28, 2026
lu391see pushed a commit to lu391see/lerobot_tactile that referenced this pull request Mar 24, 2026
@hello3x3
Contributor

hello3x3 commented Apr 3, 2026

I found that _absolute_to_reative_idx materializes the entire column as Python / tensor objects, which becomes a significant bottleneck for large datasets. Using .to_numpy() is much faster. Have you noticed this?

Please review #3279. Thank you for the great work.
