Add absolute_to_reative_idx for remapping indices to fix delta_timestamp bug#2490
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR fixes a bug where delta_timestamps failed when used with filtered episodes (loading only a subset of episodes). The issue occurred because query indices computed from episode metadata used absolute dataset indices, but these were being used to access a filtered dataset that only contained relative indices.
- Adds an
_absolute_to_relative_idxmapping dictionary that translates absolute dataset indices to relative positions in the filtered dataset - Applies the index mapping in
_get_query_timestampsand_query_hf_datasetmethods when querying data - Ensures the mapping is only created when episodes are filtered (
self.episodes is not None)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@michel-aractingi @jadechoghari I don't think this fix is working. Here is a minimal test case: Result: Every frame from episode 1 is marked as a pad. You can also confirm that the frame itself is invalid. If you instead set episodes to [0] or [0,1], the frames are correctly sampled. |
|
I found that Please review #3279 , thank you for your great job. |
What this does
Bug reported by community in https://discord.com/channels/1216765309076115607/1438871716792373248/1441012729342197911
When delta_timestamps are used with filtered episodes (subset of the dataset is only loaded), the code was trying to access the dataset using absolute indices from the full dataset, but the filtered dataset only contains a subset of rows.
FIX: Add an index mapping that from absolute indices to local dataset indices ONLY when a subset of the dataset is requested.
Testing
Tested with the code snipper provided by the user. Before this PR this test fails: