Question
English Version
Hi InternData team,
Thank you for releasing an excellent dataset! I'm currently using InternData-N1 to build my own dataset and have some detailed questions I'd like to discuss with you regarding the VLN-CE subset.
1. Semantic Meaning of 2D Points in InternVLA-N1-System2 for VLN-CE Subset
- What is the semantic meaning of the 2D points output by InternVLA-N1-System2 in the VLN-CE subset? Specifically:
  - Do these 2D points represent a direction, relative to the image centre (the current viewpoint), that corresponds to the language instruction?
  - Or do they guide the robot to walk toward a specific object in the image?
In other words, what was the design intent behind the point locations when the data for predicting them was collected?
Additionally:
- How were suitable 3D points selected for projection during the filtering process?
- How do you ensure that the resulting 2D projected points are consistent with the text instruction?
2. Handling Invisible Projected Points
During 3D-to-2D projection, do you encounter frames where every projected point is invisible (outside the current view)?
- How are such cases handled? Are these frames directly discarded?
- What if the current frame is a critical navigation frame (e.g., at a path turning point)?
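To make clear what I mean by "invisible", here is a minimal sketch of the check I have in mind. It is not taken from the InternData pipeline: it assumes a simple pinhole camera with points already expressed in the camera frame (x right, y down, z forward), and the function names and intrinsics (`fx`, `fy`, `cx`, `cy`) are hypothetical. Occlusion by scene geometry is not modelled; a point only counts as invisible here if it lies behind the camera or projects outside the image bounds.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_point(p_cam: Point3D, fx: float, fy: float, cx: float, cy: float,
                  width: int, height: int) -> Optional[Pixel]:
    """Project a camera-frame point to pixel coordinates (pinhole model).

    Returns None if the point is behind the camera or falls outside the image.
    """
    x, y, z = p_cam
    if z <= 0.0:
        # On or behind the image plane: no valid projection.
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    if 0.0 <= u < width and 0.0 <= v < height:
        return (u, v)
    return None


def visible_pixels(waypoints_cam: Sequence[Point3D], fx: float, fy: float,
                   cx: float, cy: float, width: int, height: int) -> List[Pixel]:
    """Keep only the waypoints that project inside the image."""
    projected = (project_point(p, fx, fy, cx, cy, width, height)
                 for p in waypoints_cam)
    return [px for px in projected if px is not None]


def frame_has_visible_target(waypoints_cam: Sequence[Point3D], fx: float,
                             fy: float, cx: float, cy: float,
                             width: int, height: int) -> bool:
    """True if at least one waypoint is visible in this frame."""
    return len(visible_pixels(waypoints_cam, fx, fy, cx, cy, width, height)) > 0
```

The case I am asking about is the one where `frame_has_visible_target` would return `False`, for example at a sharp turn where all upcoming waypoints are behind or beside the camera.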
3. Size of Each Subset
The total dataset size is officially provided, but could you please share the approximate size of each subset (for the full dataset, not the mini version)?