[Question]: Meaning of 2D Point Prediction in VLN-CE Subset

### Question

## English Version

Hi InternData team,

Thank you for releasing an excellent dataset! I'm currently using InternData-N1 to build my own dataset and have a few detailed questions about the VLN-CE subset.

**1. Semantic Meaning of 2D Points in InternVLA-N1-System2 for VLN-CE Subset**

1. What is the semantic meaning of the 2D points output by InternVLA-N1-System2 in the VLN-CE subset? Specifically:

   - Do these 2D points encode a direction, relative to the image centre (the current viewpoint), that corresponds to the language instruction?
   - Or do they guide the robot to walk toward a specific object in the image?

   In other words, what was the design intention behind these point locations when the data was collected?

2. When filtering the candidate 3D points that are mapped to 2D, how were suitable points selected?
3. How do you ensure that the final projected 2D points are aligned with the text instruction?

**2. Handling Invisible Projected Points**

During the 3D-to-2D projection, do you encounter cases where every projected point for the current frame is invisible (outside the field of view)?

  - How are such cases handled? Are these frames directly discarded?
  - What if the current frame is a critical navigation frame (e.g., at a path turning point)?
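
To make the case concrete, here is a minimal sketch of what I mean by "invisible", assuming a standard pinhole camera model with points already expressed in the camera frame (x right, y down, z forward). The function names, intrinsics, and image size are hypothetical and are not taken from the InternData-N1 pipeline; the actual projection and filtering may differ.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_point(p: Point3D, fx: float, fy: float, cx: float, cy: float,
                  width: int, height: int) -> Optional[Pixel]:
    """Project a camera-frame 3D point to pixel coordinates (pinhole model).

    Returns None when the point is behind the camera or falls outside
    the image bounds, i.e. when it is not visible in this frame.
    """
    x, y, z = p
    if z <= 0.0:  # on or behind the image plane
        return None
    u = fx * x / z + cx
    v = fy * y / z + cy
    if 0.0 <= u < width and 0.0 <= v < height:
        return (u, v)
    return None


def visible_projections(points: List[Point3D], fx: float, fy: float,
                        cx: float, cy: float,
                        width: int, height: int) -> List[Pixel]:
    """Keep only the projections that land inside the image."""
    projected = (project_point(p, fx, fy, cx, cy, width, height) for p in points)
    return [px for px in projected if px is not None]


# Hypothetical intrinsics for a 640x480 image.
K = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0, width=640, height=480)

# Waypoints far to the side or behind the robot, as at a sharp turn.
turn_waypoints = [(3.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
if not visible_projections(turn_waypoints, **K):
    print("no visible target point for this frame")
```

The last branch is the situation I am asking about: the frame has future waypoints, but none of them projects into the current image.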

**3. Size of Each Subset**

The total dataset size is officially provided, but could you share the approximate size of each subset (for the full dataset, not the mini version)?

