CN105339981B - Method for registering data using a set of primitives - Google Patents

Method for registering data using a set of primitives

Info

Publication number
CN105339981B
CN105339981B
Authority
CN
China
Prior art keywords
coordinate system
primitives
plane
camera
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201480034631.3A
Other languages
Chinese (zh)
Other versions
CN105339981A (en)
Inventor
Yuichi Taguchi
E. Ataer-Cansizoglu
S. Ramalingam
T. W. Garaas
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US13/921,296 (US9420265B2)
Application filed by Mitsubishi Electric Corp
Publication of CN105339981A
Application granted
Publication of CN105339981B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30244 Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)
  • Studio Devices (AREA)

Abstract

A method registers data using a set of primitives that includes points and planes. First, the method selects a first set of primitives from data in a first coordinate system, where the first set includes at least three primitives, at least one of which is a plane. A transformation from the first coordinate system to a second coordinate system is predicted. The first set of primitives is transformed to the second coordinate system using the transformation. A second set of primitives is determined from the first set of primitives transformed to the second coordinate system. Then, the second coordinate system is registered with the first coordinate system using the first set of primitives in the first coordinate system and the second set of primitives in the second coordinate system. The registration can be used to track the pose of the camera that acquires the data.

Description

Method for registering data using a set of primitives

Technical Field

The present invention relates generally to computer vision and, more particularly, to estimating the pose of a camera.

Background Art

Systems and methods that track the pose of a camera while reconstructing the 3D structure of a scene are widely used in augmented reality (AR) visualization, robot navigation, scene modeling, and other computer vision applications. Such processing is commonly referred to as simultaneous localization and mapping (SLAM). A real-time SLAM system can use a conventional camera that acquires two-dimensional (2D) images, a depth camera that acquires three-dimensional (3D) point clouds (sets of 3D points), or a red, green, blue, and depth (RGB-D) camera, such as Kinect®, that acquires both 2D images and 3D point clouds. Tracking refers to the process of using the predicted motion of the camera to sequentially estimate the pose of the camera, while relocalization refers to the process of using feature-based global registration to recover from tracking failures.

SLAM systems using 2D cameras are usually successful for textured scenes, but are likely to fail in textureless regions. Systems using depth cameras rely on geometric variations in the scene, such as curved surfaces and depth boundaries, with the help of iterative closest point (ICP) methods. However, ICP-based systems often fail when the geometric variation is small, such as in planar scenes. Systems using RGB-D cameras can exploit both texture and geometric features, but they still require distinctive textures.

Many methods do not explicitly address the difficulty of reconstructing 3D models larger than a single room. To extend those methods to larger scenes, better memory management techniques are required. However, memory limitation is not the only challenge. Typically, room-scale scenes include many objects with textured and geometric features. To scale to larger scenes, camera poses need to be tracked in regions with limited texture and insufficient geometric variation, such as corridors.

Camera Tracking

Given some 3D correspondences, systems that acquire 3D point clouds using 3D sensors cast the tracking problem as a registration problem. ICP methods iteratively locate point-to-point or point-to-plane correspondences, starting from an initial pose estimate given by camera motion prediction. ICP has been widely used with line-scan 3D sensors in mobile robotics (also known as scan matching), and with depth cameras and 3D sensors that produce full 3D point clouds. U.S. 2012/0194516 uses point-to-plane correspondences in an ICP method for camera pose tracking. Its representation of the map is a set of voxels, where each voxel represents a truncated signed distance function for the distance to the closest surface point. That method does not extract planes from the 3D point clouds; instead, point-to-plane correspondences are established by determining the normals of 3D points from local neighborhoods. Such ICP-based methods require scenes with sufficient geometric variation for accurate registration.

Another method extracts features from RGB images and performs descriptor-based point matching to determine point-to-point correspondences and estimate the camera pose, which is then refined with an ICP method. That method uses both texture (RGB) and geometric (depth) features in the scene. However, using only point features remains problematic for textureless regions and regions with repetitive textures.

SLAM Using Planes

Plane features have been used in many SLAM systems. To determine the camera pose, at least three planes whose normals span R³ are required. Therefore, using only planes leads to many degeneracy issues, especially when the field of view (FOV) or the range of the sensor is small, such as with Kinect®. A combination of a large-FOV line-scan 3D sensor and a small-FOV depth camera can avoid the degeneracy, at additional system cost.

The method described in the related application uses point-plane SLAM, which uses both points and planes to avoid the failure modes common in methods that use only one of these primitives. That system does not use any camera motion prediction. Instead, the system performs relocalization for every frame by locating point and plane correspondences globally. As a result, the system can only process about three frames per second, and encounters failures with some repetitive textures due to descriptor-based point matching.

The method described in the related application also registers 3D data in different coordinate systems using both point-to-point and plane-to-plane correspondences.

Summary of the Invention

Planes are dominant in indoor and outdoor scenes that include man-made structures. Embodiments of the invention provide a system and method for tracking an RGB-D camera that use both points and planes as primitive features. By fitting planes, the method implicitly handles the noise in depth data that is characteristic of 3D sensors. The tracking method is supported by relocalization and bundle adjustment processes to provide a real-time simultaneous localization and mapping (SLAM) system using a hand-held or robot-mounted RGB-D camera.

An object of the invention is to enable fast and accurate registration while minimizing the degeneracy issues that cause registration failures. The method locates point and plane correspondences using camera motion prediction, and provides a tracker based on a prediction-and-correction framework. The method incorporates relocalization and bundle adjustment processes, both using points and planes, to recover from tracking failures and to continuously refine camera pose estimates.

Specifically, a method registers data using a set of primitives that includes points and planes. First, the method selects a first set of primitives from data in a first coordinate system, where the first set of primitives includes at least three primitives, at least one of which is a plane.

A transformation from the first coordinate system to a second coordinate system is predicted. The first set of primitives is transformed to the second coordinate system using the transformation. A second set of primitives is determined from the first set of primitives transformed to the second coordinate system.

Then, the second coordinate system is registered with the first coordinate system using the first set of primitives in the first coordinate system and the second set of primitives in the second coordinate system. The registration can be used to track the pose of the camera that acquires the data.
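To make these steps concrete, the following NumPy sketch (not from the patent) walks through them on synthetic data using point primitives only; the helper kabsch() is a standard closed-form point solver standing in for the mixed point-and-plane solver of the related application, and all names and values are illustrative assumptions.

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form rigid registration of corresponding 3D point sets
    (Kabsch/Umeyama); an illustrative stand-in for the patent's
    closed-form mixed point/plane solver."""
    cP, cQ = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cP).T @ (Q - cQ))
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    return R, cQ - R @ cP

rng = np.random.default_rng(1)
first_set = rng.uniform(-1.0, 1.0, (3, 3))     # select three point primitives

# Predict a transformation to the second coordinate system (here, a guess
# that the camera slid 50 mm along x).
R_pred, t_pred = np.eye(3), np.array([0.05, 0.0, 0.0])
predicted = first_set @ R_pred.T + t_pred      # transform the first set

# Determine the second set: in the method, measurements are searched for
# near `predicted`; here we simulate them with the true motion plus noise.
a = np.deg2rad(3.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([0.06, -0.01, 0.0])
second_set = first_set @ R_true.T + t_true + 1e-3 * rng.standard_normal((3, 3))
print("search offsets:", np.linalg.norm(second_set - predicted, axis=1))

# Register the second coordinate system with the first.
R, t = kabsch(first_set, second_set)
print(np.round(R, 3), np.round(t, 3))
```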

Brief Description of the Drawings

FIG. 1 is a flowchart of a method for tracking the pose of a camera according to embodiments of the invention; and

FIG. 2 is a schematic of a procedure for establishing point-to-point and plane-to-plane correspondences between a current frame and a map using a predicted pose of the camera according to embodiments of the invention.

Detailed Description

Embodiments of the invention provide a system and method for tracking the pose of a camera. The invention extends the embodiments described in the related U.S. Application Ser. No. 13/539,060 by using camera motion prediction for faster correspondence search and registration. We use point-to-point and plane-to-plane correspondences established between the current frame and a map. The map includes points and planes from frames previously registered in a global coordinate system. Here, we focus on using camera motion prediction to establish plane-to-plane correspondences, as well as the hybrid case of establishing both point-to-point and plane-to-plane correspondences.

System Overview

In the preferred system, the RGB-D camera 102 is a Kinect® or an Xtion PRO LIVE, which acquires a sequence of frames 101. We use a keyframe-based SLAM system: we select several representative frames as keyframes and store the keyframes, registered in a single global coordinate system, in a map. In contrast to prior-art SLAM systems that use only points, we use both points and planes as primitives in all processing in the system. The points and planes in each frame are called measurements, and measurements from keyframes are stored in the map as landmarks.

Given the map, we estimate the pose of the current frame using a prediction-and-correction framework: we predict the pose of the camera, use the predicted pose to determine correspondences between point and plane measurements and point and plane landmarks, and then use the correspondences to determine the camera pose.

Tracking may fail due to incorrect or insufficient correspondences. After a predetermined number of consecutive tracking failures, we relocalize, using a global point and plane correspondence search between the current frame and the map. We also apply bundle adjustment using points and planes to asynchronously refine the landmarks in the map.

Method Overview

As shown in FIG. 1, a current frame 101 of a scene 103 is acquired 110 by a red, green, blue, and depth (RGB-D) camera 102. The pose of the camera at the time the frame is acquired is predicted, and the predicted pose is used to locate 130 point and plane correspondences between the frame and a map 194. The point and plane correspondences are used in a random sample consensus (RANSAC) framework 140 to register the frame to the map. If the registration fails 150, the number of consecutive failures is counted; if 154 is false (F), the method continues with the next frame; otherwise, if 154 is true (T), the camera is relocalized 158 using a global registration method that does not use camera motion prediction.

If the RANSAC registration succeeds, the pose 160 estimated in the RANSAC framework is used as the pose of the frame. Next, it is determined 170 whether the current frame is a keyframe; if false, the method continues with the next frame at step 110. Otherwise, additional points and planes are extracted 180 from the current frame, the map 194 is updated 190, and the method continues with the next frame. The map is asynchronously refined 198 using bundle adjustment.

These steps can be performed in a processor connected to memory and input/output interfaces as known in the art.

Camera Pose Tracking

As noted above, our tracking uses features that include both points and planes. The tracking is based on a prediction-and-correction scheme, which can be summarized as follows. For each frame, we predict the pose using a camera motion model. Based on the predicted pose, we locate point and plane measurements in the frame that correspond to point and plane landmarks in the map. We perform RANSAC-based registration using the point and plane correspondences. If the pose is different from the pose of any keyframe currently stored in the map, we extract additional point and plane measurements and add the frame to the map as a new keyframe.

Camera Motion Prediction

We denote the pose of the k-th frame as

T_k = [ R_k  t_k ]
      [ 0^T   1  ],

where R_k and t_k denote the rotation matrix and translation vector, respectively. We use the first frame to define the coordinate system of the map; thus, T_1 is the identity matrix, and T_k represents the pose of the k-th frame relative to the map.

We predict the pose of the k-th frame, T̂_k, using a constant velocity assumption. Let ΔT denote the previously estimated motion between the (k-1)-th frame and the (k-2)-th frame, i.e., ΔT = T_(k-1) T_(k-2)^(-1). Then, we predict the pose of the k-th frame as T̂_k = ΔT T_(k-1).
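As a minimal illustration (not from the patent), the constant-velocity prediction amounts to two matrix products on 4×4 homogeneous poses; the function names and example motion below are assumptions.

```python
import numpy as np

def make_pose(R, t):
    """Assemble the 4x4 pose T_k = [R_k t_k; 0^T 1]."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def predict_pose(T_km1, T_km2):
    """Constant-velocity prediction: delta_T = T_(k-1) T_(k-2)^(-1),
    then T_hat_k = delta_T T_(k-1)."""
    delta_T = T_km1 @ np.linalg.inv(T_km2)
    return delta_T @ T_km1

# Example: per frame, the camera rotates 2 degrees about z and moves 10 mm along x.
a = np.deg2rad(2.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a),  np.cos(a), 0.0],
               [0.0,        0.0,       1.0]])
T1 = make_pose(np.eye(3), np.zeros(3))          # first frame defines the map frame
T2 = make_pose(Rz, np.array([0.01, 0.0, 0.0]))  # pose of frame k-1
print(np.round(predict_pose(T2, T1), 3))        # predicted pose of frame k
```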

Locating Point and Plane Correspondences

As shown in FIG. 2, we use the predicted pose to locate the point and plane measurements in the k-th frame that correspond to landmarks in the map. Given the predicted pose 201 of the current frame, we locate correspondences between the point and plane landmarks in the map 202 and the point and plane measurements in the current frame 203. We first transform the landmarks in the map to the current frame using the predicted pose. Then, for each point, we perform a local search using an optical flow procedure, starting from the predicted pixel location in the current frame. For each plane, we first locate the parameters of the predicted plane. We then consider a set of reference points on the predicted plane and locate the pixels connected to each reference point that lie on the predicted plane. The reference point with the largest number of connected pixels is selected, and all of the connected pixels are used to refine the plane parameters.

Point correspondences: Let p_i = (x_i, y_i, z_i, 1)^T denote the i-th point landmark 210 in the map, represented as a homogeneous vector. The 2D image projection 220 of p_i in the current frame is predicted as

û_i = FP(T̂_k p_i),

where T̂_k p_i is the 3D point transformed to the coordinate system of the k-th frame, and the function FP(·) determines the forward projection of a 3D point onto the image plane using the internal camera calibration parameters. We locate the corresponding point measurement using the Lucas-Kanade optical flow method, starting from the initial position û_i. Let v_i be the determined optical flow vector 230. Then, the corresponding point measurement is

p_i^m = D(û_i + v_i) BP(û_i + v_i),

where the function BP(·) back-projects a 2D image pixel to a 3D ray, and D(·) refers to the depth value of the pixel. If the optical flow vector cannot be determined, or if the pixel position û_i + v_i has an invalid depth value, the feature is considered lost.
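The following sketch shows one way FP(·), BP(·), and D(·) could look for a pinhole camera; the intrinsic values are assumed (typical RGB-D defaults), and the optical-flow computation itself is omitted — the flow vector is simply given.

```python
import numpy as np

K = np.array([[525.0,   0.0, 319.5],    # assumed pinhole intrinsics
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])

def FP(p_cam):
    """Forward-project a 3D point (camera frame) to 2D pixel coordinates."""
    uvw = K @ p_cam[:3]
    return uvw[:2] / uvw[2]

def BP(u):
    """Back-project a 2D pixel to a 3D ray with unit depth (z = 1)."""
    return np.linalg.inv(K) @ np.array([u[0], u[1], 1.0])

def point_measurement(u_hat, v, depth_map):
    """p^m = D(u_hat + v) BP(u_hat + v); returns None if the feature is lost."""
    u = np.round(u_hat + v).astype(int)
    h, w = depth_map.shape
    if not (0 <= u[0] < w and 0 <= u[1] < h):
        return None
    d = depth_map[u[1], u[0]]
    if not np.isfinite(d) or d <= 0:
        return None                      # invalid depth value
    return d * BP(u)

# Example: a landmark 1.5 m in front of the predicted camera pose.
p_cam = np.array([0.2, -0.1, 1.5, 1.0])  # T_hat_k p_i, homogeneous
u_hat = FP(p_cam)                        # seed for the optical-flow search
depth = np.full((480, 640), 1.5)         # synthetic depth image
print(point_measurement(u_hat, np.array([1.2, -0.8]), depth))
```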

Plane correspondences: Instead of performing a time-consuming plane extraction procedure independently for each frame, as in the prior art, we use the predicted pose to extract the planes. This yields faster plane measurement extraction and also provides the plane correspondences.

Let π_j = (a_j, b_j, c_j, d_j)^T denote the plane equation of the j-th plane landmark 240 in the map. We assume that the plane landmark and the corresponding measurement have some overlapping region in the image. To locate such a corresponding plane measurement, we randomly select several reference points 250, q_j,r (r = 1, ..., N), from the inliers of the j-th plane landmark, and transform the reference points to the k-th frame as 255

q_j,r^k = T̂_k q_j,r.

We also transform π_j to the k-th frame as 245

π_j^k = (T̂_k)^(-T) π_j.

We locate the connected pixels 260 lying on the plane π_j^k, starting from each transformed reference point, and select the reference point with the largest number of inliers. These inliers are used to refine the plane equation, yielding the corresponding plane measurement. If the number of inliers is less than a threshold, the plane landmark is declared lost. For example, we use N = 5 reference points, a threshold of 50 mm on the point-to-plane distance for determining the inliers on the plane, and a threshold of 9000 on the minimum number of inliers.
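A plane π transforms by the inverse transpose of the pose, since π^T x = 0 must be preserved under x' = T x. The sketch below (assumed helper names; the 50 mm threshold is from the text) transforms a plane and counts inliers.

```python
import numpy as np

def transform_plane(pi, T):
    """pi^T x = 0 and x' = T x imply pi' = T^(-T) pi."""
    return np.linalg.inv(T).T @ pi

def plane_inliers(points_h, pi, dist_thresh=0.05):
    """Boolean mask of homogeneous points (N, 4) within dist_thresh meters
    (i.e., 50 mm) point-to-plane distance."""
    s = np.linalg.norm(pi[:3])
    dist = np.abs(points_h[:, :3] @ (pi[:3] / s) + pi[3] / s)
    return dist < dist_thresh

# Example: the plane z = 2 observed after the camera moves 100 mm along z.
pi = np.array([0.0, 0.0, 1.0, -2.0])
T = np.eye(4); T[:3, 3] = [0.0, 0.0, 0.1]
pi_k = transform_plane(pi, T)                     # becomes z = 2.1 in the new frame
pts = np.column_stack([np.random.rand(10000, 2),
                       np.full(10000, 2.1),
                       np.ones(10000)])
print(pi_k, int(plane_inliers(pts, pi_k).sum()))  # all 10000 points are inliers
```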

Landmark Selection

Performing the above processing using all of the landmarks in the map can be inefficient. Therefore, we use the landmarks appearing in the single keyframe that is closest to the current frame. The closest keyframe is selected before the tracking process using the pose T_(k-1) of the previous frame.
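The patent does not specify the distance metric for "closest"; the sketch below assumes translation distance between the previous frame's pose and each keyframe pose.

```python
import numpy as np

def closest_keyframe(T_prev, keyframe_poses):
    """Index of the keyframe whose translation is nearest to T_(k-1);
    the choice of metric is an assumption."""
    d = [np.linalg.norm(T_prev[:3, 3] - T[:3, 3]) for T in keyframe_poses]
    return int(np.argmin(d))

kf0, kf1 = np.eye(4), np.eye(4).copy()
kf1[:3, 3] = [1.0, 0.0, 0.0]
T_prev = np.eye(4); T_prev[:3, 3] = [0.9, 0.0, 0.0]
print(closest_keyframe(T_prev, [kf0, kf1]))   # -> 1
```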

RANSAC Registration

The prediction-based correspondence search provides candidates of point-to-point and plane-to-plane correspondences, which may include outliers. We therefore perform RANSAC-based registration to determine the inliers and the camera pose. To determine the pose unambiguously, we need at least three correspondences. Thus, if there are fewer than three candidate correspondences, we immediately declare a tracking failure. For accurate camera tracking, we also declare a tracking failure when there are only a small number of candidate correspondences.

If there is a sufficient number of candidates, we solve the registration problem using the closed-form solution for mixed correspondences. The procedure prioritizes plane correspondences over point correspondences, because the number of planes is usually much smaller than the number of points, and because planes are less noisy due to the support from many points. Tracking is regarded as successful if RANSAC locates a sufficient number of inliers, e.g., 40% of the number of all point and plane measurements. The procedure yields the corrected pose T_k of the k-th frame.
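As an illustration of the RANSAC stage, the sketch below samples minimal three-correspondence sets and accepts a pose once at least 40% of the measurements are inliers, as in the example threshold above. It handles the point-only case with a closed-form Kabsch solve; the plane-prioritized mixed solver described in the related application is not reproduced here, and the other thresholds are assumptions.

```python
import numpy as np

def kabsch(P, Q):
    """Closed-form rigid transform (R, t) with Q ~ R P + t for (N, 3) point sets."""
    cP, cQ = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - cP).T @ (Q - cQ))
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    return R, cQ - R @ cP

def ransac_register(P, Q, iters=100, tol=0.03, min_inlier_ratio=0.4, seed=0):
    """Minimal-sample RANSAC over candidate correspondences; returns None on
    tracking failure."""
    rng = np.random.default_rng(seed)
    best_inl = None
    for _ in range(iters):
        idx = rng.choice(len(P), 3, replace=False)   # minimal set of 3
        R, t = kabsch(P[idx], Q[idx])
        inl = np.linalg.norm(P @ R.T + t - Q, axis=1) < tol
        if best_inl is None or inl.sum() > best_inl.sum():
            best_inl = inl
    if best_inl.sum() < min_inlier_ratio * len(P):
        return None                                  # tracking failure
    return kabsch(P[best_inl], Q[best_inl])          # refit on all inliers

# Example: 20 candidate correspondences, 5 of which are gross outliers.
rng = np.random.default_rng(2)
P = rng.uniform(-1.0, 1.0, (20, 3))
a = np.deg2rad(10.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
Q = P @ R_true.T + np.array([0.1, 0.0, 0.0])
Q[:5] += rng.uniform(0.5, 1.0, (5, 3))
print(ransac_register(P, Q))
```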

Map Update

We determine the k-th frame to be a keyframe if the estimated pose T_k is sufficiently different from the poses of all existing keyframes in the map. To check this condition, we can use, e.g., thresholds of 100 mm in translation and 5° in rotation. For a new keyframe, the point and plane measurements located as inliers in the RANSAC-based registration are associated with the corresponding landmarks, while those located as outliers are discarded. We then extract additional point and plane measurements that newly appear in the frame. Additional point measurements are extracted on pixels that are not close to any existing point measurements, using keypoint detectors such as the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF). Additional plane measurements are extracted by applying RANSAC-based plane fitting on pixels that are not inliers of any existing plane measurements. The additional point and plane measurements are added to the map as new landmarks. In addition, we extract feature descriptors (such as SIFT and SURF) for all point measurements in the frame for use in relocalization.
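A sketch of the keyframe test with the example thresholds (100 mm translation, 5° rotation); how the two thresholds combine is an assumption, since the text only says "sufficiently different".

```python
import numpy as np

def is_new_keyframe(T_k, keyframe_poses, t_thresh=0.1, r_thresh=np.deg2rad(5.0)):
    """Frame k becomes a keyframe unless some existing keyframe is within
    both 100 mm (0.1 m) translation and 5 degrees rotation of T_k."""
    for T in keyframe_poses:
        dt = np.linalg.norm(T_k[:3, 3] - T[:3, 3])
        dR = T_k[:3, :3].T @ T[:3, :3]
        da = np.arccos(np.clip((np.trace(dR) - 1.0) / 2.0, -1.0, 1.0))
        if dt <= t_thresh and da <= r_thresh:
            return False
    return True

kfs = [np.eye(4)]
T_k = np.eye(4); T_k[:3, 3] = [0.12, 0.0, 0.0]   # 120 mm from the only keyframe
print(is_new_keyframe(T_k, kfs))                  # -> True
```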

Claims (14)

1. A method for registering data using a set of primitives, wherein the data has three dimensions (3D) and both points and planes serve as the primitives, the method comprising the steps of:
selecting a first set of primitives from the data in a first coordinate system, wherein the first set of primitives includes at least three primitives, at least one of which is a plane;
predicting a transformation from the first coordinate system to a second coordinate system, wherein the transformation is predicted using a camera motion model;
transforming the first set of primitives to the second coordinate system using the predicted transformation;
determining a second set of primitives from the first set of primitives transformed to the second coordinate system; and
registering the second coordinate system with the first coordinate system using the mutually corresponding first set of primitives in the first coordinate system and second set of primitives in the second coordinate system, wherein the registration is used for simultaneous localization and mapping (SLAM), and wherein the above steps are performed in a processor.

2. The method of claim 1, wherein, among the at least three primitives of the first set, at least one primitive is a point in the first coordinate system and at least one primitive is a plane in the first coordinate system, and, among the at least three primitives of the second set, at least one primitive is a point in the second coordinate system and at least one primitive is a plane in the second coordinate system.

3. The method of claim 1, wherein the data are acquired by a movable camera.

4. The method of claim 1, wherein the data include texture and depth.

5. The method of claim 1, wherein the registering uses random sample consensus (RANSAC).

6. The method of claim 1, wherein the data are in the form of a sequence of frames acquired by a camera.

7. The method of claim 6, further comprising:
selecting a set of frames from the sequence of frames as keyframes; and
storing the keyframes in a map, wherein the keyframes include the points and the planes, and the points and the planes are stored in the map as landmarks.

8. The method of claim 7, further comprising:
predicting a pose of the camera for each frame; and
determining the pose of the camera for each frame from the registering, so as to track the camera.

9. The method of claim 1, wherein the registering is in real time.

10. The method of claim 7, further comprising:
applying bundle adjustment using the points and the planes to refine the landmarks in the map.

11. The method of claim 8, wherein the pose of the k-th frame is

T_k = [ R_k  t_k ]
      [ 0^T   1  ],

where R_k and t_k denote the rotation matrix and translation vector, respectively.

12. The method of claim 8, wherein the predicting uses a constant velocity assumption.

13. The method of claim 6, wherein the points in the frames are located using an optical flow procedure.

14. The method of claim 1, wherein plane correspondences are prioritized over point correspondences.
CN201480034631.3A 2013-06-19 2014-05-30 Method for registering data using a set of primitives Active CN105339981B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US13/921,296 US9420265B2 (en) 2012-06-29 2013-06-19 Tracking poses of 3D camera using points and planes
US13/921,296 2013-06-19
PCT/JP2014/065026 WO2014203743A1 (en) 2013-06-19 2014-05-30 Method for registering data using set of primitives

Publications (2)

Publication Number Publication Date
CN105339981A CN105339981A (en) 2016-02-17
CN105339981B true CN105339981B (en) 2019-04-12

Family

ID=50979838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201480034631.3A Active CN105339981B (en) 2013-06-19 2014-05-30 Method for registering data using a set of primitives

Country Status (4)

Country Link
JP (1) JP6228239B2 (en)
CN (1) CN105339981B (en)
DE (1) DE112014002943T5 (en)
WO (1) WO2014203743A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6775969B2 (en) * 2016-02-29 2020-10-28 キヤノン株式会社 Information processing equipment, information processing methods, and programs
CA3032812A1 (en) 2016-08-04 2018-02-08 Reification Inc. Methods for simultaneous localization and mapping (slam) and related apparatus and systems
CN106780601B (en) * 2016-12-01 2020-03-27 北京未动科技有限公司 Spatial position tracking method and device and intelligent equipment
EP3333538B1 (en) * 2016-12-07 2020-09-09 Hexagon Technology Center GmbH Scanner vis

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009237845A (en) * 2008-03-27 2009-10-15 Sony Corp Information processor, information processing method, and computer program
JP2010288112A (en) * 2009-06-12 2010-12-24 Nissan Motor Co Ltd Self-position estimation apparatus and self-position estimation method
CN102609942A (en) * 2011-01-31 2012-07-25 微软公司 Mobile camera localization using depth maps
CN103123727A (en) * 2011-11-21 2013-05-29 联想(北京)有限公司 Method and device for simultaneous positioning and map building

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5580164B2 (en) * 2010-10-18 2014-08-27 株式会社トプコン Optical information processing apparatus, optical information processing method, optical information processing system, and optical information processing program

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009237845A (en) * 2008-03-27 2009-10-15 Sony Corp Information processor, information processing method, and computer program
JP2010288112A (en) * 2009-06-12 2010-12-24 Nissan Motor Co Ltd Self-position estimation apparatus and self-position estimation method
CN102609942A (en) * 2011-01-31 2012-07-25 微软公司 Mobile camera localization using depth maps
CN103123727A (en) * 2011-11-21 2013-05-29 联想(北京)有限公司 Method and device for simultaneous positioning and map building

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
3D SLAM using planar segments; Jan Weingarten et al.; IEEE/RSJ International Conference on Intelligent Robots and Systems; 2006-10-01
Camera tracking for augmented reality media; Bolan Jiang et al.; Multimedia and Expo, 2000 (ICME 2000), IEEE International Conference on, New York; 2000-07-30
MonoSLAM: Real-Time Single Camera SLAM; Andrew J. Davison et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 6; 2007-06-01
Point-plane SLAM for hand-held 3D sensors; Yuichi Taguchi et al.; 2013 IEEE International Conference on Robotics and Automation; 2013-05-06
RGB-D camera-based parallel tracking and meshing; Sebastian Lieberknecht et al.; Mixed and Augmented Reality (ISMAR), 2011 10th IEEE International Symposium on; 2011-10-26
RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments; P. Henry et al.; The International Journal of Robotics Research, Vol. 31, No. 5; 2012-02-10

Also Published As

Publication number Publication date
JP6228239B2 (en) 2017-11-08
CN105339981A (en) 2016-02-17
JP2016527574A (en) 2016-09-08
WO2014203743A1 (en) 2014-12-24
DE112014002943T5 (en) 2016-03-10

Similar Documents

Publication Publication Date Title
US9420265B2 (en) Tracking poses of 3D camera using points and planes
JP7173772B2 (en) Video processing method and apparatus using depth value estimation
CN102763132B (en) Three-dimensional measurement apparatus and processing method
CN103988226B (en) Method for estimating camera motion and for determining 3D model of reality
Ataer-Cansizoglu et al. Tracking an RGB-D camera using points and planes
Herrera et al. Dt-slam: Deferred triangulation for robust slam
CN110702111A (en) Simultaneous localization and map creation (SLAM) using dual event cameras
Vidas et al. Real-time mobile 3D temperature mapping
KR20180087947A (en) Modeling method and modeling apparatus using 3d point cloud
JP2018523881A (en) Method and system for aligning data
Ataer-Cansizoglu et al. Calibration of non-overlapping cameras using an external SLAM system
JP6922348B2 (en) Information processing equipment, methods, and programs
CN108416385A (en) Simultaneous localization and mapping method based on an improved image matching strategy
Tomono 3-D localization and mapping using a single camera based on structure-from-motion with automatic baseline selection
JP2019032218A (en) Position information recording method and apparatus
CN105339981B (en) Method for registering data using a set of primitives
CN120266160A (en) Neural network based positioning
CN110310325B (en) Virtual measurement method, electronic device and computer readable storage medium
JP2006113832A (en) Stereo image processing apparatus and program
Lui et al. An Iterative 5-pt Algorithm for Fast and Robust Essential Matrix Estimation.
Fu et al. FSVO: Semi-direct monocular visual odometry using fixed maps
KR101896183B1 (en) 3-d straight lines detection method for camera motion estimation
Mair et al. Efficient camera-based pose estimation for real-time applications
Laskar et al. Robust loop closures for scene reconstruction by combining odometry and visual correspondences
Chien et al. Regularised energy model for robust monocular ego-motion estimation.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant