CN117456003A - Category-level object 6D pose estimation method and system based on dynamic key point detection - Google Patents
Category-level object 6D pose estimation method and system based on dynamic key point detection
- Publication number
- CN117456003A (application CN202311546440.2A)
- Authority
- CN
- China
- Prior art keywords
- features
- point
- key
- scene
- key points
- Prior art date
- Legal status: Granted
Classifications

- G06T7/77—Determining position or orientation of objects or cameras using statistical methods
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/08—Learning methods
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
- G06T2207/10004—Still image; Photographic image
- G06T2207/10028—Range image; Depth image; 3D point clouds
Abstract
Description
Technical Field

The present invention relates to the field of computer vision, and in particular to a category-level object 6D pose estimation method and system based on dynamic key point detection.
Background Art

Object 6D pose estimation is an important technology in computer vision and robotics: it precisely determines the pose of a three-dimensional object in six degrees of freedom, i.e., 3D translation and 3D rotation. The technique has broad uses in many applications, such as automated manufacturing, robotic manipulation, augmented reality, virtual reality, and autonomous driving.

Although instance-level 6D object pose estimation methods based on fixed key point detection already achieve good accuracy and robustness, they are effective only for a single known instance, which leaves them far from practical deployment. Category-level 6D object pose estimation methods based on a normalized category coordinate space were therefore proposed; such methods can estimate the poses of different instances within the same object category, generalize well, and are closer to the needs of real production.

Current category-level 6D object pose estimation methods fall roughly into two groups. The first directly regresses the pose parameters from extracted features. However, because the group of 3D rotation matrices is complex and non-convex, such methods are hard to optimize and often fail to reach the accuracy and robustness required by real application scenarios. The second is based on dense object-coordinate prediction: it estimates, for every point in the scene, its position in the normalized object coordinate system, and then uses these correspondences with PnP or a pose-prediction network as post-processing to output the object pose. These methods turn pose regression into per-point coordinate prediction, which is easier for a network to fit and optimize. Although this avoids the optimization difficulty of the 3D rotation group, real point clouds contain considerable noise, and directly predicting object-space coordinates for every scene point lets noisy points degrade network performance. Moreover, scene point clouds are often very large, so predicting all points incurs excessive computation and storage, which hinders practical deployment. A category-level object 6D pose estimation method based on dynamic key point detection is therefore proposed to solve these problems of existing methods.
Summary of the Invention

(1) Technical problems solved

To address the shortcomings of the prior art, the present invention provides a category-level object 6D pose estimation method and system based on dynamic key point detection, which can adaptively extract an object's key points from the observed scene and achieves good results even when the scene contains many noise points or severe occlusion.

(2) Technical solutions

To achieve the above objectives, the present invention is realized through the following technical solutions:
In a first aspect, a category-level object 6D pose estimation method based on dynamic key point detection is provided, including:

receiving image data of an object, the image data including an RGB image and a point cloud, the point cloud being formed by randomly sampling pixels of a depth map and projecting them into the scene using the camera intrinsic parameters;

extracting image features from the RGB image and point cloud features from the point cloud, and concatenating and fusing the two to obtain fused features;

feeding the fused features into a preset dynamic key point detection network to extract the key points of the object;

feeding the key points into a preset multi-scale pose prediction network, aggregating local structure information into the key points to obtain key points with multi-scale information, predicting each key point's position in the object coordinate system from these multi-scale key point features, concatenating the key points' positions in the scene, their features in the scene, and their positions and features in the object coordinate system to form multiple sets of correspondences, and outputting the final object 6D pose through a multi-layer perceptron.
Preferably, the feature extractor for the RGB image is a ResNet18 convolutional neural network.
Preferably, extracting the image features of the RGB image and the point cloud features of the point cloud specifically includes:

feeding the input RGB image into the ResNet18 convolutional neural network to extract the image feature map $f_{rgb} \in \mathbb{R}^{h \times w \times c}$;

feeding the point cloud into the Pointnet++ point cloud feature extraction network to extract its structural features $f_{point} \in \mathbb{R}^{N \times C}$, where $N$ is the number of points (a sketch of this two-branch extractor follows).
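As an illustration of the two-branch extractor, the following PyTorch sketch truncates a torchvision ResNet18 to produce a dense feature map and uses a shared per-point MLP as a stand-in for the Pointnet++ branch; the stand-in, the channel width, and all variable names are illustrative assumptions rather than the patented implementation:

```python
import torch
import torch.nn as nn
import torchvision

class TwoBranchExtractor(nn.Module):
    """RGB branch: truncated ResNet18 -> dense feature map f_rgb.
    Point branch: per-point MLP stand-in for Pointnet++ -> f_point."""
    def __init__(self, c: int = 128):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)
        # Keep layers up to layer2: a stride-8, 128-channel feature map.
        self.rgb_backbone = nn.Sequential(*list(resnet.children())[:6])
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, c))

    def forward(self, rgb, points):
        f_rgb = self.rgb_backbone(rgb)    # (B, c, h, w), h = H/8, w = W/8
        f_point = self.point_mlp(points)  # (B, N, c)
        return f_rgb, f_point

extractor = TwoBranchExtractor()
f_rgb, f_point = extractor(torch.randn(1, 3, 480, 640),   # RGB image
                           torch.randn(1, 1024, 3))       # N sampled points
```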
Preferably, concatenating and fusing the image features and the point cloud features to obtain the fused features specifically includes:

projecting the scene points onto the image feature map using the camera intrinsic parameters, and extracting by bilinear interpolation the corresponding image features $f_{point \to rgb} \in \mathbb{R}^{N \times C}$ for each point;

concatenating the image features and the point cloud structural features and passing them through a multi-layer MLP to obtain the fused features $f_{fusion} \in \mathbb{R}^{N \times C}$ (a projection-and-fusion sketch follows).
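A minimal sketch of this fusion step, assuming a pinhole intrinsic matrix K and a stride-8 feature map; torch.nn.functional.grid_sample performs the bilinear interpolation, and the normalization convention and MLP widths are assumptions:

```python
import torch
import torch.nn.functional as F

def fuse_features(points, f_rgb, f_point, K, fusion_mlp):
    """points: (B, N, 3) camera-frame points; f_rgb: (B, C, h, w) image
    feature map at stride 8; f_point: (B, N, C); K: (3, 3) intrinsics."""
    B, N, _ = points.shape
    _, C, h, w = f_rgb.shape
    # Pinhole projection to pixel coordinates: u = fx*X/Z + cx, v = fy*Y/Z + cy.
    z = points[..., 2].clamp(min=1e-6)
    u = K[0, 0] * points[..., 0] / z + K[0, 2]
    v = K[1, 1] * points[..., 1] / z + K[1, 2]
    # Normalize to [-1, 1] for grid_sample; the map covers an 8h x 8w image.
    grid = torch.stack([u / (8 * w) * 2 - 1, v / (8 * h) * 2 - 1], dim=-1)
    f_proj = F.grid_sample(f_rgb, grid.unsqueeze(1),      # bilinear sampling
                           mode="bilinear", align_corners=False)
    f_proj = f_proj.squeeze(2).transpose(1, 2)            # (B, N, C)
    # Concatenate the two modalities and fuse with a multi-layer MLP.
    return fusion_mlp(torch.cat([f_proj, f_point], dim=-1))

fusion_mlp = torch.nn.Sequential(
    torch.nn.Linear(256, 256), torch.nn.ReLU(), torch.nn.Linear(256, 128))
```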
Preferably, feeding the fused features into the preset dynamic key point detection network to extract the key points of the object specifically includes:

introducing an attention mechanism and Transformer layers to dynamically detect object key points, where $f_{kpt} \in \mathbb{R}^{N_s \times C}$ denotes $N_s$ KPT queries that are randomly initialized and continuously updated during training, representing the $N_s$ key points in the scene;

interacting the queries representing the different key points with the fused scene features $f_{fusion} \in \mathbb{R}^{N \times C}$ through a cross-attention layer, and updating the KPT queries scene-adaptively:

$f'_{kpt} = \mathrm{MHCA}(f_{fusion};\ f_{kpt}),$

then, with a similarity-based heat map generation strategy, computing the similarity between each KPT query and the scene points and generating the 3D positions and 3D features of the key points by heat map weighting:

$\mathrm{heatmap} = \mathrm{Softmax}(\mathrm{Similarity}(f'_{kpt},\ f_{fusion}))$

where $\mathrm{heatmap} \in \mathbb{R}^{N_s \times N}$ is the weight map encoding each key point detector's similarity to the scene points, and the finally detected key point coordinates (and, analogously, the key point features) are obtained as the heatmap-weighted sum over the scene points (see the detection sketch below).
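One plausible reading of this module is sketched below, assuming MHCA is standard multi-head cross-attention (the KPT queries attend to the fused scene features) and Similarity is a scaled dot product; the head count, dimensions, and class name are assumptions:

```python
import torch
import torch.nn as nn

class DynamicKeypointDetector(nn.Module):
    def __init__(self, num_kpt: int = 16, c: int = 128):
        super().__init__()
        # N_s randomly initialized KPT queries, updated during training.
        self.kpt_query = nn.Parameter(torch.randn(num_kpt, c))
        self.mhca = nn.MultiheadAttention(c, num_heads=4, batch_first=True)

    def forward(self, f_fusion, points):
        """f_fusion: (B, N, C) fused scene features; points: (B, N, 3)."""
        B = f_fusion.shape[0]
        q = self.kpt_query.unsqueeze(0).expand(B, -1, -1)    # (B, N_s, C)
        # Scene-adaptive update: queries cross-attend to the scene features.
        f_kpt, _ = self.mhca(q, f_fusion, f_fusion)          # (B, N_s, C)
        # Scaled dot-product similarity between queries and scene points.
        sim = f_kpt @ f_fusion.transpose(1, 2) / f_kpt.shape[-1] ** 0.5
        heatmap = sim.softmax(dim=-1)                        # (B, N_s, N)
        # Heatmap-weighted 3D positions and features of the keypoints.
        p_kpt = heatmap @ points                             # (B, N_s, 3)
        f_kpt3d = heatmap @ f_fusion                         # (B, N_s, C)
        return p_kpt, f_kpt3d, heatmap
```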
Preferably, feeding the key points into the preset multi-scale pose prediction network and aggregating local structure information into the key points to obtain key points with multi-scale information specifically includes:

for each detected 3D key point $p_{kpt}^{i}$, extracting the fused features of its $k$ nearest scene points and aggregating the local structure information into the key point through cross attention:

$f_{kpt}^{local} = \mathrm{MHCA}\big(\mathrm{index}(f_{fusion},\ \mathrm{knn}(p_{kpt}, P));\ f'_{kpt}\big)$

where $\mathrm{knn}$ denotes the k-nearest neighbors in Euclidean space, $\mathrm{index}$ denotes the indexing operation, and $P$ the scene points (a kNN-gather sketch follows).
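A hedged sketch of the knn, index, and aggregation steps; the neighbor count k and the use of nn.MultiheadAttention for the cross attention are assumptions:

```python
import torch
import torch.nn as nn

def aggregate_local(p_kpt, f_kpt, points, f_fusion, mhca, k=16):
    """p_kpt: (B, Ns, 3); f_kpt: (B, Ns, C); points: (B, N, 3);
    f_fusion: (B, N, C); mhca: nn.MultiheadAttention with batch_first=True."""
    B, Ns, _ = p_kpt.shape
    C = f_fusion.shape[-1]
    # knn: indices of the k nearest scene points (Euclidean) per keypoint.
    idx = torch.cdist(p_kpt, points).topk(k, dim=-1, largest=False).indices
    # index: gather the fused features of those neighbors -> (B, Ns, k, C).
    f_local = torch.gather(
        f_fusion.unsqueeze(1).expand(B, Ns, -1, C), 2,
        idx.unsqueeze(-1).expand(B, Ns, k, C))
    # Cross attention: each keypoint query attends to its k neighbors.
    out, _ = mhca(f_kpt.reshape(B * Ns, 1, C),
                  f_local.reshape(B * Ns, k, C),
                  f_local.reshape(B * Ns, k, C))
    return out.reshape(B, Ns, C)  # keypoint features with local, multi-scale info

mhca = nn.MultiheadAttention(128, num_heads=4, batch_first=True)
```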
Preferably, predicting each key point's position in the object coordinate system from the multi-scale key point features, concatenating the output key point positions in the scene, key point features in the scene, and key point positions and features in the object coordinate system to form multiple sets of correspondences, and outputting the final object 6D pose through a multi-layer perceptron specifically includes:

predicting each key point's position in the object coordinate space from its key point features:

$p_{obj} = \mathrm{MLP}(f_{kpt}^{local})$

and concatenating the output key point positions in the scene, the key point features in the scene, and the key point positions and features in the object coordinate system to form $N_s$ sets of correspondences, from which a multi-layer perceptron outputs the final object 6D pose:

$\mathrm{pose} = \mathrm{MLP}\big(\mathrm{concat}(p_{kpt},\ f_{kpt},\ p_{obj},\ f_{obj})\big)$

(a correspondence-head sketch follows).
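A minimal sketch of this correspondence head; the object-space branch as a small MLP, the pooling over correspondences, and the output parameterization (a continuous 6D rotation representation plus translation) are assumptions, since the patent states only that an MLP outputs the 6D pose:

```python
import torch
import torch.nn as nn

class CorrespondencePoseHead(nn.Module):
    def __init__(self, c: int = 128):
        super().__init__()
        # Predict each keypoint's object-space position (and features).
        self.obj_head = nn.Sequential(nn.Linear(c, c), nn.ReLU(),
                                      nn.Linear(c, 3 + c))
        # MLP over the Ns correspondences -> 6D rotation rep + translation.
        self.pose_mlp = nn.Sequential(nn.Linear(3 + c + 3 + c, 256),
                                      nn.ReLU(), nn.Linear(256, 9))

    def forward(self, p_kpt, f_kpt):
        """p_kpt: (B, Ns, 3) scene positions; f_kpt: (B, Ns, C) features."""
        obj = self.obj_head(f_kpt)
        p_obj, f_obj = obj[..., :3], obj[..., 3:]
        # One correspondence per keypoint: scene and object positions/features.
        corr = torch.cat([p_kpt, f_kpt, p_obj, f_obj], dim=-1)
        pose = self.pose_mlp(corr).mean(dim=1)  # pool over Ns correspondences
        rot6d, trans = pose[..., :6], pose[..., 6:]
        return rot6d, trans, p_obj
```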
In a second aspect, a category-level object 6D pose estimation system based on dynamic key point detection is provided, the system including:

a receiving module, configured to receive image data of an object, the image data including an RGB image and a point cloud, the point cloud being formed by randomly sampling pixels of a depth map and projecting them into the scene using the camera intrinsic parameters;

a feature extraction and fusion module, configured to extract image features from the RGB image and point cloud features from the point cloud, and to concatenate and fuse the two to obtain fused features;

a key point extraction module, configured to feed the fused features into a preset dynamic key point detection network to extract the key points of the object;

a processing and output module, configured to feed the key points into a preset multi-scale pose prediction network, aggregate local structure information into the key points to obtain key points with multi-scale information, predict each key point's position in the object coordinate system from these multi-scale key point features, concatenate the key points' positions in the scene, their features in the scene, and their positions and features in the object coordinate system to form multiple sets of correspondences, and output the final object 6D pose through a multi-layer perceptron.
In a third aspect, a computer-readable storage medium storing one or more programs is provided, the one or more programs including instructions that, when executed by a computing device, cause the computing device to perform any of the methods described above.

In a fourth aspect, a computing device is provided, including:

one or more processors, a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods described above.

(3) Beneficial effects

The category-level object 6D pose estimation method and system based on dynamic key point detection of the present invention can adaptively extract an object's key points from the observed scene and achieve good results even when the scene contains many noise points or severe occlusion. In addition, two modules are designed: one that aggregates the local features around each key point, and a correspondence-based pose prediction network. Together they better extract the local spatial-geometric features around the key points and regress the object pose from the correspondences. The model is trained in a digital twin simulation system and substantially improves the accuracy of category-level object 6D pose estimation on existing datasets.
Description of the Drawings

Figure 1 is a flow chart of the category-level object 6D pose estimation method based on dynamic key point detection of the present invention;

Figure 2 is an explanatory diagram of the method in an embodiment of the present invention.
Detailed Description

The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

Embodiment
As shown in Figures 1 and 2, an embodiment of the present invention provides a category-level object 6D pose estimation method based on dynamic key point detection, including:

receiving image data of an object, the image data including an RGB image and a point cloud, the point cloud being formed by randomly sampling pixels of a depth map and projecting them into the scene using the camera intrinsic parameters;

extracting image features from the RGB image and point cloud features from the point cloud, and concatenating and fusing the two to obtain fused features;

feeding the fused features into a preset dynamic key point detection network to extract the key points of the object;

feeding the key points into a preset multi-scale pose prediction network, aggregating local structure information into the key points to obtain key points with multi-scale information, predicting each key point's position in the object coordinate system from these multi-scale key point features, concatenating the key points' positions in the scene, their features in the scene, and their positions and features in the object coordinate system to form multiple sets of correspondences, and outputting the final object 6D pose through a multi-layer perceptron.
Further, for the RGB-D input, ResNet18 is used as the feature extractor for the RGB image. For the depth map D, pixels are randomly sampled and back-projected into the scene using the camera intrinsics to form a point cloud (a back-projection sketch follows below). For the different input modalities, the network first feeds the RGB image into the ResNet18 convolutional neural network to extract the image feature map $f_{rgb} \in \mathbb{R}^{h \times w \times c}$. The point cloud is fed into the Pointnet++ feature extraction network to extract its structural features $f_{point} \in \mathbb{R}^{N \times C}$, where $N$ is the number of points. The scene point cloud is then projected onto the image feature map using the camera intrinsics, and the corresponding features $f_{point \to rgb} \in \mathbb{R}^{N \times C}$ are extracted by bilinear interpolation. Finally, the features of the two modalities are concatenated and passed through a multi-layer MLP to obtain the fused features $f_{fusion} \in \mathbb{R}^{N \times C}$.
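As a hedged illustration of the depth-to-point-cloud step only (the feature extraction and fusion steps are sketched earlier), the function below back-projects randomly sampled depth pixels through a pinhole intrinsic matrix; the sampling count and the pinhole assumption are illustrative:

```python
import torch

def depth_to_points(depth, K, n_samples=1024):
    """depth: (H, W) depth map in meters; K: (3, 3) pinhole intrinsics.
    Returns up to n_samples camera-frame 3D points from sampled pixels."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    valid = depth > 0                        # keep pixels with valid depth
    u, v, z = u[valid].float(), v[valid].float(), depth[valid]
    # Random sampling of pixels, as described in the patent.
    sel = torch.randperm(z.numel())[:n_samples]
    u, v, z = u[sel], v[sel], z[sel]
    # Inverse pinhole projection: X = (u - cx)Z/fx, Y = (v - cy)Z/fy.
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return torch.stack([x, y, z], dim=-1)    # (n_samples, 3)
```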
Further, for each scene input, after the multi-modal fused features are extracted, a dynamic key point detection network as shown in Figure 1 is designed to adaptively extract object key points from the scene. To achieve adaptive, dynamic detection of key points across different scenes, an attention mechanism and Transformer layers are introduced. $f_{kpt} \in \mathbb{R}^{N_s \times C}$ denotes $N_s$ KPT queries that are randomly initialized and continuously updated during training, representing the $N_s$ key points in the scene. These queries, each representing a different key point, interact with the fused scene features $f_{fusion} \in \mathbb{R}^{N \times C}$ through a cross-attention layer, and the KPT queries are updated scene-adaptively:

$f'_{kpt} = \mathrm{MHCA}(f_{fusion};\ f_{kpt}),$

The updated KPT queries aggregate scene-adaptive features and feed the subsequent key point detection module. Next, a similarity-based heat map generation strategy is used: after the similarity between each KPT query and the scene points is computed, the 3D positions and 3D features of the key points are generated by heat map weighting. Specifically:

$\mathrm{heatmap} = \mathrm{Softmax}(\mathrm{Similarity}(f'_{kpt},\ f_{fusion}))$

where $\mathrm{heatmap} \in \mathbb{R}^{N_s \times N}$ is the weight map encoding each key point detector's similarity over the scene, and the finally detected key point coordinates are obtained as the heatmap-weighted sum over the scene points. Dynamically detected key points adapt to different scenes and changes: no matter how the target object's position, viewing angle, or lighting conditions change, the key points can still be detected accurately, and the same set of key points generalizes to different instances of the same category. This makes the model more generalizable, better suited to category-level object pose estimation, and enables more accurate and robust pose estimation downstream.
Further, a multi-scale pose prediction network is designed, consisting mainly of two modules: a local feature aggregation module and a correspondence-based pose prediction network.
Local feature aggregation module. To let each key point better capture local information in the scene and thereby produce multi-scale features, a local feature aggregation module at the key point locations is proposed. Specifically, for each detected 3D key point $p_{kpt}^{i}$, the fused features of its $k$ nearest scene points are extracted, and the local structure information is aggregated into the key point through cross attention:

$f_{kpt}^{local} = \mathrm{MHCA}\big(\mathrm{index}(f_{fusion},\ \mathrm{knn}(p_{kpt}, P));\ f'_{kpt}\big)$

where $\mathrm{knn}$ denotes the k-nearest neighbors in Euclidean space and $\mathrm{index}$ denotes the indexing operation. Aggregating local features through local attention gives the key points multi-scale information and enables better prediction of the poses of scene objects.
Correspondence-based pose prediction network. To regress the object pose from the correspondences output by the network, a deep neural network is used to emulate the traditional least-squares algorithm, making the pose obtained from the fitted correspondences more robust. The method first predicts each key point's position in the object coordinate space from its key point features:

$p_{obj} = \mathrm{MLP}(f_{kpt}^{local})$

It then concatenates the output key point positions in the scene, the key point features in the scene, and the key point positions and features in the object coordinate system to form $N_s$ sets of correspondences, and outputs the final object pose through a multi-layer perceptron. Specifically:

$\mathrm{pose} = \mathrm{MLP}\big(\mathrm{concat}(p_{kpt},\ f_{kpt},\ p_{obj},\ f_{obj})\big)$

Prediction based on correspondences matches the mathematical structure of least-squares fitting of object coordinate pairs, which makes it easier for the network to learn the pose mapping. Moreover, the correspondences produced from the adaptive key points remove the influence of noisy scene points by selecting only the most representative key points, so the overall computation is more accurate, more robust, and more efficient (the underlying least-squares objective is stated below).
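For reference, a hedged statement of the classical similarity-transform least-squares objective that the correspondence MLP emulates; the scale factor $s$ follows the standard Umeyama formulation and is an assumption, since the patent mentions only least-squares fitting of object coordinate pairs:

$\min_{s,\ R \in SO(3),\ t}\ \sum_{i=1}^{N_s} \big\| p_{kpt}^{i} - \big(s\,R\,p_{obj}^{i} + t\big) \big\|_2^{2}$

Here $p_{obj}^{i}$ is the predicted object-space position of the $i$-th key point and $p_{kpt}^{i}$ its detected scene position; this problem admits a closed-form SVD solution, which the learned MLP replaces with a mapping that is more tolerant of noisy correspondences.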
This patent also designs a method of using digital twin simulation to assist model training and to verify model performance. To collect 6D object pose data, virtual RGB-D sensors are deployed in the digital twin environment; these sensors simulate real-world perception devices such as cameras and LiDAR. The virtual sensors record object positions and attitudes in real time and generate a large amount of simulation data, which is used to train and validate the deep learning model. Simulation experiments in the digital twin environment verify the model's performance, including its accuracy, robustness, and generalization ability.

In this patent, the ResNet convolutional neural network and the Pointnet++ point cloud feature extraction network of the RGB-D multi-modal feature extraction backbone are prior art described in the background. On top of them, this patent adds an adaptive key point detection network, designs a new pose estimation network based on local feature aggregation, and conducts simulation experiments in a digital twin environment to verify the effectiveness of the technique.
Yet another embodiment of the present invention provides a category-level object 6D pose estimation system based on dynamic key point detection, the system including:

a receiving module, configured to receive image data of an object, the image data including an RGB image and a point cloud, the point cloud being formed by randomly sampling pixels of a depth map and projecting them into the scene using the camera intrinsic parameters;

a feature extraction and fusion module, configured to extract image features from the RGB image and point cloud features from the point cloud, and to concatenate and fuse the two to obtain fused features;

a key point extraction module, configured to feed the fused features into a preset dynamic key point detection network to extract the key points of the object;

a processing and output module, configured to feed the key points into a preset multi-scale pose prediction network, aggregate local structure information into the key points to obtain key points with multi-scale information, predict each key point's position in the object coordinate system from these multi-scale key point features, concatenate the key points' positions in the scene, their features in the scene, and their positions and features in the object coordinate system to form multiple sets of correspondences, and output the final object 6D pose through a multi-layer perceptron.
Embodiments of the present application may be provided as a method or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code. The solutions in the embodiments of the present application may be implemented in various computer languages, for example, the object-oriented programming language Java and the interpreted scripting language JavaScript.

The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor create means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311546440.2A CN117456003B (en) | 2023-11-20 | 2023-11-20 | Category-level object 6D pose estimation method and system based on dynamic key point detection |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN117456003A | 2024-01-26 |
| CN117456003B | 2025-06-10 |
Family
ID=89587443
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311546440.2A Active CN117456003B (en) | 2023-11-20 | 2023-11-20 | Category-level object 6D pose estimation method and system based on dynamic key point detection |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117456003B (en) |
Patent Citations (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11688139B1 (en) * | 2019-03-22 | 2023-06-27 | Bertec Corporation | System for estimating a three dimensional pose of one or more persons in a scene |
| CN112270249A (en) * | 2020-10-26 | 2021-01-26 | 湖南大学 | Target pose estimation method fusing RGB-D visual features |
| KR20220081261A (en) * | 2020-12-08 | 2022-06-15 | 삼성전자주식회사 | Method and apparatus for object pose estimation |
| CN114359377A (en) * | 2021-12-31 | 2022-04-15 | 清华大学深圳国际研究生院 | A real-time 6D pose estimation method and computer-readable storage medium |
| US20230316563A1 (en) * | 2022-04-05 | 2023-10-05 | Bluewrist Inc. | Systems and methods for pose estimation via radial voting based keypoint localization |
| CN114663514A (en) * | 2022-05-25 | 2022-06-24 | 浙江大学计算机创新技术研究院 | Object 6D attitude estimation method based on multi-mode dense fusion network |
| CN115147599A (en) * | 2022-06-06 | 2022-10-04 | 浙江大学 | A six-degree-of-freedom pose estimation method for multi-geometric feature learning for occluded and truncated scenes |
| CN115601430A (en) * | 2022-10-27 | 2023-01-13 | 西安交通大学(Cn) | Texture-free high-reflection object pose estimation method and system based on key point mapping |
| CN116152799A (en) * | 2023-01-18 | 2023-05-23 | 美的集团(上海)有限公司 | Image pose processing method and device, readable storage medium and robot |
| CN116580085A (en) * | 2023-03-13 | 2023-08-11 | 联通(上海)产业互联网有限公司 | Deep learning algorithm for 6D pose estimation based on attention mechanism |
| CN116630394A (en) * | 2023-07-25 | 2023-08-22 | 山东中科先进技术有限公司 | Multi-mode target object attitude estimation method and system based on three-dimensional modeling constraint |
| CN116958958A (en) * | 2023-07-31 | 2023-10-27 | 中国科学技术大学 | Self-adaptive class-level object attitude estimation method based on graph convolution double-flow shape prior |
Non-Patent Citations (6)
| Title |
|---|
| SHENG, Xihua, et al., "Deep-PCAC: An End-to-End Deep Lossy Compression Framework for Point Cloud Attributes", IEEE Transactions on Multimedia, 31 December 2022 |
| WANG, Chen, et al., "6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints", arXiv, 23 October 2019 |
| XIAO LIN, et al., "Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation", 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 16 September 2024 |
| ZOU, L., et al., "6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-Based Instance Representation Learning", IEEE Transactions on Image Processing, 31 December 2022 |
| WANG, Taiyong, et al., "Six-degree-of-freedom pose estimation method based on key point feature fusion", Journal of Tianjin University (Science and Technology), 7 March 2022 |
| WANG, Tao, "Laser point cloud recognition of robot grasping targets", China Master's Theses Full-text Database, 15 February 2023 |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118247351A (en) * | 2024-05-23 | 2024-06-25 | 浙江大学 | Real-time object three-dimensional pose estimation method based on multi-frame monocular camera |
| CN118799393A (en) * | 2024-06-20 | 2024-10-18 | 哈尔滨工业大学 | Bidirectional fusion 6D object pose estimation method |
| CN118799393B (en) * | 2024-06-20 | 2025-06-06 | 哈尔滨工业大学 | Bidirectional fusion 6D object pose estimation method |
| CN120107356A (en) * | 2025-02-17 | 2025-06-06 | 同济大学 | A method and system for class-level object pose estimation |
Also Published As
| Publication number | Publication date |
|---|---|
| CN117456003B (en) | 2025-06-10 |
Similar Documents

| Publication | Title |
|---|---|
| CN113205466B (en) | A residual defect cloud completion method based on latent space topological structure constraints |
| CN111507222B (en) | A framework for 3D object detection based on multi-source data knowledge transfer |
| CN117456003A (en) | Category-level object 6D pose estimation method and system based on dynamic key point detection |
| CN114255238A (en) | Three-dimensional point cloud scene segmentation method and system fusing image features |
| CN113450408A (en) | Irregular object pose estimation method and device based on depth camera |
| CN114663502A (en) | Object posture estimation and image processing method and related equipment |
| Wang et al. | Stream query denoising for vectorized hd-map construction |
| US12175698B2 | Method and apparatus with object pose estimation |
| CN112149590A (en) | A method of hand key point detection |
| CN113313176B (en) | A point cloud analysis method based on dynamic graph convolutional neural network |
| CN114612494B (en) | A design method for visual odometry of mobile robots in dynamic scenes |
| CN113592015B (en) | Method and device for positioning and training feature matching network |
| CN112562001B (en) | Method, device, equipment and medium for 6D pose estimation of an object |
| CN115205654A (en) | A novel monocular vision 3D object detection method based on keypoint constraints |
| CN117593368A (en) | 6D pose estimation method based on iterative attention fusion network |
| Wu et al. | SC-WLS: Towards interpretable feed-forward camera re-localization |
| CN116182894A (en) | A monocular visual odometer method, device, system and storage medium |
| CN114266967A (en) | Cross-source remote sensing data target identification method based on symbolic distance characteristics |
| CN113888629A (en) | RGBD camera-based rapid object three-dimensional pose estimation method |
| CN110634160B (en) | 3D keypoint extraction model construction and pose recognition method of target in 2D graphics |
| CN116912238B (en) | Weld joint pipeline identification method and system based on multidimensional identification network cascade fusion |
| CN117710645A (en) | Dynamic scene VSLAM optimization method based on fusion attention mechanism and lightweight neural network |
| CN112818965B (en) | Multi-scale image target detection method and system, electronic equipment and storage medium |
| CN119762584A (en) | A target 6D pose estimation method guided by neighborhood perception information |
| Yuan et al. | SHREC 2020 track: 6D object pose estimation |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |