Closing the Loop: 3D Object Tracking for Advanced Robotic Manipulation
Manuel Stoiber
Dissertation submitted to the Technical University of Munich
Tracking objects and kinematic structures and determining their poses and configurations is an essential task in computer vision. Its application ranges from augmented reality to robotic perception. Given 3D meshes and kinematic information, the goal is to robustly estimate both the rotation and translation of each body. The M3T library is able to consider images from multiple depth and color cameras. It allows to fuse information from depth, region, and texture modalities to simultaneously predict the pose and configuration of multiple multi-body objects. M3T is able to consider unknown occlusions using depth camera measurements or model known occlusions using depth and silhouette renderers. The algorithm is highly efficient and is able to provide results in real time, even for complex kinematic structures. To support a wide range of kinematic structures, object characteristics, and camera configurations, the overall framework is very modular and allows a flexible combination of different components such as cameras, links, constraints, modalities, viewers, detectors, refiners, publishers, and subscribers.
The repository is organized in the following folders:
include/: header files of the M3T librarysrc/: source files of the M3T librarythird_party/: external header-only librariestest/: unit testsdata/: data that is required for examples and unit testsexamples/: example files for tracking as well as for evaluation on different datasetsdoc/: files for documentation
Use CMake to build the library from source. The following dependencies are required: Eigen 3, GLEW, GLFW 3, and OpenCV 4. In addition, unit tests are implemented using gtest, while images from an Azure Kinect or RealSense camera can be streamed using the K4A and realsense2 libraries. All three libraries are optional and can be disabled using the CMake flags USE_GTEST, USE_AZURE_KINECT, and USE_REALSENSE. If OpenCV 4 is installed with CUDA, feature detectors used in the texture modality are able to utilize the GPU. If CMake finds OpenMP, the code is compiled using multithreading and vectorization for some functions. Finally, the documentation is built if Doxygen with dot is detected. Note that links to pages or classes that are embedded in this readme only work in the generated documentation. After a correct build, it should be possible to successfully execute all tests in ./gtest_run. For maximum performance, ensure that the library is created in Release mode, and, for example, use -DCMAKE_BUILD_TYPE=Release.
In the following, the tracking algorithm is described in detail. For this, we introduce the overall process executed by the tracker, describe the used components, and explain how they have to be configured.
The tracking process is coordinated by the Tracker object. Usually, only a single tracker exists that references all relevant components. Using the method RunTrackerProcess, the tracker executes the main methods of different components in the correct order. It considers the state of individual kinematic structures represented by Optimizer objects. In general, they can be in the states detecting, starting, and tracking. Also, kinematic structures may be idle. States can be influenced by the user with the methods StartTracking, StopTracking, and ExecuteDetection. Each method allows to specify names of optimizers to affect the respective kinematic structures. In addition, it is possible to change the state of all kinematic structures by pressing the T key to start tracking, the S key to stop tracking, the D key to start the detection, and the X key to start the detection and directly start tracking. With QuitTrackerProcess or the Q key, one can quit the tracking process.
Each iteration of the tracker process starts with an update of all Camera and Subscriber objects. Subsequently, given that Subscriber objects are able to change the poses of individual Link and Body objects, poses are updated to be consistent. After that, Detecting, Starting, and Tracking Steps are executed. Each step only considers objects associated with kinematic structures in the corresponding state. Finally, once individual steps have been finished, Publishers and Viewers are updated. The process then starts from the beginning. An overview of the major steps in the tracker process is shown in the following algorithm:
Tracker Process: 1: while run tracker process do 2: Update all Camera objects 3: Update all Subscriber objects 4: Calculate consistent Link and Body poses 5: Run Detecting Step (for objects associated with the state detecting) 6: Run Starting Step (for objects associated with the state starting) 7: Run Tracking Step (for objects associated with the state tracking) 8: Update all Publisher objects 9: Update all Viewer objects 10: end while
For the Detecting Step, first, all Detector objects associated with the state detecting are run. If the detection is successful, the Detector updates the poses of all Link and Body objects in the kinematic structure. Subsequently, for all kinematic structures that were successfully detected, associated Refiner objects improve predictions. Finally, if desired, successfully detected structures move to the state starting. An overview of the Detecting Step is shown in the following algorithm:
Detecting Step: 1: Run Detector objects 2: Run Refiner objects for successful detections 3: If desired, move to state starting for successful detections
In the Starting Step, first, all Renderer objects that are required by Modality objects are executed. After that, each Modality is started. This operation, for example, initializes internal color histograms of RegionModality objects. Subsequently, the Starting Step also performs the initialization of shared ColorHistograms objects. Finally, each kinematic structure automatically moves into the state tracking. An overview of the individual steps is shown in the following algorithm:
Starting Step: 1: Run Renderer objects required for the starting of Modality objects 2: Start Modality objects 3: Initialize ColorHistograms objects 4: Move to state tracking
Finally, the Tracking Step considers the estimation of poses by iteratively computing correspondences and pose updates. To calculate correspondences for Modality objects, it first runs required Renderer objects. Typical examples are SilhouetteRenderer objects that are used to validate individual contour or surface points. Afterwards, correspondences are calculated. This is typically followed by two update iterations. In each iteration, gradient vectors and Hessian matrices are calculated for all Modality objects. Afterwards, kinematic structures are updated. Optimizer objects thereby compute gradient vectors and Hessian matrices of SoftConstraint objects, calculate the Jacobian of each Link object, compute Jacobians and residuals of Constraint objects, and, finally, assemble everything in a single linear system of equations. Optimizer objects then solve the equations and update the poses of all Link and Body objects. Subsequently, the iterations are repeated. Finally, after the last iteration, required Renderer objects are run, and Modality and ColorHistograms objects are updated. An overview of the individual steps is shown in the following algorithm:
Tracking Step: 1: for n_corr_iterations do 2: Run Renderer objects required for the calculation of correspondences 3: Calculate correspondences for Modality objects 4: for n_update_iterations do 5: Calculate gradients and Hessians for Modality objects 6: for all Optimizer objects do 7: Calculate gradients and Hessians of referenced SoftConstraint objects 8: Calculate Jacobians of referenced Link objects 9: Calculate Jacobians and residuals of referenced Cosntraint objects 10: Assemble and solve linear system of equations 11: Update poses of referenced Link and Body objects 12: end for 13: end for 14: end for 15: Run Renderer objects for the update of Modality objects 16: Update Modality objects 17: Update ColorHistograms objects
The library consists of multiple components that provide specific functionality and data. They allow a flexible configuration of the tracker. The following main components exist (note that links only work in the Doxygen documentation):
- Camera: Specifies the abstract method
UpdateImageto fetch images and provides those images to other components. In addition, it stores intrinsics and defines the camera's pose relative to the world coordinate frame. In general, one differentiates betweenColorCameraandDepthCameraclasses. Currently, different implementations exist. To load images from disk, the LoaderColorCamera and LoaderDepthCamera classes are used. To stream data from a physical Azure Kinect or RealSense camera, the AzureKinectColorCamera, AzureKinectDepthCamera, RealSenseColorCamera, and RealSenseDepthCamera classes are used. - Body: Holds the pose of a rigid body relative to the world coordinate frame and stores the body's mesh geometry. It also specifies values for region and body IDs that are used in the rendering of silhouette images. It is implemented in the Body class.
- RendererGeometry: Stores geometric information from referenced
Bodyobjects on the GPU and provides everything required for rendering. It is implemented in the RendererGeometry class. - Renderer: Creates a rendering based on the geometry stored in a referenced
RendererGeometryobject, the pose of correspondingBodyobjects, the view of the renderer on the scene defined by intrinsics, and the renderer's pose relative to the world coordinate frame. For tracking, intrinsics and the pose are typically inferred from the values of a correspondingCameraobject. BothFullRenderers, which render an image according to intrinsics, andFocusedRenderers, which focus on referenced bodies and render an image with a defined size that is scaled and cropped to only include the referenced bodies, exist. For occlusion handling, region checking, and silhouette checking,FocusedRendererobjects are used. Based on the data in theRendererGeometry, different bodies can be considered. Depth images are obtained by the FullBasicDepthRenderer and FocusedBasicDepthRenderer classes. Silhouette images that contain either region or body IDs fromBodyobjects can be created using the FullSilhouetteRenderer and FocusedSilhouetteRenderer classes. Normal images that encode the surface normal vector in pixel colors can be created using the FullNormalRenderer and FocusedNormalRenderer classes. - Model: Precomputes geometric information from a referenced
Bodyand stores the created data. During the generation, geometries from externalBodyobjects can also be considered. This can, for example, be used to consider known occlusions before tracking. Two versions exist: a RegionModel that samples contour points and a DepthModel that computes surface points. They are used forRegionModalityandDepthModalityobjects, respectively. Modalities that consider the same geometry can use the same model. - ColorHistograms: Computes and holds color histograms for the foreground and background. Typically, the object is incorporated in the
RegionModality. However, if histograms should be shared,ColorHistogramscan also be defined separately and referenced by multiple modalities. It is implemented in the ColorHistograms class. - Modality: Considers information from a
CameraandBodyto calculate the gradient vector and Hessian matrix used for Newton optimization. Currently, threeModalityclasses exist: a RegionModality that sparsely considers region information and requires aColorCamera,Body, andRegionModel, a DepthModality that implements an ICP-like depth approach that considers data from aDepthCamera,Body, andDepthModel, and a TextureModality that uses keypoint features and requires information from aColorCamera,Body, andSilhouetteRenderer. Both theRegionModalityandTextureModalityallow to reference an additionalDepthCamerathat is close to theColorCamerato recognize unknown occlusions. To model known occlusions, modalities allow referencing aFocusedDepthRenderer. Finally, for the validation of contour points and surface points,RegionModalityandDepthModalityobjects use aSilhouetteRenderer. Also,RegionModalityobjects are able to reference externalColorHistograms. - Link: Defines the location of a joint reference frame relative to the reference frame of the Link and its parent. For the joint, rotational and translational motion along defined coordinate axes is allowed. To build a tree-like kinematic structure, a
Linkis able to reference multipleLinkobjects as children. In general,Linkobjects contain allModalityobjects that correspond to a single referencedBody. If aBodyis configured, the reference frame of theLinkis equal to that of theBody. Also, it is possible to create virtualLinkobjects without referencedBodyorModalityobjects to allow the definition of kinematics that do not feature visual information. - Constraint: Takes two
Linkobjects and defines the location of joint reference frames relative to corresponding link frames. Rotational and translational motion between the two joint frames is constrained for user-defined coordinate axes. It is implemented in the Constraint class. - SoftConstraint: Like the
Constraint, it takes twoLinkobjects and defines the location of joint reference frames relative to corresponding link frames. Also, rotational and translational motion between the two joint frames is again constrained for user-defined coordinate axes. However, in contrast to theConstraintclass, a standard deviation and a maximum distance for which the constraint remains inactive can be defined. It is implemented in the SoftConstraint class. - Optimizer: References a single
Linkat the root of a kinematic structure and multiple correspondingConstraintandSoftConstraintobjects. It computes Jacobian matrices for allLinkobjects and uses them to project gradient vectors and Hessian matrices fromModalityandSoftConstraintobjects to the full gradient and Hessian required for Newton optimization. In addition, anOptimizerprojects Jacobians and residuals fromConstraintobjects into the full system of linear equations. After a single Newton step, it updates the pose of allLinkandBodyobjects in the kinematic structure. It is implemented in the Optimizer class. - Detector: Depending on the implementation, a
Detectoris able to reference multipleOptimizerobjects. Based on the detected pose of the rootLink, it updates the pose of all correspondingLinkandBodyobjects in the kinematic structure. Two detectors are implemented: a StaticDetector that assigns a predefined pose and a ManualDetector that allows a user to define four points in a color image to infer the pose. - Viewer: Is used to visualize the current pose estimate. While ImageColorViewer and ImageDepthViewer classes simply visualize images from referenced
ColorCameraandDepthCameraobjects, NormalColorViewer and NormalDepthViewer classes overlay camera images with normal renderings that are based on the data from a referencedRendererGeometryobject. - Publisher: Is used to communicate data to an external source. It features the abstract method
UpdatePublisherthat is called at the end of the pose optimization. Its implementation is highly dependent on the information that should be exchanged. In most applications, it is used to publish predicted poses. Currently, noPublisheris implemented. - Subscriber: Is used to read data from an external source. It provides the abstract method
UpdateSubscriberthat is called before the optimization. As for thePublisherclass, the implementation is highly dependent on the source and exchanged information. When the main method is executed, the tracker is in a defined state, without anything happening in parallel. ASubscribercan, therefore, not only modify pose predictions but is able to change settings or the entire configuration of referenced objects. Currently, noSubscriberis implemented. - Refiner: References
Optimizerobjects to refine pose predictions of the corresponding kinematic structures. It thereby coordinates different methods provided byOptimizer,Constraint,SoftConstraint,Link,Modality,ColorHistograms, andRendererobjects. The refinement process is very similar to the Tracking Step. The only main difference is that methods featured in the Starting Step are executed each time before correspondences are calculated. It is implemented in the Refiner class. - Tracker: References
Optimizer,Refiner,Publisher,Subscriber,Viewer, andDetectorobjects. In addition to an orderly setup, it facilitates the entire tracking process. TheTrackerthereby coordinates the execution of methods provided by different objects. To control the process, it provides interfaces to start the tracking, stop the tracking, and execute the detection of kinematic structures represented by anOptimizer. It is implemented in the Tracker class.
Based on those components, a tracker can be configured. An example of a tracker that tracks a kinematic structure with two bodies using information from two color cameras and one depth camera is shown in the following illustration:
For each Body and ColorCamera, a RegionModality is configured. Similarly, each Body supports a DepthModality. To validate points, each Modality references a SilhouetteRenderer that corresponds to a Camera. In addition, the RegionModality objects 2 and 4 consider occlusions using the DepthCamera. For initialization, a ManualDetector predicts the pose of the Link and Body number 1 and uses the Optimizer to update the kinematic structure. Afterwards, the estimate is improved using a Refiner. Also, a Subscriber is configured that has access to all objects connected to the Optimizer. Finally, results are visualized using a NormalViewer, and pose estimates of Body number 2 are communicated using a Publisher.
As explained previously, M3T is a library that supports a wide variety of tracking scenarios. As a consequence, to start tracking, one has to first configure the tracker. For this, two options exist:
-
One option is to use C++ programming to set up and configure all objects according to one's scenario. An example that allows running the tracker on a sequence streamed from an AzureKinect is shown in
examples/run_on_camera_sequence.cpp. The executable thereby takes the path to a directory and names of multiple bodies. The directory has to containBodyandStaticDetectormetafiles that are called<BODY_NAME>.yamlfile and<BODY_NAME>_detector.yaml. Similarly,examples/run_on_recorded_sequence.cppallows to run the tracker on a sequence that was recorded usingrecord_camera_sequence.cpp. The executable allows the tracking of a single body that is detected using aManualDetector. It requires the metafiles for aLoaderColorCamera,Body, andManualDetector, as well as the path to a temporary directory in which generated model files are stored. -
In addition to the usage as a library in combination with C++ programming, the tracker can also be configured using a generator function together with a YAML file that defines the overall configuration. A detailed description of how to set up the YAML file is given in the documentation or the following markdown file. An example that shows how to use a generator is shown in
examples/run_generated_tracker.cpp. The executable requires a YAML file that is parsed by theGenerateConfiguredTracker()function to generate aTrackerobject. The main YAML file thereby defines how individual objects are combined and allows to specify YAML metafiles for individual components that do not use default parameters. An example of a simple YAML file is given indata/pen_paper_demo/conig.yaml. The example can be executed usingexamples/run_pen_paper_demo.cpp. In the code, a dedicatedPublisherclass is implemented that visualizes the drawing of a stabilo pen on a paper. The example runs out of the box using the configuration and data provided in thedata/pen_paper_demodirectory. Finally, more complex examples for the configuration of kinematic structures using YAML files are provided in the RTB datast, which can be downloaded from Zenodo.
In addition to constructor and setter methods, the parameters of all components can be defined using YAML metafiles. Parameter names in the YAML file are thereby defined to be equal to names used in the code. The most important metafiles and parameters are:
geometry_path: "INFER_FROM_NAME"
geometry_unit_in_meter: 1.0
geometry_counterclockwise: 1
geometry_enable_culling: 1
geometry2body_pose: !!opencv-matrix
rows: 4
cols: 4
dt: d
data: [ 1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, -0.006,
0, 0, 0, 1 ]
region_id: 50
geometry_path: path to wavefront obj file. UsingINFER_FROM_NAMEsets the path to<BODY_NAME>.obj.geometry_unit_in_meter: scale factor to scale the unit used in the wavefront obj file to meter.geometry_counterclockwise: true if winding order of triangles in wavefront obj is defined counter-clockwise.geometry_enable_culling: true if faces that are not facing toward the camera should be culled.geometry2body_pose: transformation that allows to set a different frame of reference for the object than defined by the wavefront obj file.region_id: value between 1 and 255 that is assigned to pixels on the body's silhouette bySilhouetteRenderersthat render the region ID. For bodies with similar color statistics, the same value should be assigned for the validation of contour points. Note that thebody_idis set automatically to a unique ID.
model_path: "INFER_FROM_NAME"
model_path: path to .bin file where the sparse viewpoint model is stored or where it should be generated. UsingINFER_FROM_NAMEsets the path to<MODEL_NAME>.bin.
link2world_pose: !!opencv-matrix
rows: 4
cols: 4
dt: d
data: [0.607674, 0.786584, -0.10962, -0.081876,
0.408914, -0.428214, -0.805868, -0.00546736,
-0.680823, 0.444881, -0.58186, 0.618302,
0, 0, 0, 1 ]
link2world_pose: transformation between link and world (typically camera frame) to which the link is set by the detector. If the link has a body, the correspondingbody2world_poseis set to the same pose.
reference_points:
- [ -0.0332, 0.0, 0.0]
- [ 0.0192, -0.0332, 0.0]
- [ 0.0192, 0.0332, 0.0]
- [ 0.0, 0.0, 0.0]
detector_image_path: "./detector_image.png"
reference_points: 3D points on the object surface given in the body frame. During manual detection, the user has to specify the corresponding 2D coordinates of those points in the image to define the object pose.detector_image_path: optional image that illustrates on which points the user has to click.
load_directory: "./"
intrinsics:
f_u: 638.633
f_v: 638.377
pp_x: 639.451
pp_y: 366.379
width: 1280
height: 720
camera2world_pose: !!opencv-matrix
rows: 4
cols: 4
dt: f
data: [ 1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0, 0, 0, 1 ]
depth_scale: 0.001 # only for depth camera
image_name_pre: "color_camera_image_"
load_index: 0
n_leading_zeros: 0
image_name_post: ""
load_image_type: "png"
load_directory: directory from which images are loaded.intrinsics: intrinsics of the camera that was used to record images, with fu, fv, ppu, ppv, width, and height, respectively.camera2world_pose: fixed transformation between camera and world.depth_scale: scale with which pixel values have to be multiplied to get the depth in meter. (only required for depth cameras)image_name_pre: text at the beginning of image name, before load_index.load_index: index of the first image that is loaded.n_leading_zeros: minimum number of digits used in the image name with zero padding to reach correct number.image_name_post: text at the end of image name, after load_index.load_image_type: file format of images.
free_directions: [0, 0, 1, 0, 0, 0]
body2joint_pose: !!opencv-matrix
rows: 4
cols: 4
dt: d
data: [1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0, 0, 0, 1 ]
joint2parent_pose: !!opencv-matrix
rows: 4
cols: 4
dt: d
data: [-1, 0, 0, 0.030601,
0, 1, 0, -0.006515,
0, 0, -1, 0,
0, 0, 0, 1 ]
-
free_directions: variated joint directions for x, y, and z rotation and translation, respectively. -
body2joint_pose: transformation between the joint and the body frame, which is referenced by this link. -
joint2parent_pose: transformation between the joint and the parent frame, for which this link is the child.
constraint_directions: [1, 1, 0, 1, 1, 1]
body12joint1_pose: !!opencv-matrix
rows: 4
cols: 4
dt: d
data: [1, 0, 0, 0.030601,
0, 1, 0, 0.006515,
0, 0, 1, 0,
0, 0, 0, 1 ]
body22joint2_pose: !!opencv-matrix
rows: 4
cols: 4
dt: d
data: [1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0, 0, 0, 1 ]
constraint_directions: constraint joint directions for x, y, and z rotation and translation, respectively.body12joint1_pose: transformation between the joint and the body frame of link 1.body22joint2_pose: transformation between the joint and the body frame of link 2.
id_type: 1
id_type: if the value is 0, body IDs are rendered into the silhouette image, and if the value is 1, region IDs are rendered.
To start tracking your own objects, we recommend defining your own metafile for Body and StaticDetector objects and use the examples/run_on_camera_sequence.cpp. Note that depending on the parameters for the RunTrackerProcess function, detection and tracking will not start automatically. To start the detection, please press the D key on your keyboard. To start and stop the tracking, the T and S keys have to be used. To quit the application, press Q. If you would like to use the RealSense camera instead of the AzureKinect, please replace #include <m3t/azure_kinect_camera.h> with #include <m3t/realsense_camera.h> and all occurrences of AzureKinectColorCamera and AzureKinectDepthCamera with RealSenseColorCamera and RealSenseDepthCamera. If you would like to use another camera than the RealSense or Azure Kinect, we encourage you to create a class similar to the AzureKinectCamera class in src/azure_kinect_camera.cpp. To use results from the tracker in your own application, you can implement your own Publisher class that implements the method UpdatePublisher(). Similarly, application-dependent Subscriber classes that implement the UpdateSubscriber() method can be created. More complex examples for the configuration of kinematic structures using YAML files are provided in the RTB datast. The dataset includes config files for each object in the directory <OBJECT_NAME>/model/tracker_config. By providing the path to config.yaml as an argument to examples/run_generated_tracker, a tracker for the respective object is created. Note that the first time it is executed, model files have to be generated, which might take some time.
In addition to this short overview, detailed information on all components and parameters can be found in the documentation. To generate a tracker from a YAML file using the generator function, see the correspondinng documentation or markdown file for how to configure all the required components. For developers, an overview of the most important guidelines is provided in the developer guide.
The code in examples/evaluate_<DATASET_NAME>_dataset.cpp and examples/parameters_study_<DATASET_NAME>.cpp contains everything for the evaluation on the RTB, YCB-Video, OPT, Choi, and RBOT datasets. For the evaluation, please download the RTB, YCB-Video, OPT, Choi, and RBOT datasets and adjust the dataset_directory in the source code. Note that model files (e.g. 002_master_chef_can_depth_model.bin, 002_master_chef_can_region_model.bin, ...) will be created automatically and are stored in the specified external_directory. To reproduce evaluation results of DART on the RTB dataset, please unzip the datet/rtb_poses.zip file and store its content in the respective external_directory. For the evaluation of the YCB-Video dataset, please unzip data/ycb-video_poses.zip and, again, store its content in the corresponding external_directory. Also, for the evaluation of multi-region tracking, the content of the data/ycb-video_multi-region_<X>.zip files, which contain required .obj files, has to be copied into the external_directory. For the Choi dataset, the Matlab script in examples/dataset_converter/convert_choi_dataset.m has to be executed to convert .pcd files into .png images. Also, using a program such as MeshLab, all model files have to be converted from .ply to .obj files and stored in the folder external_directory/models. The RTB, OPT, and RBOT datasets work without any manual changes.
More details can be found in the following publications:
-
Closing the Loop: 3D Object Tracking for Advanced Robotic Manipulation
Manuel Stoiber
Dissertation submitted to the Technical University of Munich -
A Multi-body Tracking Framework - From Rigid Objects to Kinematic Structures
Manuel Stoiber, Martin Sundermeyer, Wout Boerdijk, and Rudolph Triebel
Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence: paper -
Fusing Visual Appearance and Geometry for Multi-Modality 6DoF Object Tracking
Manuel Stoiber, Mariam Elsayed, Anne E. Reichert, Florian Steidle, Dongheui Lee, and Rudolph Triebel
Submitted to IEEE/RSJ International Conference on Intelligent Robots 2023: paper -
Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects
Manuel Stoiber, Martin Sundermeyer, and Rudolph Triebel
IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022: paper -
SRT3D: A Sparse Region-Based 3D Object Tracking Approach for the Real World
Manuel Stoiber, Martin Pfanne, Klaus H. Strobl, Rudolph Triebel, and Alin Albu-Schäffer
International Journal of Computer Vision: paper -
A Sparse Gaussian Approach to Region-Based 6DoF Object Tracking
Manuel Stoiber, Martin Pfanne, Klaus H. Strobl, Rudolph Triebel, and Alin Albu-Schäffer
[Best Paper] Asian Conference on Computer Vision 2020: paper, supplementary