Abstract: Video summarization and captioning condense content by selecting keyframes and generating language descriptions, integrating both visual and textual perspectives. Existing video-and-language ...
Abstract: Typical LiDAR SLAM architectures feature a front-end for odometry estimation and a back-end for refining and optimizing the trajectory and map, commonly through loop closures. However, loop ...
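The front-end/back-end split mentioned above can be illustrated with a toy one-dimensional pose graph. This is a minimal sketch under stated assumptions, not the method of any abstract listed here. Poses are scalars. A hypothetical `dead_reckon` function stands in for the odometry front-end. A hypothetical `optimize` function stands in for the back-end: it solves a linear least-squares problem over odometry edges plus one loop-closure edge, with the first pose anchored at 0.

```python
# Toy 1-D pose graph (hypothetical example; names and setup are assumptions,
# not taken from any of the papers above).
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (i, j, measured x_j - x_i)


def dead_reckon(odometry: List[float]) -> List[float]:
    """Front-end stand-in: integrate relative odometry starting from x0 = 0."""
    poses = [0.0]
    for step in odometry:
        poses.append(poses[-1] + step)
    return poses


def optimize(n_poses: int, edges: List[Edge]) -> List[float]:
    """Back-end stand-in: least squares over all edges, x0 fixed at 0."""
    n = n_poses - 1  # unknowns are x1 .. x_{n_poses-1}
    H = [[0.0] * n for _ in range(n)]  # normal matrix J^T J
    b = [0.0] * n                      # right-hand side J^T z
    for i, j, z in edges:
        # Residual r = x_j - x_i - z: Jacobian +1 w.r.t. x_j, -1 w.r.t. x_i.
        jac = {}
        if j > 0:
            jac[j - 1] = 1.0
        if i > 0:
            jac[i - 1] = jac.get(i - 1, 0.0) - 1.0
        for a, ja in jac.items():
            b[a] += ja * z
            for c, jc in jac.items():
                H[a][c] += ja * jc
    # Solve H x = b with Gaussian elimination and partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(H[r][col]))
        H[col], H[piv] = H[piv], H[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = H[r][col] / H[col][col]
            for c in range(col, n):
                H[r][c] -= f * H[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = b[r] - sum(H[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / H[r][r]
    return [0.0] + x


# Biased odometry: the robot truly moves +1, +1, -1, -1 and returns to start.
odom = [1.1, 1.1, -0.9, -0.9]
initial = dead_reckon(odom)
edges = [(k, k + 1, z) for k, z in enumerate(odom)]
edges.append((0, 4, 0.0))  # loop closure: pose 4 revisits pose 0
refined = optimize(len(initial), edges)
print([round(p, 3) for p in initial])  # drifted final pose of about 0.4
print([round(p, 3) for p in refined])
```

With equal weights on every edge, the 0.4 loop inconsistency is spread evenly over the five edges in the cycle. The refined trajectory is about [0, 1.02, 2.04, 1.06, 0.08], compared with the dead-reckoned end pose of 0.4. Real LiDAR back-ends optimize over SE(3) poses with weighted, nonlinear residuals, but the drift-redistribution effect of a loop closure is the same.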
OKVIS2-X is a multi-sensor SLAM system based on a factor graph, and is a non-trivial extension of the sparse, landmark-based OKVIS2. OKVIS2-X supports fusing multiple cameras and an IMU, with optional ...