A Semantic OctoMap Mapping Method Based on CBAM-PSPNet

Authors

  • Xiaogang Ruan, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligence System, Beijing 100124, China
  • Peiyuan Guo, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligence System, Beijing 100124, China
  • Jing Huang, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligence System, Beijing 100124, China

DOI:

https://doi.org/10.13052/jwe1540-9589.21315

Keywords:

SLAM, semantic mapping, OctoMap, semantic segmentation

Abstract

With the rapid development of computer vision and deep learning, researchers have begun to focus on adding semantic information to traditional Simultaneous Localization and Mapping (SLAM) in three-dimensional scenes. The point cloud map generated by traditional SLAM methods takes up considerable storage space and carries no semantic information about the scene, so it cannot meet the requirements of intelligent robot navigation and high-level semantic understanding. To solve this problem, this paper proposes an OctoMap mapping method that fuses semantic information. First, ORB-SLAM2 localizes the camera from the color and depth images captured by an RGB-D camera. Second, the Convolutional Block Attention Module-Pyramid Scene Parsing Network (CBAM-PSPNet) semantically segments the input RGB image, which improves segmentation accuracy and yields high-level semantic information about the environment. Then, a semantic fusion algorithm based on Bayesian fusion combines the semantic information from multiple views. Finally, the generated semantic point cloud is inserted into OctoMap, whose octree data structure compresses the storage space. Experimental results on the ADE20K dataset show that, compared with the Pyramid Scene Parsing Network (PSPNet), CBAM-PSPNet improves Mean Pixel Accuracy by 2.55% and Mean Intersection over Union by 1.88%. Experimental results on the TUM dataset show that, compared with point clouds and a traditional OctoMap, the proposed method greatly reduces storage space and achieves voxel-by-voxel dense semantic mapping.
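The multiview fusion step can be illustrated with a minimal sketch. It assumes the common recursive Bayesian update, in which each voxel stores a class probability distribution that is multiplied by the per-pixel class probabilities of every new view and then renormalized. The abstract does not give the exact update rule, so the function name `fuse_label_probs`, the three classes, and the probability values below are hypothetical.

```python
def fuse_label_probs(prior, observation):
    """Recursive Bayesian update of one voxel's class distribution.

    prior:       class probabilities currently stored in the voxel
    observation: class probabilities predicted for the pixel that
                 projects into this voxel in the new view
    Assumes the views are conditionally independent given the class.
    """
    unnormalized = [p * o for p, o in zip(prior, observation)]
    total = sum(unnormalized)
    if total == 0.0:
        # The new view contradicts the prior completely; keep the prior.
        return list(prior)
    return [u / total for u in unnormalized]


# Hypothetical example: three classes, a uniform prior, three views.
num_classes = 3
voxel = [1.0 / num_classes] * num_classes
views = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.7, 0.1],
]
for probs in views:
    voxel = fuse_label_probs(voxel, probs)

# The fused label is the class with the highest posterior probability.
fused_label = max(range(num_classes), key=lambda c: voxel[c])
print(fused_label, [round(p, 4) for p in voxel])
```

Storing one distribution per voxel, rather than a single label, lets a confident later view overturn an earlier misclassification instead of being outvoted by it.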

Author Biographies

Xiaogang Ruan, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligence System, Beijing 100124, China

Xiaogang Ruan received the Ph.D. degree in control science and engineering from Zhejiang University, China, in 1992. He is now a professor at Beijing University of Technology and director of the Institute of Artificial Intelligence and Robots (IAIR). His research interests include automatic control, artificial intelligence, and intelligent robots.

Peiyuan Guo, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligence System, Beijing 100124, China

Peiyuan Guo received the B.E. degree in building electricity and intelligence from Qingdao University of Technology, China, in 2019. He is currently a master's student in control science and engineering in the Faculty of Information Technology, Beijing University of Technology, China. His research interest is SLAM.

Jing Huang, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Key Laboratory of Computational Intelligence and Intelligence System, Beijing 100124, China

Jing Huang received the Ph.D. degree in pattern recognition and intelligent systems from Beijing University of Technology, China, in 2016. She is now an associate professor in the Faculty of Information Technology, Beijing University of Technology, China. Her research interests include cognitive robotics, machine learning, and artificial intelligence.

Published

2022-03-22

How to Cite

Ruan, X., Guo, P., & Huang, J. (2022). A Semantic OctoMap Mapping Method Based on CBAM-PSPNet. Journal of Web Engineering, 21(3), 879–910. https://doi.org/10.13052/jwe1540-9589.21315
