Abstract: Robots need to identify an object's type, detect its shape, and judge its distance. This paper presents an object recognition and localization method that applies the Mask R-CNN model to binocular images. Mask R-CNN processes each image of the binocular pair to produce bounding boxes, class labels, and shape segmentation masks. Neural network features are then used to match the same object across the two images. Finally, the iterative closest point (ICP) method estimates the parallax from the segmented object shapes, and the distance is calculated from that parallax. Tests show that the method processes data at near real-time speed with better precision than the traditional disparity map algorithm.
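The last step of the pipeline (parallax from matched shapes, then distance) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes rectified images, so the two contours of a matched object differ only by a horizontal shift, and it reduces ICP to a translation-only variant along the x-axis. The function names, the contour points, the focal length, and the baseline are all hypothetical values chosen for illustration. The distance uses the standard rectified-stereo relation Z = f * B / d.

```python
def estimate_disparity_icp(left_pts, right_pts, iters=20, tol=1e-6):
    """Translation-only ICP sketch (assumption: rectified stereo pair).

    Finds the horizontal shift d (in pixels) that best aligns the right
    contour with the left contour, i.e. x_left ~= x_right + d.
    """
    if not left_pts or not right_pts:
        raise ValueError("both contours must contain at least one point")

    def sq_dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    d = 0.0
    for _ in range(iters):
        residuals = []
        for xr, yr in right_pts:
            moved = (xr + d, yr)
            # Closest-point step: nearest left-contour point to the shifted point.
            nearest = min(left_pts, key=lambda p: sq_dist(p, moved))
            residuals.append(nearest[0] - moved[0])
        # Update step: mean horizontal residual is the least-squares x-shift.
        step = sum(residuals) / len(residuals)
        d += step
        if abs(step) < tol:
            break
    return d


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Distance Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical mask-contour samples of one object in the left and right images.
left_contour = [(100.0, 50.0), (140.0, 50.0), (140.0, 90.0), (100.0, 90.0)]
right_contour = [(x - 8.0, y) for x, y in left_contour]

disparity = estimate_disparity_icp(left_contour, right_contour)
# Hypothetical camera parameters: 720 px focal length, 0.5 m baseline.
distance_m = depth_from_disparity(disparity, focal_px=720.0, baseline_m=0.5)
```

A full ICP would also estimate rotation and scale and would operate on dense contour or mask points. The one-dimensional shift is used here only because, under the rectification assumption, the parallax is the single unknown.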
PENG Qiuchen, SONG Yixu. Object recognition and localization based on Mask R-CNN[J]. Journal of Tsinghua University (Science and Technology), 2019, 59(2): 135-141.