Journal of Tsinghua University (Science and Technology)  2019, Vol. 59, Issue 2: 135-141    DOI: 10.16511/j.cnki.qhdxxb.2019.22.003
Computer Science and Technology
Object recognition and localization based on Mask R-CNN
PENG Qiuchen, SONG Yixu
State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Full text: PDF (5930 KB)
Abstract: Robots need to identify object categories, detect object shapes, and judge distances to objects. This paper presents an object recognition and localization method that uses binocular information based on the Mask R-CNN model. Mask R-CNN processes the binocular image pair, performing bounding-box selection, recognition, and shape segmentation on each image. The neural-network features are then used to match the same object between the two images. Finally, an iterative closest point (ICP) search estimates the disparity from the matched object shapes and computes the distance. Tests show that the method runs at near real-time speed and improves on both the speed and the accuracy of traditional methods that rely on computing a global disparity map.
Key words: robot navigation; Mask R-CNN; feature matching; object recognition; binocular vision
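The matching step described in the abstract pairs detections from the left and right images by comparing their neural-network features. A minimal sketch of that idea, assuming per-object feature vectors (e.g. pooled Mask R-CNN ROI features) are already extracted — the function name, greedy strategy, and similarity threshold are illustrative assumptions, not the paper's exact procedure:

```python
# Sketch (assumed, not the paper's exact algorithm): pair left/right
# detections by cosine similarity of their feature vectors.
import numpy as np

def match_detections(feats_left, feats_right, threshold=0.5):
    """Greedily pair left/right detections by cosine similarity.

    feats_left, feats_right: (N, D) and (M, D) arrays of per-object
    feature vectors. Returns a list of (i_left, j_right) index pairs.
    """
    # L2-normalize so the dot product equals cosine similarity
    fl = feats_left / np.linalg.norm(feats_left, axis=1, keepdims=True)
    fr = feats_right / np.linalg.norm(feats_right, axis=1, keepdims=True)
    sim = fl @ fr.T                             # (N, M) similarity matrix
    pairs, used = [], set()
    for i in np.argsort(-sim.max(axis=1)):      # most confident left boxes first
        j = int(np.argmax(sim[i]))
        if j not in used and sim[i, j] > threshold:
            pairs.append((int(i), j))
            used.add(j)                          # each right box matched once
    return pairs
```

For a globally optimal assignment instead of this greedy pass, `scipy.optimize.linear_sum_assignment` on `-sim` would be the usual choice.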
Received: 2018-08-20      Published online: 2019-02-16
Corresponding author: SONG Yixu, associate research fellow, E-mail: songyx@mail.tsinghua.edu.cn
Cite this article:
PENG Qiuchen, SONG Yixu. Object recognition and localization based on Mask R-CNN[J]. Journal of Tsinghua University (Science and Technology), 2019, 59(2): 135-141.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2019.22.003  or  http://jst.tsinghuajournals.com/CN/Y2019/V59/I2/135
Fig. 1 Simplified structure of Mask R-CNN
Fig. 2 (color online) Anchor refinement process
Fig. 3 (color online) FCN segmentation network
Fig. 4 (color online) Example of Mask R-CNN object recognition and segmentation results
Fig. 5 (color online) Example of object matching between left and right images
Fig. 6 Per-pixel relative error of binocular distance measurement
Fig. 7 (color online) Accuracy comparison of disparity algorithms
Fig. 8 (color online) Model preprocessing process
Fig. 9 Parallelized ROIAlign processing
Table 1 Performance evaluation of Mask R-CNN
Fig. 10 (color online) Example of binocular distance measurement for near objects
Fig. 11 (color online) Example of binocular distance measurement for distant objects
Fig. 12 Comparison of the Mask R-CNN & ICP method and the YOLOv3 & GC method
Table 2 Relative distance error and runtime of different methods on near objects
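The distance results in Figs. 10-12 come from estimating the disparity of each matched object and triangulating depth. A minimal sketch of the geometry, assuming a rectified pinhole stereo pair with focal length f (pixels) and baseline B (meters); the single nearest-point pass below is a simplification of the paper's ICP-based search, and all function names and parameters are illustrative:

```python
# Sketch (assumed, simplified from ICP): estimate disparity as the
# median horizontal offset between nearest point pairs of the two
# object masks, then triangulate depth as Z = f * B / d.
import numpy as np

def disparity_from_masks(pts_left, pts_right):
    """Median horizontal offset between nearest point pairs.

    pts_left, pts_right: (N, 2) and (M, 2) arrays of (x, y) pixel
    coordinates sampled from the object's mask in each image.
    """
    offsets = []
    for p in pts_left:
        dists = np.linalg.norm(pts_right - p, axis=1)  # distance to every right point
        q = pts_right[np.argmin(dists)]                # nearest right-image point
        offsets.append(p[0] - q[0])                    # horizontal shift = disparity
    return float(np.median(offsets))

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Pinhole stereo triangulation: Z = f * B / d."""
    return focal_px * baseline_m / disparity
```

The median makes the estimate robust to a few wrong point correspondences; a full ICP would additionally iterate the alignment before reading off the offset.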
[1] YANG R J, WANG F, QIN H. Research of pedestrian detection and location system based on stereo images[J]. Application Research of Computers, 2018, 35(5): 1591-1600.
[2] REDMON J, FARHADI A. YOLO9000: Better, faster, stronger[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA, 2017: 6517-6525.
[3] HE K M, GKIOXARI G, DOLLAR P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV). Venice, Italy, 2017: 2980-2988.
[4] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Columbus, USA, 2014: 580-587.
[5] GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV). Santiago, Chile, 2015: 1440-1448.
[6] REN S Q, HE K M, GIRSHICK R B, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(6): 1137-1149.
[7] LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA, 2017: 936-944.
[8] SHELHAMER E, LONG J, DARRELL T, et al. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(4): 640-651.
[9] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, USA, 2016: 770-778.
[10] XIE S, GIRSHICK R, DOLLAR P, et al. Aggregated residual transformations for deep neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, USA, 2017: 5987-5995.
[11] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[C]//Proceedings of the European Conference on Computer Vision. Zurich, Switzerland, 2014: 740-755.
[12] GEIGER A, LENZ P, STILLER C, et al. Vision meets robotics: The KITTI dataset[J]. The International Journal of Robotics Research, 2013, 32(11): 1231-1237.