Journal of Tsinghua University (Science and Technology), 2024, Vol. 64, Issue 1: 44-54. DOI: 10.16511/j.cnki.qhdxxb.2023.26.045
Section: Vehicle and Traffic
Estimation algorithm of driver's gaze zone based on lightweight spatial feature encoding network
ZHANG Mingfang1, LI Guilin1, WU Chuna2, WANG Li1, TONG Lianghao1
1. Beijing Key Laboratory of Urban Road Intelligent Traffic Control Technology, North China University of Technology, Beijing 100144, China;
2. Key Laboratory of Operation Safety Technology on Transport Vehicles, Research Institute of Highway, Ministry of Transport, Beijing 100088, China
Full text: PDF (8115 KB) | HTML
Abstract: Real-time monitoring of the driver's gaze zone helps human-machine shared driving vehicles understand and predict the driver's intentions. To address the difficulty of balancing accuracy and real-time performance in in-vehicle environments, this paper proposes a driver gaze zone estimation algorithm based on a lightweight spatial feature encoding network (LSFENet). Captured image sequences of the driver's upper body are preprocessed with face alignment and glasses removal to obtain left- and right-eye images and facial keypoint coordinates. An LSFENet feature extraction network built from GCSbottleneck modules is constructed on top of MobileNetV2, with integrated attention modules that strengthen the weights of key features, yielding left- and right-eye features. The Kronecker product is then used to fuse the eye features with the facial keypoint features, and the fused features of consecutive frames are fed into a recurrent neural network to obtain the gaze zone estimate for the image sequence. The new algorithm was tested on a public dataset and a self-collected dataset. Experimental results show that LSFENet reaches a gaze zone estimation accuracy of 97.08% while processing about 103 frames per second, meeting the computational efficiency and accuracy requirements of in-vehicle environments; its estimation accuracy for gaze zones 1, 2, 3, 4, and 9 exceeds 85%, and it adapts well to different lighting conditions and eyeglass occlusion. These results are of great significance for recognizing driver visual distraction.
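To make the fusion step above concrete: the Kronecker product of two feature vectors is their outer product flattened into a single long vector, so the eye embedding and the keypoint coordinates can be fused without any learned parameters. Below is a minimal sketch of such a fusion, assuming PyTorch; the function name, batch size, and feature dimensions are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of Kronecker-product feature fusion, assuming PyTorch.
import torch

def kronecker_fuse(eye_feat: torch.Tensor, keypoint_feat: torch.Tensor) -> torch.Tensor:
    """Fuse eye features with facial-keypoint features, one sample per row.

    For vectors, the Kronecker product equals the flattened outer product,
    so a batched einsum yields one fused vector per sample.
    eye_feat:      (B, D_e)  pooled left-/right-eye embedding
    keypoint_feat: (B, D_k)  flattened facial keypoint coordinates
    returns:       (B, D_e * D_k)
    """
    outer = torch.einsum('bi,bj->bij', eye_feat, keypoint_feat)
    return outer.flatten(start_dim=1)

# Toy usage with hypothetical dimensions.
eyes = torch.randn(4, 64)        # per-frame eye embedding
keypoints = torch.randn(4, 10)   # e.g., 5 keypoints as (x, y) pairs
print(kronecker_fuse(eyes, keypoints).shape)  # torch.Size([4, 640])
```

Note that the fused dimension grows multiplicatively (D_e * D_k), which is one reason such fusion is typically applied to a compact keypoint vector rather than a full feature map.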
Abstract: [Objective] The real-time monitoring of a driver's gaze region is essential for human-machine shared driving vehicles to understand and predict the driver's intentions. Because of the limited computational resources and storage capacity of in-vehicle platforms, existing gaze region estimation algorithms often struggle to balance accuracy and real-time performance, and they frequently ignore temporal information.
[Methods] Therefore, this paper proposes a lightweight spatial feature encoding network (LSFENet) for driver gaze region estimation. First, an image sequence of the driver's upper body is captured by an RGB camera and preprocessed with face alignment and glasses removal to obtain left- and right-eye images and facial keypoint coordinates, which handles challenges such as cluttered backgrounds and facial occlusions. Face alignment is conducted with the multi-task cascaded convolutional network algorithm, and glasses are removed with the cycle-consistent adversarial network algorithm. Second, we build the LSFENet feature extraction network on GCSbottleneck modules that improve the MobileNetV2 architecture, because the inverted residual structure of MobileNetV2 requires a significant amount of memory and floating-point operations and ignores the redundancy and correlation among feature maps. We embed a ghost module to reduce memory consumption and integrate channel and spatial attention modules to extract cross-channel and spatial information from the feature maps. Next, the Kronecker product is used to fuse eye features with facial keypoint features, reducing the impact of the imbalance in information complexity between them. Then, the fused features of consecutive frames are input into a recurrent neural network to estimate the gaze zone of the image sequence. Finally, the proposed network is evaluated on the public Driver Gaze in the Wild (DGW) dataset and a self-collected dataset, with the number of parameters, the number of floating-point operations (FLOPs), the frames per second (FPS), and the F1 score as evaluation metrics.
[Results] The experimental results showed the following: (1) The gaze region estimation accuracy of the proposed algorithm was 97.08%, approximately 7% higher than that of the original MobileNetV2; meanwhile, the number of parameters and the FLOPs were both reduced by 22.5%, and the FPS improved by 36.43%. Running at approximately 103 FPS, the network satisfies the computational efficiency and accuracy requirements of in-vehicle environments. (2) The estimation accuracies for gaze regions 1, 2, 3, 4, and 9 all exceeded 85%, and the macro-average and micro-average precisions on the DGW dataset reached 74.32% and 76.01%, respectively. (3) The proposed algorithm achieved high classification accuracy on fine-grained eye images with small intra-class differences. (4) Class activation mapping visualizations demonstrated strong adaptability to various lighting conditions and eyeglass occlusion.
[Conclusions] The research results are of great significance for the recognition of a driver's visual distraction states.
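As an illustration of the memory-saving idea named in the methods — generating part of the feature maps with cheap depthwise convolutions, following GhostNet [24] — here is a minimal sketch, assuming PyTorch. The channel counts and layer choices are illustrative; this is a generic ghost module, not the paper's exact GCSbottleneck, which additionally integrates channel and spatial attention inside the inverted residual.

```python
# A minimal sketch of a GhostNet-style ghost module, assuming PyTorch.
import math
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Produce `oup` feature maps: a fraction from a dense 1x1 convolution,
    the rest generated cheaply from those primary maps by depthwise convolution."""

    def __init__(self, inp: int, oup: int, ratio: int = 2, dw_size: int = 3):
        super().__init__()
        primary_ch = math.ceil(oup / ratio)   # dense ("intrinsic") maps
        cheap_ch = oup - primary_ch           # cheap ("ghost") maps
        self.primary = nn.Sequential(
            nn.Conv2d(inp, primary_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(primary_ch),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(
            # groups=primary_ch makes this a depthwise-style, low-cost convolution
            nn.Conv2d(primary_ch, cheap_ch, kernel_size=dw_size,
                      padding=dw_size // 2, groups=primary_ch, bias=False),
            nn.BatchNorm2d(cheap_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)                       # (B, primary_ch, H, W)
        return torch.cat([y, self.cheap(y)], 1)   # (B, oup, H, W)

# Toy usage with hypothetical sizes: 32 input channels -> 64 output maps.
out = GhostModule(32, 64)(torch.randn(1, 32, 56, 56))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```

The saving comes from replacing roughly half of the dense convolution's output channels with depthwise operations, which cost far fewer parameters and FLOPs per map.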
Key words: gaze zone estimation; lightweight spatial feature encoding network; attention mechanism; feature extraction; Kronecker product; recurrent neural network
Received: 2023-03-03; Published: 2023-11-30
Funding: National Natural Science Foundation of China (No. 51905007); Scientific Research Program of the Beijing Municipal Education Commission (No. KM202210009013)
About the author: ZHANG Mingfang (b. 1989), female, associate professor. E-mail: mingfang@ncut.edu.cn
Cite this article:
ZHANG Mingfang, LI Guilin, WU Chuna, WANG Li, TONG Lianghao. Estimation algorithm of driver's gaze zone based on lightweight spatial feature encoding network[J]. Journal of Tsinghua University (Science and Technology), 2024, 64(1): 44-54.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2023.26.045 or http://jst.tsinghuajournals.com/CN/Y2024/V64/I1/44
References
[1] WANG T H, LUO Y G, LIU J X, et al. End-to-end self-driving policy based on the deep deterministic policy gradient algorithm considering the state distribution[J]. Journal of Tsinghua University (Science and Technology), 2021, 61(9): 881-888. (in Chinese)
[2] ZONG C F, DAI C H, ZHANG D. Human-machine interaction technology of intelligent vehicles: Current development trends and future directions[J]. China Journal of Highway and Transport, 2021, 34(6): 214-237. (in Chinese)
[3] CHANG W J, CHEN L B, CHIOU Y Z. Design and implementation of a drowsiness-fatigue-detection system based on wearable smart glasses to increase road safety[J]. IEEE Transactions on Consumer Electronics, 2018, 64(4): 461-469.
[4] PLOPSKI A, HIRZLE T, NOROUZI N, et al. The eye in extended reality: A survey on gaze interaction and eye tracking in head-worn extended reality[J]. ACM Computing Surveys, 2023, 55(3): 53.
[5] SHI H L, CHEN L F, WANG X Y, et al. A nonintrusive and real-time classification method for driver's gaze region using an RGB camera[J]. Sustainability, 2022, 14(1): 508.
[6] YUAN G L, WANG Y F, YAN H Z, et al. Self-calibrated driver gaze estimation via gaze pattern learning[J]. Knowledge-Based Systems, 2022, 235: 107630.
[7] LIU M H, DAI H H. Driver gaze zone estimation based on RGB camera[J]. Modern Computer, 2019, 25(36): 69-75. (in Chinese)
[8] LUNDGREN M, HAMMARSTRAND L, MCKELVEY T. Driver-gaze zone estimation using Bayesian filtering and Gaussian processes[J]. IEEE Transactions on Intelligent Transportation Systems, 2016, 17(10): 2739-2750.
[9] LU F, SUGANO Y, OKABE T, et al. Adaptive linear regression for appearance-based gaze estimation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(10): 2033-2046.
[10] AUNSRI N, RATTAROM S. Novel eye-based features for head pose-free gaze estimation with web camera: New model and low-cost device[J]. Ain Shams Engineering Journal, 2022, 13(5): 101731.
[11] YAN Q N, ZHANG W W. Estimation of driver's gaze area based on multi-modal feature fusion[J]. Computer and Digital Engineering, 2022, 50(10): 2217-2222. (in Chinese)
[12] WANG Y F, YUAN G L, MI Z T, et al. Continuous driver's gaze zone estimation using RGB-D camera[J]. Sensors, 2019, 19(6): 1287.
[13] HAN K, PAN H W, ZHANG W, et al. Alzheimer's disease classification method based on multi-modal medical images[J]. Journal of Tsinghua University (Science and Technology), 2020, 60(8): 664-671, 682. (in Chinese)
[14] RIBEIRO R F, COSTA P D P. Driver gaze zone dataset with depth data[C]//14th International Conference on Automatic Face & Gesture Recognition. Lille, France: IEEE, 2019: 1-5.
[15] GHOSH S, DHALL A, SHARMA G, et al. Speak2Label: Using domain knowledge for creating a large scale driver gaze zone estimation dataset[C]//IEEE/CVF International Conference on Computer Vision Workshops. Montreal, Canada: IEEE, 2021: 2896-2905.
[16] SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE, 2018: 4510-4520.
[17] RANGESH A, ZHANG B W, TRIVEDI M M. Gaze preserving CycleGANs for eyeglass removal and persistent gaze estimation[J]. IEEE Transactions on Intelligent Vehicles, 2022, 7(2): 377-386.
[18] YANG Y R, LIU C S, CHANG F L, et al. Driver gaze zone estimation via head pose fusion assisted supervision and eye region weighted encoding[J]. IEEE Transactions on Consumer Electronics, 2021, 67(4): 275-284.
[19] KRAFKA K, KHOSLA A, KELLNHOFER P, et al. Eye tracking for everyone[C]//IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 2176-2184.
[20] ASSI L, CHAMSEDDINE F, IBRAHIM P, et al. A global assessment of eye health and quality of life: A systematic review of systematic reviews[J]. JAMA Ophthalmology, 2021, 139(5): 526-541.
[21] ZHANG K P, ZHANG Z P, LI Z F, et al. Joint face detection and alignment using multitask cascaded convolutional networks[J]. IEEE Signal Processing Letters, 2016, 23(10): 1499-1503.
[22] ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 2242-2251.
[23] NAN Y H, JU J G, HUA Q Y, et al. A-MobileNet: An approach of facial expression recognition[J]. Alexandria Engineering Journal, 2022, 61(6): 4435-4444.
[24] HAN K, WANG Y H, TIAN Q, et al. GhostNet: More features from cheap operations[C]//IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE, 2020: 1577-1586.
[25] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//15th European Conference on Computer Vision. Munich, Germany: Springer, 2018: 3-19.
[26] HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3[C]//IEEE/CVF International Conference on Computer Vision. Seoul, Republic of Korea: IEEE, 2019: 1314-1324.