Journal of Tsinghua University (Science and Technology), 2024, Vol. 64, Issue 1: 44-54    DOI: 10.16511/j.cnki.qhdxxb.2023.26.045
VEHICLE AND TRAFFIC ENGINEERING
Estimation algorithm of driver's gaze zone based on lightweight spatial feature encoding network
ZHANG Mingfang1, LI Guilin1, WU Chuna2, WANG Li1, TONG Lianghao1
1. Beijing Key Laboratory of Urban Road Intelligent Traffic Control Technology, North China University of Technology, Beijing 100144, China;
2. Key Laboratory of Operation Safety Technology on Transport Vehicles, Research Institute of Highway, Ministry of Transport, Beijing 100088, China
Abstract

[Objective] Real-time monitoring of a driver's gaze region is essential for human-machine shared driving vehicles to understand and predict the driver's intentions. Because of the limited computational resources and storage capacity of in-vehicle platforms, existing gaze region estimation algorithms struggle to balance accuracy with real-time performance and typically ignore temporal information.

[Methods] This paper therefore proposes a lightweight spatial feature encoding network (LSFENet) for driver gaze region estimation. First, an image sequence of the driver's upper body is captured by an RGB camera. Image preprocessing steps, including face alignment and glasses removal, are performed to obtain left- and right-eye images and facial keypoint coordinates, which handles challenges such as cluttered backgrounds and facial occlusions in the captured images. Face alignment is conducted with the multi-task cascaded convolutional network (MTCNN) algorithm, and glasses are removed with the cycle-consistent adversarial network (CycleGAN) algorithm. Second, we build the LSFENet feature extraction network on the GCSbottleneck module, which improves the MobileNetV2 architecture: the inverted residual structure in MobileNetV2 requires a large amount of memory and many floating-point operations, and it ignores the redundancy and correlation among feature maps. We embed a ghost module to reduce memory consumption and integrate channel and spatial attention modules to extract cross-channel and spatial information from the feature maps. Next, the Kronecker product is used to fuse eye features with facial keypoint features, reducing the impact of the imbalance in information complexity between the two. Then, the fused features from consecutive frames are input into a recurrent neural network to estimate the gaze zone of the image sequence. Finally, the proposed network is evaluated on the public driver gaze in the wild (DGW) dataset and a self-collected dataset. The evaluation metrics include the number of parameters, the number of floating-point operations (FLOPs), the frame rate in frames per second (FPS), and the F1 score.

[Results] The experiments showed the following: (1) The gaze region estimation accuracy of the proposed algorithm was 97.08%, approximately 7% higher than that of the original MobileNetV2; in addition, the number of parameters and the FLOPs were both reduced by 22.5%, and the FPS was improved by 36.43%. At approximately 103 FPS, the proposed network satisfies the computational efficiency and accuracy requirements of in-vehicle environments. (2) The estimation accuracies for gaze regions 1, 2, 3, 4, and 9 exceeded 85%, and the macro-average and micro-average precisions on the DGW dataset reached 74.32% and 76.01%, respectively. (3) The proposed algorithm provided high classification accuracy on fine-grained eye images with small inter-class differences. (4) Class activation mapping visualizations demonstrated that the proposed algorithm adapts well to various lighting conditions and to occlusion by eyeglasses.

[Conclusions] The results are of great significance for recognizing a driver's visual distraction states.
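As an illustration of the preprocessing stage described above, the following minimal Python sketch detects the face and its five landmarks with an open-source MTCNN implementation (facenet-pytorch) and crops fixed-size eye patches around the eye landmarks. The patch size and helper names are illustrative assumptions, not the authors' exact pipeline, and the glasses-removal (CycleGAN) step is omitted.

```python
# A minimal preprocessing sketch (assumption: facenet-pytorch MTCNN stands in
# for the paper's MTCNN stage; EYE_SIZE and helper names are illustrative).
from facenet_pytorch import MTCNN
from PIL import Image

EYE_SIZE = 64  # assumed eye-patch side length in pixels

mtcnn = MTCNN(keep_all=False)  # keep only the most confident face

def extract_eyes_and_keypoints(frame_path):
    img = Image.open(frame_path).convert("RGB")
    # boxes: (1, 4) face box; landmarks: (1, 5, 2) eyes/nose/mouth keypoints
    boxes, probs, landmarks = mtcnn.detect(img, landmarks=True)
    if boxes is None:
        return None  # no face found in this frame

    def crop_around(center):
        cx, cy = int(center[0]), int(center[1])
        half = EYE_SIZE // 2
        return img.crop((cx - half, cy - half, cx + half, cy + half))

    left_eye, right_eye = landmarks[0][0], landmarks[0][1]
    return crop_around(left_eye), crop_around(right_eye), landmarks[0]
```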
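The two building blocks combined in the GCSbottleneck can be sketched generically: a GhostNet-style ghost module (a small "primary" convolution plus cheap depthwise operations) and a CBAM-style channel/spatial attention pair. The PyTorch sketch below uses illustrative layer sizes and is not the authors' exact configuration.

```python
# Generic ghost module + channel/spatial attention sketch in PyTorch
# (assumption: sizes and reduction ratios are illustrative defaults).
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, inp, oup, ratio=2, dw_size=3):
        super().__init__()
        init_ch = oup // ratio               # channels from the "real" conv
        cheap_ch = init_ch * (ratio - 1)     # channels from cheap operations
        self.primary = nn.Sequential(
            nn.Conv2d(inp, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(          # depthwise conv = cheap features
            nn.Conv2d(init_ch, cheap_ch, dw_size, padding=dw_size // 2,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(cheap_ch), nn.ReLU(inplace=True))
        self.oup = oup

    def forward(self, x):
        y = self.primary(x)
        out = torch.cat([y, self.cheap(y)], dim=1)
        return out[:, :self.oup]             # trim to the requested width

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=16, kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(            # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1))
        self.spatial = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x):
        # channel gate: pooled descriptors -> shared MLP -> sigmoid
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # spatial gate: channel-wise avg/max maps -> conv -> sigmoid
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```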
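The fusion and temporal stages can likewise be sketched: per-frame eye features and facial-keypoint features are fused with a batched Kronecker (outer) product, so every eye feature is scaled by every keypoint feature, and the fused sequence is aggregated by a recurrent network. The feature dimensions, GRU width, and nine-zone classification head below are assumptions for illustration.

```python
# Kronecker-product fusion + recurrent head, a sketch under assumed sizes.
import torch
import torch.nn as nn

EYE_DIM, KPT_DIM, NUM_ZONES = 32, 8, 9  # assumed dimensions; 9 gaze zones

class KroneckerFusionGRU(nn.Module):
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(EYE_DIM * KPT_DIM, 128, batch_first=True)
        self.head = nn.Linear(128, NUM_ZONES)

    def forward(self, eye_feats, kpt_feats):
        # eye_feats: (B, T, EYE_DIM); kpt_feats: (B, T, KPT_DIM)
        # per-frame Kronecker product of two vectors = flattened outer product
        fused = torch.einsum("bte,btk->btek", eye_feats, kpt_feats)
        fused = fused.flatten(2)             # (B, T, EYE_DIM * KPT_DIM)
        out, _ = self.gru(fused)
        return self.head(out[:, -1])         # gaze-zone logits for the clip

# usage sketch on random tensors: batch of 2 clips, 16 frames each
model = KroneckerFusionGRU()
logits = model(torch.randn(2, 16, EYE_DIM), torch.randn(2, 16, KPT_DIM))
```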
Keywords: gaze zone estimation; lightweight spatial feature encoding network; attention mechanism; feature extraction; Kronecker product; recurrent neural network
Issue Date: 30 November 2023
Cite this article:   
ZHANG Mingfang,LI Guilin,WU Chuna, et al. Estimation algorithm of driver's gaze zone based on lightweight spatial feature encoding network[J]. Journal of Tsinghua University(Science and Technology), 2024, 64(1): 44-54.
URL: http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2023.26.045 OR http://jst.tsinghuajournals.com/EN/Y2024/V64/I1/44