Eye movement prediction of individuals while reading based on deep neural networks

doi:10.16511/j.cnki.qhdxxb.2019.26.001

Abstract: Traditional eye movement models are built on psychological assumptions and empirical data, so they cannot predict eye movements for previously unseen text and cannot account for differences among individual readers. To address this, this paper presents an eye movement model that uses a deep neural network to predict a reader's fixation points. Unlike conventional psychology-based eye movement models, the model is based not on empirical data sets but on a bi-directional long short-term memory-conditional random field (bi-LSTM-CRF) neural network. The model is trained on a reader's eye movement data recorded during reading and then predicts that reader's fixation points on other texts. Computer simulations show that the bi-LSTM-CRF model achieves prediction accuracy similar to that of existing machine learning models while using fewer data features, which makes the proposed model attractive for real-time human-computer interaction applications.

Keywords: individual reading; eye tracking; eye movement model; deep neural networks
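As a rough illustration of the sequence-labelling idea behind a bi-LSTM-CRF tagger, the sketch below runs Viterbi decoding over per-word scores, assuming each word is labelled either skipped or fixated. It is not the paper's implementation: the function name `viterbi_decode`, the two-label set, and all numeric scores are hypothetical stand-ins for what a trained bi-LSTM (emission scores) and CRF layer (transition scores) would supply.

```python
from typing import List, Sequence

LABELS = ("skip", "fixate")  # hypothetical label set, assumed for illustration


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label index sequence.

    emissions[t][k]   : score of label k for word t (stand-in for bi-LSTM output)
    transitions[j][k] : score of moving from label j to label k (stand-in for CRF weights)
    """
    if not emissions:
        return []
    n_labels = len(transitions)
    score = list(emissions[0])   # best score of any path ending in each label
    backptr = []                 # backptr[t-1][k] = best previous label for label k at word t
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for k in range(n_labels):
            best_prev = max(range(n_labels),
                            key=lambda j: score[j] + transitions[j][k])
            new_score.append(score[best_prev] + transitions[best_prev][k]
                             + emissions[t][k])
            pointers.append(best_prev)
        score = new_score
        backptr.append(pointers)
    # Trace the best path backwards from the best final label.
    best = max(range(n_labels), key=lambda k: score[k])
    path = [best]
    for pointers in reversed(backptr):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Made-up scores for a five-word sentence (not data from the paper).
words = ["the", "reader", "skips", "a", "word"]
emissions = [[1.0, 0.2], [0.1, 1.5], [0.3, 1.2], [1.1, 0.1], [0.2, 1.4]]
transitions = [[0.0, 0.3], [0.3, 0.0]]
for word, label in zip(words, viterbi_decode(emissions, transitions)):
    print(word, LABELS[label])
```

The transition scores let the decoder prefer label sequences that are plausible as a whole rather than choosing each word's label in isolation, which is the usual motivation for placing a CRF layer on top of a recurrent network.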

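Received: 2018-09-10 Published: 2019-06-01

Funding: National Natural Science Foundation of China (61231016, 61871326); General Project of Humanities and Social Sciences Research, Ministry of Education (18YJCZH180)

Corresponding author: ZHAO Xinbo, Professor, E-mail: xbozhao@nwpu.edu.cn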

Cite this article:

WANG Xiaoming, ZHAO Xinbo. Eye movement prediction of individuals while reading based on deep neural networks[J]. Journal of Tsinghua University (Science and Technology), 2019, 59(6): 468-475.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2019.26.001 or http://jst.tsinghuajournals.com/CN/Y2019/V59/I6/468

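Fig. 1 Eye movement trajectory of an adult reader while reading

Fig. 2 The repeating unit in an LSTM network

Fig. 3 Architecture of the eye movement model based on deep neural networks

Fig. 4 Training algorithm for the bi-LSTM-CRF model

Table 1 Hyperparameters of the experiments

Fig. 5 Number of parameters in each layer and all hyperparameters to be trained

Table 2 Baseline rates of fixated words in the test data

Table 3 Prediction accuracy for fixated words on the test data

Fig. 6 Comparison of fixation prediction accuracy using different features of the test data

Table 4 Comparison among E-Z Reader, NN09, HMKA12, and the model in this paper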

