Abstract: Many biomedical literature mining systems rely on the pre-trained language model BioBERT, which provides state-of-the-art biomedical named entity recognition after pre-training on biomedical corpora. However, BioBERT is large and slow at inference. This paper presents FastBioNER, a faster biomedical named entity recognition model based on knowledge distillation. FastBioNER compresses BioBERT using dynamic knowledge distillation: a dynamic weight function simulates real learning behavior by adjusting the importance of each part of the loss function during training, and the trained BioBERT is then compressed into a small student model. FastBioNER was evaluated on three common datasets, NCBI-disease, BC5CDR-chem, and BC4CHEMD. In the tests, FastBioNER achieved the second-highest F1 scores after BioBERT, 88.63%, 92.82%, and 92.60% on the three datasets, while reducing the BioBERT model size by 39.26% and the inference time by 46.17%, at the cost of F1 decreases of only 1.10%, 0.86%, and 0.15%, respectively.
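The abstract describes the dynamic weight function and the per-part loss weighting only at a high level, so the following is a minimal sketch of dynamic knowledge distillation rather than the paper's exact method. It combines the standard soft-target distillation loss (Hinton et al., 2015) with the hard-label tagging loss, mixed by a step-dependent weight; the linear schedule in dynamic_weight and the temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def dynamic_weight(step, total_steps):
    """Assumed schedule: emphasize imitating the teacher early in
    training and the hard-label task loss later on. The paper's
    actual dynamic weight function is not specified in the abstract."""
    return 1.0 - step / total_steps  # weight on the distillation term

def distillation_loss(student_logits, teacher_logits, labels,
                      step, total_steps, temperature=2.0):
    """Soft-target distillation combined with the standard
    cross-entropy NER tagging loss, mixed by a step-dependent alpha.
    Logits have shape [batch, seq_len, num_tags]; labels [batch, seq_len]."""
    alpha = dynamic_weight(step, total_steps)
    # KL divergence between temperature-softened teacher and student
    # tag distributions, scaled by T^2 as is conventional.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard-label loss on the gold tags (token dimension flattened).
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
    )
    return alpha * soft + (1.0 - alpha) * hard
```

In a training loop the function would be called once per batch, with step advancing from 0 to total_steps, so that the student gradually shifts from matching the teacher's tag distribution to fitting the gold labels.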
HU Bin, GENG Tianyu, DENG Geng, DUAN Lei. Faster biomedical named entity recognition based on knowledge distillation. Journal of Tsinghua University (Science and Technology), 2021, 61(9): 936-942.