Faster biomedical named entity recognition based on knowledge distillation

HU Bin, GENG Tianyu, DENG Geng, DUAN Lei

Journal of Tsinghua University (Science and Technology), 2021, Vol. 61, Issue 9: 936-942. DOI: 10.16511/j.cnki.qhdxxb.2020.26.035

Abstract

In biomedical literature mining, the pre-trained BioBert model delivers excellent performance on biomedical named entity recognition (BioNER) tasks, but the model is too large and too slow. To compress the BioBert network efficiently, this paper proposes FastBioNER, a fast biomedical named entity recognition model based on dynamic knowledge distillation. First, a dynamic weight function simulates real learning behavior by adjusting the relative importance of each part of the loss function during training. Then, dynamic knowledge distillation compresses the trained BioBert, used as the teacher model, into a smaller student model. Finally, FastBioNER was evaluated on three public data sets: NCBI disease, BC5CDR-chem, and BC4CHEMD. The results show that FastBioNER achieves the highest F1 scores apart from BioBert on the three data sets, 88.63%, 92.82%, and 92.60%, while reducing the BioBert model size by 39.26% and the inference time by 46.17%, at the cost of F1 decreases of only 1.10%, 0.86%, and 0.15%, respectively.
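The method described above combines a hard-label loss with a teacher-student distillation loss under a weight that changes over the course of training. The PyTorch sketch below is a minimal illustration of that idea; the linear `dynamic_weight` schedule, the temperature value, and all function names are assumptions for illustration, since the abstract does not give the paper's actual dynamic weight function.

```python
import torch
import torch.nn.functional as F

def dynamic_weight(step: int, total_steps: int) -> float:
    # Placeholder schedule: shift emphasis from the teacher's soft targets
    # toward the gold labels as training progresses (assumed, not the
    # paper's actual function).
    return min(step / total_steps, 1.0)

def distillation_loss(student_logits, teacher_logits, labels,
                      step, total_steps, temperature=2.0):
    # Hard-label part: token-level cross-entropy against the gold NER tags.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1))
    # Soft-label part: KL divergence between temperature-scaled teacher
    # and student distributions (standard Hinton-style distillation).
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Dynamic weighting: the relative importance of the two loss parts
    # is adjusted as training proceeds.
    w = dynamic_weight(step, total_steps)
    return w * ce + (1.0 - w) * kl

# Example: a batch of 2 sentences, 8 tokens each, 5 NER tag classes.
student_logits = torch.randn(2, 8, 5)
teacher_logits = torch.randn(2, 8, 5)
labels = torch.randint(0, 5, (2, 8))
loss = distillation_loss(student_logits, teacher_logits, labels,
                         step=100, total_steps=1000)
```

Under this placeholder schedule, the soft teacher targets dominate early in training and the hard labels take over as `step` approaches `total_steps`, which is one simple way a dynamic weight can mimic real learning behavior as the abstract describes.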

Key words

natural language processing / biomedical informatics / named entity recognition / knowledge distillation

Cite this article

HU Bin, GENG Tianyu, DENG Geng, DUAN Lei. Faster biomedical named entity recognition based on knowledge distillation [J]. Journal of Tsinghua University (Science and Technology), 2021, 61(9): 936-942. https://doi.org/10.16511/j.cnki.qhdxxb.2020.26.035

Funding

Supported by the National Natural Science Foundation of China (Nos. 61906126, 61972268, 61572332)
