Abstract: Many biomedical literature mining systems rely on the pre-trained language model BioBERT, which provides state-of-the-art biomedical named entity recognition after pre-training on biomedical corpora. However, BioBERT is large and slow at inference. This paper presents FastBioNER, a faster biomedical named entity recognition model based on knowledge distillation. FastBioNER compresses BioBERT using dynamic knowledge distillation: a dynamic weight function simulates real learning behavior by adjusting the importance of each part of the loss function during training, and the trained BioBERT is then compressed into a small student model. FastBioNER was evaluated on three common datasets, NCBI-disease, BC5CDR-chem, and BC4CHEMD. In the tests, FastBioNER achieved the second-highest F1 scores after BioBERT, 88.63%, 92.82%, and 92.60% on the three datasets, while reducing the BioBERT model size by 39.26% and the inference time by 46.17%, at the cost of F1 decreases of only 1.10%, 0.86%, and 0.15%, respectively.
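The abstract describes the dynamic weight function and the per-part loss weighting only at a high level, so the following is a minimal sketch of dynamic knowledge distillation rather than the paper's exact method. It combines the standard soft-target distillation loss (Hinton et al., 2015) with the hard-label tagging loss, mixed by a step-dependent weight; the linear schedule in dynamic_weight and the temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def dynamic_weight(step, total_steps):
    """Assumed schedule: emphasize imitating the teacher early in
    training and the hard-label task loss later on. The paper's
    actual dynamic weight function is not specified in the abstract."""
    return 1.0 - step / total_steps  # weight on the distillation term

def distillation_loss(student_logits, teacher_logits, labels,
                      step, total_steps, temperature=2.0):
    """Soft-target distillation combined with the standard
    cross-entropy NER tagging loss, mixed by a step-dependent alpha.
    Logits have shape [batch, seq_len, num_tags]; labels [batch, seq_len]."""
    alpha = dynamic_weight(step, total_steps)
    # KL divergence between temperature-softened teacher and student
    # tag distributions, scaled by T^2 as is conventional.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard-label loss on the gold tags (token dimension flattened).
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
    )
    return alpha * soft + (1.0 - alpha) * hard
```

In a training loop the function would be called once per batch, with step advancing from 0 to total_steps, so that the student gradually shifts from matching the teacher's tag distribution to fitting the gold labels.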
HU Bin, GENG Tianyu, DENG Geng, DUAN Lei. Faster biomedical named entity recognition based on knowledge distillation. Journal of Tsinghua University (Science and Technology), 2021, 61(9): 936-942.