Knowledge extraction and knowledge base construction method from industrial software packages
WANG Liping1,2, ZHANG Chao2, CAI Enlei2, SHI Huijie2, WANG Dong1
1. Department of Mechanical Engineering, Tsinghua University, Beijing 100084, China; 2. School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Abstract: Industrial software is a key force supporting the development of domestic small and medium-sized enterprises. Industrial software packages contain a large amount of knowledge related to manufacturing processes, but little of this knowledge has been extracted and stored in a knowledge base. This paper presents a knowledge extraction model that combines neural networks and natural language processing. The model comprises text representation, entity recognition, and relationship extraction. The extracted entities and relationships are modeled as a knowledge graph, while related concepts in the software are defined through ontology modeling. The ontology model concepts are mapped to the graph data to improve data retrieval and modeling capabilities, and the data can be stored in the knowledge base for the long term. The results show that this method can build an industrial software knowledge base that will play an important role in integrating manufacturing knowledge.
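The mapping between ontology concepts and graph data described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that entity recognition and relationship extraction have already produced (head, relation, tail) triples. The ontology contents, the entity names (`milling`, `end mill`, `spindle speed`), and the class and function names are all hypothetical, chosen only for illustration. The graph is held in memory here, whereas a real knowledge base would use a persistent graph store.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical ontology: which relation is allowed between which concepts.
# These concept and relation names are illustrative, not from the paper.
ONTOLOGY = {
    ("Process", "uses", "Tool"),
    ("Process", "has_parameter", "Parameter"),
}


@dataclass(frozen=True)
class Entity:
    name: str
    concept: str  # ontology concept the extracted entity is mapped to


class KnowledgeGraph:
    """In-memory graph whose edges are validated against an ontology."""

    def __init__(self, ontology):
        self.ontology = ontology
        self.edges = defaultdict(set)  # head entity -> {(relation, tail entity)}

    def add_triple(self, head, relation, tail):
        """Store the triple only if the ontology permits this relation."""
        if (head.concept, relation, tail.concept) not in self.ontology:
            return False
        self.edges[head].add((relation, tail))
        return True

    def query(self, head_name, relation):
        """Return sorted names of entities linked to head_name by relation."""
        return sorted(
            tail.name
            for head, links in self.edges.items() if head.name == head_name
            for rel, tail in links if rel == relation
        )


def build_example_graph():
    """Load a few hand-written triples standing in for extraction output."""
    kg = KnowledgeGraph(ONTOLOGY)
    milling = Entity("milling", "Process")
    cutter = Entity("end mill", "Tool")
    speed = Entity("spindle speed", "Parameter")
    kg.add_triple(milling, "uses", cutter)
    kg.add_triple(milling, "has_parameter", speed)
    kg.add_triple(cutter, "uses", milling)  # rejected: not allowed by ONTOLOGY
    return kg
```

Checking each triple against the ontology before storing it is one simple way to keep the graph data consistent with the defined concepts. The paper itself may realize the mapping differently.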
王立平, 张超, 蔡恩磊, 史慧杰, 王冬. 面向自主工业软件的知识提取和知识库构建方法[J]. 清华大学学报(自然科学版), 2022, 62(5): 978-986.
WANG Liping, ZHANG Chao, CAI Enlei, SHI Huijie, WANG Dong. Knowledge extraction and knowledge base construction method from industrial software packages. Journal of Tsinghua University (Science and Technology), 2022, 62(5): 978-986.