Knowledge extraction and knowledge base construction method for independently developed industrial software

WANG Liping, ZHANG Chao, CAI Enlei, SHI Huijie, WANG Dong

Journal of Tsinghua University (Science and Technology), 2022, Vol. 62, Issue (5): 978-986.

DOI: 10.16511/j.cnki.qhdxxb.2022.22.023
Mechanical Engineering

Knowledge extraction and knowledge base construction method from industrial software packages

  • WANG Liping1,2, ZHANG Chao2, CAI Enlei2, SHI Huijie2, WANG Dong1

Abstract

Independently developed industrial software is one of the core forces supporting the innovation and development of domestic small and medium-sized enterprises. Text associated with such software contains a large amount of manufacturing-related knowledge, but methods for extracting that knowledge and building a knowledge base from it are currently lacking. This paper presents a knowledge extraction model based on neural networks and natural language processing, consisting of three parts: text representation, entity recognition, and relation extraction. The extracted entities and relations are modeled as a knowledge graph: ontology modeling defines the concepts related to the software, and graph data modeling maps the concepts in the ontology model onto graph data, which improves data retrieval and modeling capabilities. The data are then persisted in the knowledge base. Application results show that the method can be used to build a knowledge base for independently developed industrial software and that it plays an important role in integrating manufacturing-related knowledge.
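
The mapping from ontology concepts to graph data mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the concept labels (`Software`, `Function`, `Process`), the relation names, the `PropertyGraph` class, and the sample entity `CAM-X` are all hypothetical, and an in-memory structure stands in for whatever graph database the paper actually uses. The sketch assumes that entity recognition and relation extraction have already produced typed triples, and it only admits triples whose relation matches the head and tail concepts declared in the ontology.

```python
from dataclasses import dataclass, field

# Hypothetical ontology: each relation is declared with the concept
# types allowed for its head and tail. All names are illustrative.
ONTOLOGY_RELATIONS = {
    "provides": ("Software", "Function"),
    "applies_to": ("Function", "Process"),
}


@dataclass
class PropertyGraph:
    """In-memory stand-in for a graph store (assumption, not the paper's)."""
    nodes: dict = field(default_factory=dict)  # entity name -> concept label
    edges: set = field(default_factory=set)    # (head, relation, tail)

    def add_triple(self, head, head_type, relation, tail, tail_type):
        """Add one extracted triple if it conforms to the ontology."""
        if ONTOLOGY_RELATIONS.get(relation) != (head_type, tail_type):
            return False
        self.nodes[head] = head_type
        self.nodes[tail] = tail_type
        self.edges.add((head, relation, tail))
        return True

    def neighbors(self, name, relation):
        """Return tails reachable from `name` via `relation`, sorted."""
        return sorted(t for h, r, t in self.edges if h == name and r == relation)


def build_graph(triples):
    """Map typed triples onto graph data; collect non-conforming ones."""
    graph = PropertyGraph()
    rejected = [t for t in triples if not graph.add_triple(*t)]
    return graph, rejected
```

A typed triple such as `("CAM-X", "Software", "provides", "toolpath planning", "Function")` would become two labeled nodes and one edge, while a triple whose relation does not fit the declared concept types is returned in `rejected` instead of being stored.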

Key words

industrial software / neural network / entity recognition / relation extraction / knowledge graph

Cite this article

WANG Liping, ZHANG Chao, CAI Enlei, SHI Huijie, WANG Dong. Knowledge extraction and knowledge base construction method from industrial software packages[J]. Journal of Tsinghua University (Science and Technology), 2022, 62(5): 978-986. https://doi.org/10.16511/j.cnki.qhdxxb.2022.22.023

Funding

National Key Research and Development Program of China (2020YFB1712303)
