Mechanical Engineering

WANG Liping (born 1967), male, professor.

Funding

National Key Research and Development Program of China (2020YFB1712303)

Knowledge extraction and knowledge base construction method from industrial software packages

  • WANG Liping,
  • ZHANG Chao,
  • CAI Enlei,
  • SHI Huijie,
  • WANG Dong
  • 1. Department of Mechanical Engineering, Tsinghua University, Beijing 100084, China;
    2. School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

Received date: 2021-12-13

Published online: 2022-04-26

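Cite this article

WANG Liping, ZHANG Chao, CAI Enlei, SHI Huijie, WANG Dong. Knowledge extraction and knowledge base construction method from industrial software packages[J]. Journal of Tsinghua University (Science and Technology), 2022, 62(5): 978-986. DOI: 10.16511/j.cnki.qhdxxb.2022.22.023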

Abstract

Independently developed industrial software is one of the key forces supporting the innovation and development of domestic small and medium-sized enterprises. The text associated with such software contains a large amount of manufacturing knowledge, but methods for extracting this knowledge and building a knowledge base from it are lacking. This paper presents a knowledge extraction model based on neural networks and natural language processing. The model has three parts: text representation, entity recognition, and relation extraction. The extracted entities and relations are modeled as a knowledge graph: ontology modeling defines the concepts related to industrial software, and graph data modeling maps the concepts in the ontology model to graph data, which improves data retrieval and modeling capabilities. The data are then persisted in the knowledge base. Application results show that the method can be used to build an industrial software knowledge base and that it plays an important role in integrating manufacturing knowledge.
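The final stage described in the abstract, mapping ontology concepts onto graph data and storing the result, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the concept classes, relation names, and example entities below are hypothetical placeholders, and a JSON string stands in for the graph database that a real knowledge base would use.

```python
import json
from dataclasses import dataclass, field

# Hypothetical ontology (an assumption for illustration, not the paper's model):
# concept classes and the relation signatures allowed between them.
ONTOLOGY_CLASSES = {"Software", "Function", "Process"}
ONTOLOGY_RELATIONS = {
    ("Software", "provides", "Function"),
    ("Function", "applies_to", "Process"),
}


@dataclass
class KnowledgeGraph:
    nodes: dict = field(default_factory=dict)  # entity name -> ontology class
    edges: set = field(default_factory=set)    # (head, relation, tail) triples

    def add_entity(self, name, concept):
        """Register an extracted entity as an instance of an ontology class."""
        if concept not in ONTOLOGY_CLASSES:
            raise ValueError(f"unknown concept: {concept}")
        self.nodes[name] = concept

    def add_relation(self, head, relation, tail):
        """Add a triple only if its class signature is allowed by the ontology."""
        signature = (self.nodes.get(head), relation, self.nodes.get(tail))
        if signature not in ONTOLOGY_RELATIONS:
            return False
        self.edges.add((head, relation, tail))
        return True

    def neighbors(self, head, relation):
        """Retrieve the tails linked to `head` by `relation`."""
        return sorted(t for h, r, t in self.edges if h == head and r == relation)

    def to_json(self):
        """Serialize the graph; a stand-in for persisting to a graph database."""
        payload = {"nodes": self.nodes, "edges": sorted(self.edges)}
        return json.dumps(payload, ensure_ascii=False, sort_keys=True)


def build_example_graph():
    """Build a small graph from made-up triples (illustrative data only)."""
    graph = KnowledgeGraph()
    graph.add_entity("ToolA", "Software")
    graph.add_entity("mesh generation", "Function")
    graph.add_entity("casting", "Process")
    graph.add_relation("ToolA", "provides", "mesh generation")
    graph.add_relation("mesh generation", "applies_to", "casting")
    # Rejected: the ontology has no (Process, provides, Software) signature.
    graph.add_relation("casting", "provides", "ToolA")
    return graph
```

Checking each triple against the ontology before insertion keeps the graph data consistent with the defined concepts, which is one plausible reading of how an ontology layer supports retrieval over the stored entities and relations.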
