计算机科学与技术

知识图谱的Top-k摘要模式挖掘方法

  • 罗之皓 ,
  • 李劲 ,
  • 岳昆 ,
  • 毛钰源 ,
  • 刘琰
展开
  • 1. 云南大学 软件学院, 昆明 650500;
    2. 云南大学 信息学院, 昆明 650500

收稿日期: 2018-07-19

  网络出版日期: 2019-03-19

基金资助

国家自然科学基金资助项目(61562091,61472345);第二批“云岭学者”培养项目(C6153001);云南省应用基础研究计划面上项目(2016FB110);云南大学中青年骨干教师培养计划项目;云南大学青年英才培育计划项目(WX173602);云南大学数据驱动的软件工程科技创新团队项目(2017HC012)

Mining Top-k summarization patterns for knowledge graphs

  • LUO Zhihao ,
  • LI Jin ,
  • YUE Kun ,
  • MAO Yuyuan ,
  • LIU Yan
Expand
  • 1. School of Software, Yunnan University, Kunming 650500, China;
    2. School of Information, Yunnan University, Kunming 650500, China

Received date: 2018-07-19

  Online published: 2019-03-19

摘要

知识图谱数据具有体量大、内容丰富、类型多样、缺乏统一模式描述等特点。提取知识图谱模式信息并形成摘要模式,对于提升知识检索、挖掘质量具有重要研究意义。该文首先给出了摘要模式的判定准则以及摘要模式质量的度量标准,提出了面向知识图谱的Top-k摘要模式挖掘问题,并将该问题建模为一个次模函数优化问题;其次,为高效判定摘要模式及度量模式的覆盖质量,提出了基于Pregel编程模型的并行化摘要模式判定和质量度量算法;然后,给出了高效求解Top-k摘要模式挖掘问题的贪心算法;最后,在真实知识图谱数据上对本文方法进行了验证。实验结果表明:该方法在摘要模式的覆盖度和算法执行效率方面优于已有方法。

本文引用格式

罗之皓 , 李劲 , 岳昆 , 毛钰源 , 刘琰 . 知识图谱的Top-k摘要模式挖掘方法[J]. 清华大学学报(自然科学版), 2019 , 59(3) : 194 -202 . DOI: 10.16511/j.cnki.qhdxxb.2018.26.044

Abstract

Knowledge graph data has large volumes, rich content, diverse types, and lacks a unified model description. Pattern information needs to be extracted from knowledge graphs to improve the quality of knowledge graph retrieval and mining. This paper presents a knowledge graph summarization pattern and quality metrics. This method is used in an algorithm for mining Top-k summarization patterns (Top-k SPM) formulated as a submodular function optimization problem. Then, a Pregel based parallel algorithm is used to validate the algorithm and measure the qualities of summarization patterns. Two efficient greedy algorithms are also presented to solve the Top-k SPM. The efficiency and effectiveness of the method is then verified on real knowledge graph datasets. The tests show that the method outperforms the existing methods in terms of coverage and algorithm execution time.

参考文献

[1] QIAN J W, LI X Y, ZHANG C H, et al. Social network de-anonymization and privacy inference with knowledge graph model[J]. IEEE Transactions on Dependable and Secure Computing, 2017. DOI:10.1109/TDSC.2017.2697854.
[2] SHI B X, WENINGER T. Discriminative predicate path mining for fact checking in knowledge graphs[J]. Knowledge-Based Systems, 2016, 104:123-133.
[3] SHI L X, LI S J, YANG X R, et al. Semantic health knowledge graph:Semantic integration of heterogeneous medical knowledge and services[J]. BioMed Research International, 2017, 2858423.
[4] 王萍. 网络环境下的领域知识挖掘[D].上海:华东师范大学, 2010.WANG P. Domain knowledge mining in network environments[D]. Shanghai:East China Normal University, 2010. (in Chinese)
[5] 陈池, 王宇鹏, 李超, 等. 面向在线教育领域的大数据研究及应用[J]. 计算机研究与发展, 2014, 51(S1):67-74.CHEN C, WANG Y P, LI C, et al. The research and application of big data in the field of online education[J]. Journal of Computer Research and Development, 2014, 51(S1):67-74. (in Chinese)
[6] SANG S T, YANG Z Z, WANG L, et al. SemaTyP:A knowledge graph based literature mining method for drug discovery[J]. BMC Bioinformatics, 2018, 19(1):193-193.
[7] KEMMAR A, LEBBAH Y, LOUDNI S. Interval graph mining[J]. International Journal of Data Mining, Modelling and Management, 2018, 10(1):1-22.
[8] 高俊平, 张晖, 赵旭剑, 等. 面向维基百科的领域知识演化关系抽取[J]. 计算机学报, 2016, 39(10):2088-2101.GAO J P, ZHANG H, ZHAO X J, et al. Evolutionary relation extraction for domain knowledge in Wikipedia[J]. Chinese Journal of Computers, 2016, 39(10):2088-2101. (in Chinese)
[9] SONG Q, WU Y H, DONG X L. Mining summaries for knowledge graph search[C]//Proceedings of 2016 IEEE International Conference on Data Mining. Barcelona, Spain:IEEE, 2016:1215-1220.
[10] BABAI L. Graph isomorphism in quasipolynomial time[extended abstract] [C]//Proceedings of the 48th Annual ACM Symposium on Theory of Computing. Cambridge, USA:ACM, 2016:684-697.
[11] SAMSI S, GADEPALLY V, HURLEY M, et al. Static graph challenge:Subgraph isomorphism[C]//Proceedings of 2017 IEEE High Performance Extreme Computing Conference. Waltham, USA:IEEE, 2017:1-6.
[12] KRAUSE A, GOLOVIN D. Submodular function maximization[M]//BORDEAUX L, HAMADI Y, KOHLI P. Tractability. Cambridge:Cambridge University Press, 2014:71-104.
[13] DVOŘÁK W, HENZINGER M, WILLIAMSON D P. Maximizing a Submodular function with viability constraints[J]. Algorithmica, 2017, 77(1):152-172.
[14] MA S, CAO Y, FAN W F, et al. Capturing topology in graph pattern matching[J]. Proceedings of the VLDB Endowment, 2011, 5(4):310-321.
[15] MALEWICZ G, AUSTERN M H, BIK A J C, et al. Pregel:A system for large-scale graph processing[C]//Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. Indianapolis, USA:ACM, 2010:135-146.
[16] ELSEIDY M E, ABDELHAMID P, SKIADOPOULOS S, et al. GraMi:Frequent subgraph and pattern mining in a single large graph[J]. Proceedings of the VLDB Endowment, 2014, 7(7):517-528.
文章导航

/