Journal of Tsinghua University (Science and Technology)  2019, Vol. 59 Issue (3): 194-202    DOI: 10.16511/j.cnki.qhdxxb.2018.26.044
Mining Top-k summarization patterns for knowledge graphs
LUO Zhihao1, LI Jin1, YUE Kun2, MAO Yuyuan1, LIU Yan1
1. School of Software, Yunnan University, Kunming 650500, China;
2. School of Information, Yunnan University, Kunming 650500, China
Abstract: Knowledge graphs are large, rich in content, and diverse in type, and they lack a unified schema description. Extracting schema information from a knowledge graph and condensing it into summarization patterns is important for improving the quality of knowledge retrieval and mining. This paper first gives criteria for deciding whether a pattern is a summarization pattern, together with metrics for summarization pattern quality, and on that basis defines the problem of mining Top-k summarization patterns (Top-k SPM) for knowledge graphs, which is formulated as a submodular function optimization problem. Second, parallel algorithms based on the Pregel programming model are proposed to decide summarization patterns efficiently and to measure their coverage quality. Third, two efficient greedy algorithms are presented for solving Top-k SPM. Finally, the method is validated on real knowledge graph datasets. The experimental results show that the method outperforms existing methods in both the coverage of the summarization patterns and execution time.
Key words: knowledge graph    summarization pattern mining    submodular function    graph matching
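The abstract frames Top-k pattern selection as submodular optimization solved greedily. The sketch below illustrates that general idea only and is not the paper's Algorithm 1 or 2. It assumes each candidate pattern has already been reduced to the set of graph element ids it covers, and it uses plain set coverage as the quality function, which is monotone and submodular, so greedy selection carries the standard (1 - 1/e) approximation guarantee. The function name `greedy_top_k`, the pattern names, and the toy ids are all hypothetical.

```python
from typing import Dict, Hashable, List, Set


def greedy_top_k(coverage: Dict[str, Set[Hashable]], k: int) -> List[str]:
    """Pick up to k patterns, each time taking the largest marginal coverage gain."""
    chosen: List[str] = []
    covered: Set[Hashable] = set()
    for _ in range(k):
        best, best_gain = None, 0
        for name in sorted(coverage):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            gain = len(coverage[name] - covered)  # elements not yet covered
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining pattern adds new coverage
            break
        chosen.append(best)
        covered |= coverage[best]
    return chosen


# Toy input (made up): pattern name -> ids of the graph elements it covers.
toy = {
    "person-bornIn-city": {1, 2, 3, 4},
    "person-worksAt-org": {3, 4, 5},
    "city-locatedIn-country": {6, 7},
}
print(greedy_top_k(toy, 2))
```

With k = 2 the second pick is the pattern that adds the most uncovered elements, not the second-largest pattern overall, which is the behavior that distinguishes greedy coverage from simple ranking by size.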
Received: 2018-07-19      Published: 2019-03-19
LUO Zhihao, LI Jin, YUE Kun, MAO Yuyuan, LIU Yan. Mining Top-k summarization patterns for knowledge graphs[J]. Journal of Tsinghua University (Science and Technology), 2019, 59(3): 194-202.
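  Fig. 1 Example of a knowledge graph
  Fig. 2 Pregel-based summarization pattern determination and computation of covered subgraphs
  Fig. 3 Algorithm 1
  Fig. 4 Algorithm 2
  Table 1 Parameters for graph pattern mining
  Fig. 5 Running time of the subTop-k algorithm on three datasets
  Fig. 6 Coverage comparison of subTop-k and BiOpt
  Fig. 7 Coverage comparison of luTop-k with subTop-k and BiOpt
  Fig. 8 Running time comparison of luTop-k with subTop-k and BiOpt
  Fig. 9 A case study of Top-k SPM on the Yago dataset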
