Please wait a minute...
 首页  期刊介绍 期刊订阅 联系我们 横山亮次奖 百年刊庆
 
最新录用  |  预出版  |  当期目录  |  过刊浏览  |  阅读排行  |  下载排行  |  引用排行  |  横山亮次奖  |  百年刊庆
清华大学学报(自然科学版)  2019, Vol. 59 Issue (3): 194-202    DOI: 10.16511/j.cnki.qhdxxb.2018.26.044
  计算机科学与技术 本期目录 | 过刊浏览 | 高级检索 |
知识图谱的Top-k摘要模式挖掘方法
罗之皓1, 李劲1, 岳昆2, 毛钰源1, 刘琰1
1. 云南大学 软件学院, 昆明 650500;
2. 云南大学 信息学院, 昆明 650500
Mining Top-k summarization patterns for knowledge graphs
LUO Zhihao1, LI Jin1, YUE Kun2, MAO Yuyuan1, LIU Yan1
1. School of Software, Yunnan University, Kunming 650500, China;
2. School of Information, Yunnan University, Kunming 650500, China
全文: PDF(3325 KB)  
输出: BibTeX | EndNote (RIS)      
摘要 知识图谱数据具有体量大、内容丰富、类型多样、缺乏统一模式描述等特点。提取知识图谱模式信息并形成摘要模式,对于提升知识检索、挖掘质量具有重要研究意义。该文首先给出了摘要模式的判定准则以及摘要模式质量的度量标准,提出了面向知识图谱的Top-k摘要模式挖掘问题,并将该问题建模为一个次模函数优化问题;其次,为高效判定摘要模式及度量模式的覆盖质量,提出了基于Pregel编程模型的并行化摘要模式判定和质量度量算法;然后,给出了高效求解Top-k摘要模式挖掘问题的贪心算法;最后,在真实知识图谱数据上对本文方法进行了验证。实验结果表明:该方法在摘要模式的覆盖度和算法执行效率方面优于已有方法。
服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
罗之皓
李劲
岳昆
毛钰源
刘琰
关键词 知识图谱摘要模式挖掘次模函数图匹配    
Abstract:Knowledge graph data has large volumes, rich content, diverse types, and lacks a unified model description. Pattern information needs to be extracted from knowledge graphs to improve the quality of knowledge graph retrieval and mining. This paper presents a knowledge graph summarization pattern and quality metrics. This method is used in an algorithm for mining Top-k summarization patterns (Top-k SPM) formulated as a submodular function optimization problem. Then, a Pregel based parallel algorithm is used to validate the algorithm and measure the qualities of summarization patterns. Two efficient greedy algorithms are also presented to solve the Top-k SPM. The efficiency and effectiveness of the method is then verified on real knowledge graph datasets. The tests show that the method outperforms the existing methods in terms of coverage and algorithm execution time.
Key wordsknowledge graph    summarization pattern mining    submodular function    graph matching
收稿日期: 2018-07-19      出版日期: 2019-03-19
基金资助:国家自然科学基金资助项目(61562091,61472345);第二批“云岭学者”培养项目(C6153001);云南省应用基础研究计划面上项目(2016FB110);云南大学中青年骨干教师培养计划项目;云南大学青年英才培育计划项目(WX173602);云南大学数据驱动的软件工程科技创新团队项目(2017HC012)
引用本文:   
罗之皓, 李劲, 岳昆, 毛钰源, 刘琰. 知识图谱的Top-k摘要模式挖掘方法[J]. 清华大学学报(自然科学版), 2019, 59(3): 194-202.
LUO Zhihao, LI Jin, YUE Kun, MAO Yuyuan, LIU Yan. Mining Top-k summarization patterns for knowledge graphs. Journal of Tsinghua University(Science and Technology), 2019, 59(3): 194-202.
链接本文:  
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2018.26.044  或          http://jst.tsinghuajournals.com/CN/Y2019/V59/I3/194
  图1 知识图谱示例
  图2 基于 Pregel的摘要模型判定及其覆盖子图求解
  图3 算法1
  图4 算法2
  表1 图模式挖掘的参数
  图5 subTopk算法在3个数据集上的运行时间
  图6 subTop-k与 BiOpt覆盖度对比
  图7 luTop-k与subTop-k和Biopt的覆盖度对比
  图8 luTop-k与subTop-k和Bioopt的运行时间对比
  图9 在Yago数据集上的Top-k SPM实际案例
[1] QIAN J W, LI X Y, ZHANG C H, et al. Social network de-anonymization and privacy inference with knowledge graph model[J]. IEEE Transactions on Dependable and Secure Computing, 2017. DOI:10.1109/TDSC.2017.2697854.
[2] SHI B X, WENINGER T. Discriminative predicate path mining for fact checking in knowledge graphs[J]. Knowledge-Based Systems, 2016, 104:123-133.
[3] SHI L X, LI S J, YANG X R, et al. Semantic health knowledge graph:Semantic integration of heterogeneous medical knowledge and services[J]. BioMed Research International, 2017, 2858423.
[4] 王萍. 网络环境下的领域知识挖掘[D].上海:华东师范大学, 2010.WANG P. Domain knowledge mining in network environments[D]. Shanghai:East China Normal University, 2010. (in Chinese)
[5] 陈池, 王宇鹏, 李超, 等. 面向在线教育领域的大数据研究及应用[J]. 计算机研究与发展, 2014, 51(S1):67-74.CHEN C, WANG Y P, LI C, et al. The research and application of big data in the field of online education[J]. Journal of Computer Research and Development, 2014, 51(S1):67-74. (in Chinese)
[6] SANG S T, YANG Z Z, WANG L, et al. SemaTyP:A knowledge graph based literature mining method for drug discovery[J]. BMC Bioinformatics, 2018, 19(1):193-193.
[7] KEMMAR A, LEBBAH Y, LOUDNI S. Interval graph mining[J]. International Journal of Data Mining, Modelling and Management, 2018, 10(1):1-22.
[8] 高俊平, 张晖, 赵旭剑, 等. 面向维基百科的领域知识演化关系抽取[J]. 计算机学报, 2016, 39(10):2088-2101.GAO J P, ZHANG H, ZHAO X J, et al. Evolutionary relation extraction for domain knowledge in Wikipedia[J]. Chinese Journal of Computers, 2016, 39(10):2088-2101. (in Chinese)
[9] SONG Q, WU Y H, DONG X L. Mining summaries for knowledge graph search[C]//Proceedings of 2016 IEEE International Conference on Data Mining. Barcelona, Spain:IEEE, 2016:1215-1220.
[10] BABAI L. Graph isomorphism in quasipolynomial time[extended abstract] [C]//Proceedings of the 48th Annual ACM Symposium on Theory of Computing. Cambridge, USA:ACM, 2016:684-697.
[11] SAMSI S, GADEPALLY V, HURLEY M, et al. Static graph challenge:Subgraph isomorphism[C]//Proceedings of 2017 IEEE High Performance Extreme Computing Conference. Waltham, USA:IEEE, 2017:1-6.
[12] KRAUSE A, GOLOVIN D. Submodular function maximization[M]//BORDEAUX L, HAMADI Y, KOHLI P. Tractability. Cambridge:Cambridge University Press, 2014:71-104.
[13] DVOŘÁK W, HENZINGER M, WILLIAMSON D P. Maximizing a Submodular function with viability constraints[J]. Algorithmica, 2017, 77(1):152-172.
[14] MA S, CAO Y, FAN W F, et al. Capturing topology in graph pattern matching[J]. Proceedings of the VLDB Endowment, 2011, 5(4):310-321.
[15] MALEWICZ G, AUSTERN M H, BIK A J C, et al. Pregel:A system for large-scale graph processing[C]//Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. Indianapolis, USA:ACM, 2010:135-146.
[16] ELSEIDY M E, ABDELHAMID P, SKIADOPOULOS S, et al. GraMi:Frequent subgraph and pattern mining in a single large graph[J]. Proceedings of the VLDB Endowment, 2014, 7(7):517-528.
[1] 胡明昊, 王芳, 徐先涛, 罗威, 刘晓鹏, 罗准辰, 谭玉珊. 国防科技领域两阶段开放信息抽取方法[J]. 清华大学学报(自然科学版), 2023, 63(9): 1309-1316.
[2] 陈传刚, 胡瑾秋, 韩子从, 陈怡玥, 肖尚蕊. 恶劣环境条件下海外天然气管道站场事故演化知识图谱建模及预警方法[J]. 清华大学学报(自然科学版), 2022, 62(6): 1081-1087.
[3] 王屹超, 朱慕华, 许晨, 张琰, 王会珍, 朱靖波. 利用图像描述与知识图谱增强表示的视觉问答[J]. 清华大学学报(自然科学版), 2022, 62(5): 900-907.
[4] 王立平, 张超, 蔡恩磊, 史慧杰, 王冬. 面向自主工业软件的知识提取和知识库构建方法[J]. 清华大学学报(自然科学版), 2022, 62(5): 978-986.
[5] 兰超, 张勇, 邢春晓. 分布式Top-k子图匹配技术[J]. 清华大学学报(自然科学版), 2016, 56(8): 871-877.
[6] 邓可欣. 基于超边图匹配的视网膜眼底图像配准算法[J]. 清华大学学报(自然科学版), 2014, 54(5): 568-574.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 《清华大学学报(自然科学版)》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn