Abstract: Knowledge graphs are large, rich in content, and diverse in type, yet they lack a unified model description. Pattern information therefore needs to be extracted from knowledge graphs to improve the quality of retrieval and mining. This paper defines a knowledge graph summarization pattern together with quality metrics for such patterns. Mining the Top-k summarization patterns (Top-k SPM) is then formulated as a submodular function optimization problem. A Pregel-based parallel algorithm is used for validation and for measuring the quality of summarization patterns, and two efficient greedy algorithms are presented to solve Top-k SPM. The efficiency and effectiveness of the method are verified on real knowledge graph datasets. The tests show that the method outperforms existing methods in terms of coverage and execution time.
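To illustrate what a greedy solution to a Top-k selection problem with a submodular objective can look like, the following is a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify the quality metrics, so the sketch assumes the objective is plain coverage (the number of distinct graph entities matched by the chosen patterns), which is a monotone submodular function. The function name `greedy_top_k` and the pattern labels are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def greedy_top_k(coverage: Dict[str, Set[Hashable]], k: int) -> List[str]:
    """Pick up to k patterns greedily by marginal coverage gain.

    `coverage` maps a (hypothetical) pattern name to the set of graph
    entities it matches. Coverage count is an assumed stand-in for the
    paper's quality metric.
    """
    chosen: List[str] = []
    covered: Set[Hashable] = set()
    candidates = set(coverage)

    for _ in range(min(k, len(candidates))):
        # Sort first so that ties are broken deterministically by name.
        best = max(sorted(candidates), key=lambda p: len(coverage[p] - covered))
        if not coverage[best] - covered:
            break  # no remaining pattern adds new entities
        chosen.append(best)
        covered |= coverage[best]
        candidates.remove(best)

    return chosen


# Toy input: four patterns and the entity ids each one covers.
toy = {
    "P1": {1, 2, 3},
    "P2": {3, 4},
    "P3": {4, 5, 6, 7},
    "P4": {1},
}
top2 = greedy_top_k(toy, 2)
```

For a monotone submodular objective under a cardinality constraint, this kind of greedy selection is known to reach at least a (1 - 1/e) fraction of the optimal value, which is one reason greedy methods are a common choice for such formulations.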