Please wait a minute...
 首页  期刊介绍 期刊订阅 联系我们 横山亮次奖 百年刊庆
 
最新录用  |  预出版  |  当期目录  |  过刊浏览  |  阅读排行  |  下载排行  |  引用排行  |  横山亮次奖  |  百年刊庆
清华大学学报(自然科学版)  2017, Vol. 57 Issue (10): 1030-1037    DOI: 10.16511/j.cnki.qhdxxb.2017.25.041
  计算机科学与技术 本期目录 | 过刊浏览 | 高级检索 |
基于Spark的并行增量动态社团发现算法
吴斌1,2, 肖琰1,2, 张云雷1,2
1. 北京邮电大学 智能通信软件与多媒体北京市重点实验室, 北京 100876;
2. 北京邮电大学 计算机学院, 北京 100876
Parallel incremental dynamic community detection algorithm based on Spark
WU Bin1,2, XIAO Yan1,2, ZHANG Yunlei1,2
1. Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China;
2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
全文: PDF(1131 KB)  
输出: BibTeX | EndNote (RIS)      
摘要 动态社团发现是研究网络演化的关键步骤。在数据量迅猛增长的情况下,社团发现的单机算法效率较低。该文提出了一种基于Spark的并行增量动态社团发现算法(parallel incremental dynamic community detection algorithm based on Spark,PIDCDS),为了在GraphX并行图计算平台上通过最大化持久力发现社团,该算法对节点的持久力计算公式进行了有效修正。PIDCDS计算每个时间片中增量节点的持久力指标,更新其社团归属,在保证一定的社团划分准确性的基础上减少计算量。通过与FacetNet动态社团发现算法做比较,该算法能够获得更好的稳定性,同时能发现更真实的社团划分。对比不同规模网络在PIDCDS上的运行时间,发现该时间随着网络节点和边数的增加缓慢增长,性能较高,并且增加执行器核数将在一定程度上加速算法的执行。
服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
吴斌
肖琰
张云雷
关键词 并行动态社团发现持久力Spark增量计算    
Abstract:Dynamic community detection is a crucial step in researching network evolution. Nonparallel algorithms cannot efficiently mine community structures with large amounts of data. This paper presents a parallel incremental dynamic community detection algorithm based on Spark (PIDCDS), which maximizes the total permanence of the vertices in the network to discover the community structure. PIDCDS uses parallel computations on GraphX to calculate a specialized permanence as the community partition metric. Only the permanence of the incremental vertices is computed in each step to derive a new community structure. This algorithm is accurate with less computations. PIDCDS has better stability and more accurately detects the community structure than the FacetNet algorithm. PIDCDS performs well and the execution time increases only slowly as the network scale increases. More cores will accelerate PIDCDS to some extent.
Key wordsparallel dynamic community detection    permanence    Spark    increment computation
收稿日期: 2016-09-29      出版日期: 2017-10-15
ZTFLH:  TP301.6  
引用本文:   
吴斌, 肖琰, 张云雷. 基于Spark的并行增量动态社团发现算法[J]. 清华大学学报(自然科学版), 2017, 57(10): 1030-1037.
WU Bin, XIAO Yan, ZHANG Yunlei. Parallel incremental dynamic community detection algorithm based on Spark. Journal of Tsinghua University(Science and Technology), 2017, 57(10): 1030-1037.
链接本文:  
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.25.041  或          http://jst.tsinghuajournals.com/CN/Y2017/V57/I10/1030
  图1 节点v 的持久力计算[6]
  表1 实验所用的服务器配置方案
  表2 公式修正前后社团划分结果比较
  图2 PIDCDS和FacetNet算法在不同数据集上的对比实验
  图3 不同规模的网络在PIDCDS上的运行时间
  表3 不同执行器核数在PIDCDS上的加速情况
[1] 李国杰. 网络科学:21世纪的元科学[J]. 中国计算机学会通讯, 2016,12(4):7. LI Guojie.Network science:The meta science in the 21st century[J]. Communications of the China Computer Federation, 2016,12(4):7. (in Chinese)
[2] Fortunato S. Community detection in graphs[J].Physics Reports, 2010,486(3):75-174.
[3] Girvan M, Newman M E J. Community structure in social and biological networks[J].Proceedings of the National Academy of Sciences, 2002,99(12):7821-7826.
[4] Newman M E J. Modularity and community structure in networks[J].Proceedings of the National Academy of Sciences, 2006,103(23):8577-8582.
[5] 汪小帆, 李翔, 陈关荣. 网络科学导论[M]. 北京:高等教育出版社, 2012. WANG Xiaofan, LI Xiang, CHEN Guanrong. Network Science:An Introduction[M]. Beijing:Higher Education Press, 2012. (in Chinese)
[6] Chakraborty T, Srinivasan S, Ganguly N, et al. On the permanence of vertices in network communities[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA:ACM, 2014:1396-1405.
[7] Ning H, Xu W, Chi Y, et al. Incremental spectral clustering by efficiently updating the eigen-system[J].Pattern Recognition,2010,43(1):113-127.
[8] Zaharia M, Chowdhury M, Das T, et al. Resilient distributed datasets:A fault-tolerant abstraction for in-memory cluster computing[C]//Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. Berkeley, CA,USA:USENIX Association, 2012:141-146.
[9] Apache. Hadoop[EB/OL].[2016-07-29]. http://hadoop.apache.org.
[10] Xin R S, Gonzalez J E, Franklin M J, et al. GraphX:A resilient distributed graph system on spark[C]//Proceedingsof the 1st Int Workshop on Graph Data Management Experiences and Systems. New York, NY, USA:ACM, 2013.
[11] Lu H, Halappanavar M, Kalyanaraman A. Parallel heuristics for scalable community detection[J].Parallel Computing, 2015,47:19-37.
[12] Staudt C L, Meyerhenke H. Engineering parallel algorithms for community detection in massive networks[J].IEEE Transactions on Parallel and Distributed Systems, 2016,27(1):171-184.
[13] Danon L, Diaz-Guilera A, Duch J, et al. Comparing communitystructure identification[J].Journal of Statistical Mechanics:Theory and Experiment, 2005(9), P09008.
[14] Lancichinetti A, Fortunato S, Radicchi F. Benchmark graphs for testing community detection algorithms[J].Physical Review E, 2008,78(4), 046110.
[15] LI Xiaoming, WU Bin, GUO Qian, et al. Dynamic community detection algorithm based on incremental identification[C]//2015 IEEE International Conference on Data Mining Workshop (ICDMW). Atlantic City, NJ, USA:IEEE, 2015.
[16] Lin Y R, Chi Y, Zhu S, et al. Facetnet:A framework for analyzing communities and their evolutions in dynamic networks[C]//Proceedings of the 17th international conference on World Wide Web. New York, NY, USA:ACM, 2008:685-694.
[1] 刘扬, 魏蔚. 面向海量流媒体信道资源分配快速Nash议价算法[J]. 清华大学学报(自然科学版), 2017, 57(10): 1056-1062.
[2] 韩东红, 宋明, 张宏亮, 王佳茜, 王嘉兴, 王国仁. 基于密度的不确定数据流聚类算法[J]. 清华大学学报(自然科学版), 2017, 57(8): 884-891.
[3] 万欣, 刘锡明, 苗积臣, 吴志芳. 改进的多层螺旋CT线性插值算法[J]. 清华大学学报(自然科学版), 2016, 56(12): 1346-1351.
[4] 赵晶玲, 陈石磊, 曹梦晨, 崔宝江. 基于离线汇编指令流分析的恶意程序算法识别技术[J]. 清华大学学报(自然科学版), 2016, 56(5): 484-492.
[5] 马昱春, 张超, Luk Wayne. 基于混合式两阶段的动态部分重构FPGA软硬件划分算法[J]. 清华大学学报(自然科学版), 2016, 56(3): 246-252,261.
[6] 陈东辉, 陈岭, 王俊凯, 吴勇, 王敬昌. 不确定性数据上聚合查询的近似算法[J]. 清华大学学报(自然科学版), 2018, 58(3): 231-236.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 《清华大学学报(自然科学版)》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn