计算机科学与技术

基于模块度感知图自编码器的重叠社区发现模型

  • 陈洁 ,
  • 刘斌斌 ,
  • 赵姝 ,
  • 张燕平
展开
  • 1. 安徽大学 计算机科学与技术学院, 合肥 230601;
    2. 计算智能与信号处理教育部重点实验室(安徽大学), 合肥 230601

收稿日期: 2023-10-12

  网络出版日期: 2024-07-19

基金资助

国家自然科学基金项目(61876001);国家社会科学基金重大项目(18ZDA032)

Overlapping community detection model based on a modularity-aware graph autoencoder

  • CHEN Jie ,
  • LIU Binbin ,
  • ZHAO Shu ,
  • ZHANG Yanping
Expand
  • 1. School of Computer Science and Technology, Anhui University, Hefei 230601, China;
    2. Ministry of Education Key Laboratory of Computational Intelligence and Signal Processing, Anhui University, Hefei 230601, China

Received date: 2023-10-12

  Online published: 2024-07-19

摘要

图自编码器模型作为网络表示学习的代表性方法,在链路预测和节点分类任务方面性能表现优异。然而,图自编码器模型在处理社区发现任务时通常只考虑局部节点连边的重建而忽略了社区全局结构的影响,尤其是在多个社区存在重叠节点的情况下,难以准确判断节点归属关系和社区分布。针对此问题,该文提出了一种面向重叠社区发现的无监督模块度感知图自编码器模型(modularity-aware graph autoencoder model for overlapping community detection,GAME),GAME采用一种高效的模块度损失函数,该函数在网络嵌入过程中保留社区关系的同时,能重构损失并更新编码器的参数,以提高模型针对重叠社区发现任务的性能,进而将GAME得到的社区隶属度矩阵以概率-节点形式进行社区分配。该文提出的GAME在10个公开数据集上进行实验验证,并与主流的基于表示学习的重叠社区发现模型进行对比。实验结果表明:在归一化互信息(normalized mutual information,NMI)评估指标下,GAME模型性能优于主流模型,证明该模型有效。

本文引用格式

陈洁 , 刘斌斌 , 赵姝 , 张燕平 . 基于模块度感知图自编码器的重叠社区发现模型[J]. 清华大学学报(自然科学版), 2024 , 64(8) : 1319 -1329 . DOI: 10.16511/j.cnki.qhdxxb.2024.26.018

Abstract

[Objective] In the ever-expanding field of network science, the abstraction of complex entity relationships into network structures provides a foundation for understanding real-world interactions. The discovery of communities within these networks plays a pivotal role in identifying clusters of closely interconnected nodes. This process reveals latent patterns and functionalities inherent in the intricate fabric of reality, proving invaluable for tracking dynamic network behaviors and assessing community influences. These influences span a range of phenomena, from rumor propagation to virus outbreaks and tumor evolution. A notable characteristic of these communities is their overlapping nature, with participants often straddling multiple community boundaries. This characteristic adds an additional layer of complexity to the exploration of network structures, making the discovery of overlapping communities imperative for a comprehensive understanding of network structures and functional dynamics. [Methods] Within the realm of network science, network representation learning algorithms have significantly enriched the pursuit of community discovery. These algorithms adeptly transform complex network information into lower-dimensional vectors, effectively maintaining the underlying network structure and attribute information. Such representations prove invaluable for subsequent graph processing tasks, including but not limited to link prediction, node classification, and community discovery. Among these algorithms, the graph autoencoder model is a prominent representative, demonstrating efficiency in learning network embeddings and finding applications in diverse community discovery tasks. However, a limitation inherent in traditional graph autoencoder models is their predominant focus on local node-edge reconstruction. This focus often overlooks the crucial influence of community structure, particularly in scenarios featuring overlapping nodes across multiple communities. This inherent challenge makes it difficult to precisely determine node affiliations and community distributions. To address this issue, we introduce an innovative unsupervised modularity-aware graph autoencoder model (GAME) designed for overlapping community discovery. The model incorporates an efficient modularity maximization loss function into the graph autoencoder framework. This ensures the preservation of community structure throughout the network embedding process. The modularity-aware loss is meticulously reconstructed to facilitate the update of encoder parameters, thereby improving the model performance in overlapping community discovery tasks. We harness the resulting community membership matrix to probabilistically assign communities to nodes. [Results] The efficacy of the proposed GAME model was rigorously evaluated across six diverse social network datasets (Facebook 348, Facebook 414, Facebook 686, Facebook 698, Facebook 1684, and Facebook 1912), with node counts ranging from 60—800. Additionally, assessments were conducted on four collaborator network datasets (Computer Science, Engineering, Chemistry, and Medicine) featuring node counts ranging from 1.4×104 to 6.4×104. Comparative analyses with seven prevalent overlapping community discovery methods, encompassing both traditional and graph autoencoder-based algorithms, demonstrated a noteworthy 2.1% improvement under the normalized mutual information (NMI) evaluation index. This performance enhancement substantiated the tangible advantages and effectiveness of the proposed GAME model. [Conclusions] The integration of an efficient modularity maximization loss function into the graph autoencoder model, as demonstrated by the GAME model, successfully addresses the conventional limitations of graph autoencoders. These models often prioritize the reconstruction of local node connections during community discovery tasks, often overlooking the overarching structure of the community, particularly when confronted with overlapping nodes. The experimentally validated performance boost underscores the GAME model's efficacy in navigating the complexities of overlapping community discovery compared to mainstream methods. However, it is worth noting that the model's reliance on substantial memory resources can become a challenge when handling datasets that combine network structure and node attributes. This is especially apparent in scenarios with small attribute networks (N≤800), where the model exhibits insensitivity to the threshold ρ variation. Future work will focus on refining the model to mitigate these challenges and ensure optimal performance across a broader spectrum of real-world scenarios.

参考文献

[1] LIN X S, MA Y F, ZHANG J S, et al. GSO-simulcast:Global stream orchestration in simulcast video conferencing systems[C]//Proceedings of the ACM SIGCOMM 2022 Conference. Amsterdam, the Netherlands:ACM, 2022:826-839.
[2] KARL K A, PELUCHETTE J V, AGH AKHANI N. Virtual work meetings during the COVID-19 pandemic:The good, bad, and ugly[J]. Small Group Research, 2022, 53(3):343-365.
[3] KÄMÄRÄINEN T, SIEKKINEN M, YLÄ-JÄÄSKI A, et al. A measurement study on achieving imperceptible latency in mobile cloud gaming[C]//Proceedings of the 8th ACM on Multimedia Systems Conference. Taibei, China:ACM, 2017:88-99.
[4] BRAKMO L S, O'MALLEY S W, PETERSON L L. TCP Vegas:New techniques for congestion detection and avoidance[J]. ACM SIGCOMM Computer Communication Review, 1994, 24(4):24-35.
[5] HENGARTNER U, BOLLIGER J, GROSS T. TCP Vegas revisited[C]//Proceedings IEEE INFOCOM 2000. Tel Aviv, Israel:IEEE, 2000:1546-1555.
[6] JIN C, WEI D, LOW S H, et al. FAST TCP:From theory to experiments[J]. IEEE Network, 2005, 19(1):4-11.
[7] ARAMANI A, KOKKU R, DAHLIN M. TCP Nice:A mechanism for background transfers[J]. ACM SIGOPS Operating Systems Review, 2002, 36(SI):329-343.
[8] KUEHLEWIND M, HAZEL G, SHALUNOV S, et al. Low extra delay background transport (ledbat)[R/OL].(2012-12-19)[2023-7-15]. https://www.rfc-editor.org/rfc/rfc6817.
[9] MENG T, SCHIFF N R, GODFREY P B, et al. PCC proteus:Scavenger transport and beyond[C]//Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication. New York, USA:ACM, 2020:615-631.
[10] ZAKI Y, PÖTSCH T, CHEN J, et al. Adaptive congestion control for unpredictable cellular networks[C]//Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication. London, UK:ACM, 2015:509-522.
[11] ABBASLOO S, XU Y, CHAO H J. C2TCP:A flexible cellular TCP to meet stringent delay requirements[J]. IEEE Journal on Selected Areas in Communications, 2019, 37(4):918-932.
[12] ARUN V, BALAKRISHNAN H. Copa:Practical delay-based congestion control for the internet[C]//15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). Renton, USA:USENIX Association, 2018:329-342.
[13] LIU S, BAŞAR T, SRIKANT R. TCP-Illinois:A loss and delay-based congestion control algorithm for high-speed networks[C]//Proceedings of the 1st International Conference on Performance Evaluation Methodolgies and Tools. Pisa, Italy:ACM, 2006:55-es.
[14] TAN K, SONG J, ZHANG Q, et al. A compound TCP approach for high-speed and long distance networks[C]//Proceedings of the 25th IEEE International Conference on Computer Communications. Barcelona, Spain:IEEE, 2006:1-12.
[15] CARLUCCI G, DE CICCO L, HOLMER S, et al. Congestion control for web real-time communication[J]. IEEE/ACM Transactions on Networking, 2017, 25(5):2629-2642.
[16] ZHU X Q, PAN R. NADA:A unified congestion control scheme for low-latency interactive video[C]//201320th International Packet Video Workshop. San Jose, USA:IEEE, 2013:1-8.
[17] JOHANSSON I. Self-clocked rate adaptation for conversational video in LTE[C]//Proceedings of the 2014 ACM SIGCOMM Workshop on Capacity Sharing Workshop. Chicago, USA:ACM, 2014:51-56.
[18] CARDWELL N, CHENG Y C, GUNN C S, et al. BBR:Congestion-based congestion control:Measuring bottleneck bandwidth and round-trip propagation time[J]. Queue, 2016, 14(5):20-53.
[19] HOCK M, BLESS R, ZITTERBART M. Experimental evaluation of BBR congestion control[C]//2017 IEEE 25th International Conference on Network Protocols (ICNP). Toronto, Canada:IEEE, 2017:1-10.
[20] WINSTEIN K, SIVARAMAN A, BALAKRISHNAN H. Stochastic forecasts achieve high throughput and low delay over cellular networks[C]//10th USENIX Symposium on Networked Systems Design and Implementation. Lombard, USA:USENIX Association, 2013:459-472.
[21] POLESE M, CHIARIOTTI F, BONETTO E, et al. A survey on recent advances in transport layer protocols[J]. IEEE Communications Surveys&Tutorials, 2019, 21(4):3584-3608.
[22] AL-SAADI R, ARMITAGE G, BUT J, et al. A survey of delay-based and hybrid TCP congestion control algorithms[J]. IEEE Communications Surveys&Tutorials, 2019, 21(4):3609-3638.
[23] HAILE H, GRINNEMO K J, FERLIN S, et al. End-to-end congestion control approaches for high throughput and low delay in 4G/5G cellular networks[J]. Computer Networks, 2021, 186:107692.
[24] LORINCZ J, KLARIN Z, OŻEGOVIĆ J. A comprehensive overview of TCP congestion control in 5G networks:Research challenges and future perspectives[J]. Sensors, 2021, 21(13):4510.
[25] ABBASLOO S, YEN C Y, CHAO H J. Classic meets modern:A pragmatic learning-based congestion control for the internet[C]//Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication. New York, USA:ACM, 2020:632-647.
[26] GOYAL P, AGARWAL A, NETRAVALI R, et al. ABC:A simple explicit congestion controller for wireless networks[C]//17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20). Santa Clara, USA:USENIX Association, 2020:353-372.
[27] DU Z, ZHENG J, YU H, et al. A unified congestion control framework for diverse application preferences and network conditions[C]//Proceedings of the 17th International Conference on emerging Networking Experiments and Technologies. New York, USA:ACM, 2021:282-296.
[28] DONG M, MENG T, ZARCHY D, et al. PCC vivace:Online-learning congestion control[C]//15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). Renton, USA:USENIX Assosiation, 2018:343-356.
[29] FLOYD S, JACOBSON V. Random early detection gateways for congestion avoidance[J]. IEEE/ACM Transactions on Networking, 1993, 1(4):397-413.
[30] FLOYD S, GUMMADI R, SHENKER S J. Adaptive RED:An algorithm for increasing the robustness of RED's active queue management[R]. Berkeley, USA:International Computer Science Institute, 2001.
[31] WYDROWSKI B, ZUKERMAN M. GREEN:An active queue management algorithm for a self managed internet[C]//2002 IEEE International Conference on Commu-nications. Conference Proceedings. New York, USA:IEEE, 2002:2368-2372.
[32] NICHOLS K, JACOBSON V. Controlling queue delay[J]. Communications of the ACM, 2012, 55(7):42-50.
[33] PAN R, NATARAJAN P, PIGLIONE C, et al. PIE:A lightweight control scheme to address the bufferbloat problem[C]//2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR). Taipei, China:IEEE, 2013:148-155.
[34] FENG W C, SHIN K G, KANDLUR D D, et al. The BLUE active queue management algorithms[J]. IEEE/ACM Transactions on Networking, 2002, 10(4):513-528.
[35] PALMEI J, GUPTA S, IMPUTATO P, et al. Design and evaluation of COBALT queue discipline[C]//2019 IEEE International Symposium on Local and Metropolitan Area Networks (LANMAN). Paris, France:IEEE, 2019:1-6.
[36] KHADEMI N, ROS D, WELZL M. The new AQM kids on the block:An experimental evaluation of CoDel and PIE[C]//2014 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). Toronto, Canada:IEEE, 2014:85-90.
[37] MUHAMMAD S, CHAUDHERY T J, NOH Y. Study on performance of AQM schemes over TCP variants in different network environments[J]. IET Communications, 2021, 15(1):93-111.
[38] GRAZIA C A, PATRICIELLO N, KLAPEZ M, et al. A cross-comparison between TCP and AQM algorithms:Which is the best couple for congestion control?[C]//2017 14th International Conference on Telecommunications (ConTEL). Zagreb, Croatia:IEEE, 2017:75-82.
[39] PIOTROWSKA A. On cross-layer interactions for congestion control in the internet[J]. Applied Sciences, 2021, 11(17):7808.
[40] HA S, RHEE I, XU L S. CUBIC:A new TCP-friendly high-speed TCP variant[J]. ACM SIGOPS Operating Systems Review, 2008, 42(5):64-74.
[41] CARLUCCI G, DE CICCO L, MASCOLO S. Controlling queuing delays for real-time communication:The interplay of E2E and AQM algorithms[J]. ACM SIGCOMM Computer Communication Review, 2016, 46(3):1.
[42] ZHANG H, ZHU H T, XIA Y, et al. Performance analysis of BBR congestion control protocol based on NS3[C]//2019 Seventh International Conference on Advanced Cloud and Big Data (CBD). Suzhou, China:IEEE, 2019:363-368.
[43] ABBASLOO S, CHAO H J. Bounding queue delay in cellular networks to support ultra-low latency applications[J/OL]. arXiv.(2019-08-02)[2023-07-15]. https://arxiv.org/pdf/1908.00953.pdf.
[44] DONG M, LI Q X, ZARCHY D, et al. PCC:Re-architecting congestion control for consistent high performance[C]//12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). Oakland, USA:USENIX Association, 2015:395-408.
[45] BACHL M, FABINI J, ZSEBY T. LFQ:Online learning of per-flow queuing policies using deep reinforcement learning[C]//2020 IEEE 45th Conference on Local Computer Networks (LCN). Sydney, Australia:IEEE, 2020:417-420.
[46] NÁDAS S, GOMBOS G, HUDOBA P, et al. Towards a congestion control-independent core-stateless AQM[C]//Proceedings of the Applied Networking Research Workshop. Montreal, Canada:ACM, 2018:84-90.
[47] MA H H, XU D, DAI Y Y, et al. An intelligent scheme for congestion control:When active queue management meets deep reinforcement learning[J]. Computer Networks, 2021, 200:108515.
[48] BAI J N, ZHANG T H, XIE G M. MACC:Cross-layer multi-agent congestion control with deep reinforcement learning[J/OL]. arXiv.(2022-06-04)[2023-07-15]. https://arxiv.org/pdf/2206.01972.pdf.
[49] KATABI D, HANDLEY M, ROHRS C. Congestion control for high bandwidth-delay product networks[J]. ACM SIGCOMM Computer Communication Review, 2002, 32(4):89-102.
[50] TAI C H, ZHU J, DUKKIPATI N. Making large scale deployment of RCP practical for real networks[C]//IEEE INFOCOM 2008-The 27th Conference on Computer Communications. Phoenix, USA:IEEE, 2008:2180-2188.
[51] XIA Y, SUBRAMANIAN L, STOICA I, et al. One more bit is enough[C]//Proceedings of the 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. Philadelphia, USA:ACM, 2005:37-48.
文章导航

/