Control flow graph division based on an improved GN algorithm
MA Rui, GAO Haoran, DOU Bowen, WANG Xiajing, HU Changzhen
Beijing Key Laboratory of Software Security Engineering Technology, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Abstract:The accuracy and efficiency of program analyses are hindered by very large control flow graphs (CFG). This paper presents an improved GN (Girvan-Newman) algorithm for CFG division. The node weights are added as parameters to the betweenness calculation to better balance the subgraph sizes with the sizes controlled dynamically to terminate the algorithm at a suitable time to improve the execution efficiency. Then, the binary programs indicated by the CFGs are analyzed using the angr tool. The improved GN algorithm, K-means algorithm, spectral clustering algorithm and naive aggregation algorithm were all tested with the results showing the improved GN algorithm provided the best modularity and subgraph size balance.
马锐, 高浩然, 窦伯文, 王夏菁, 胡昌振. 基于改进GN算法的程序控制流图划分方法[J]. 清华大学学报(自然科学版), 2019, 59(1): 15-22.
MA Rui, GAO Haoran, DOU Bowen, WANG Xiajing, HU Changzhen. Control flow graph division based on an improved GN algorithm. Journal of Tsinghua University(Science and Technology), 2019, 59(1): 15-22.
[1] CASELDEN D, BAZHANYUK A, PAYER M, et al. Transformation-aware exploit generation using a HI-CFG:UCB/EECS-2013-85[R]. Berkeley, USA:University of California, 2013. [2] QIU J, SU X H, MA P J. Identifying functions in binary code with reverse extended control flow graphs[J]. Journal of Software:Evolution and Process, 2015, 27(10):793-820. [3] NGUYEN M H, NGUYEN D L, NGUYEN X M, et al. Auto-detection of sophisticated malware using lazy-binding control flow graph and deep learning[J]. Computers & Security, 2018, 76:128-155. [4] WEI L. Segmented symbolic analysis[C]//Proceedings of the 35th International Conference on Software Engineering. San Francisco, USA:IEEE, 2013:212-221. [5] 颜婷. 分段式分析方法在动态符号执行中的应用[D]. 上海:华东师范大学, 2015. YAN T. Dynamic symbolic execution with segmented analysis[D]. Shanghai:East China Normal University, 2015. (in Chinese) [6] RODRIGUEZ A, LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191):1492-1496. [7] KARGER D R. Global min-cuts in RNC, and other ramifications of a simple min-out algorithm[C]//Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms. Austin, USA:ACM, 1993:21-30. [8] BOYKOV Y, KOLMOGOROV V. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(9):1124-1137. [9] YU S X, SHI J B. Multiclass spectral clustering[C]//Proceedings of the 9th IEEE International Conference on Computer Vision. Nice, France:IEEE, 2003:313. [10] GREGORY S. Finding overlapping communities in networks by label propagation[J]. New Journal of Physics, 2009, 12(10):103018. [11] SHOSHITAISHVILI Y, WANG R Y, SALLS C, et al. SOK:(state of) the art of war:Offensive techniques in binary analysis[C]//Proceedings of the 37th IEEE Symposium on Security and Privacy. San Jose, USA:IEEE, 2016:138-157. [12] NEWMAN M E J, GIRVAN M. Finding and evaluating community structure in networks[J]. Physical Review E, 2004, 69(2):026113. [13] NEWMAN M E. Fast algorithm for detecting community structure in networks[J]. Physical Review E, 2004, 69(6 Pt 2), 066133.