针对控制流图规模过大导致的程序分析准确度和效率不够理想的问题,该文提出了一种用于控制流图划分的改进GN(Girvan-Newman)算法,在边介数计算中加入点权值作为参数,使划分所得各子图的规模更加平衡;通过动态控制子图的规模,在合适的时机提前终止算法执行,提高执行效率。利用angr工具对二进制程序进行分析所得到的控制流图,分别采用改进GN算法、K-means算法、谱聚类算法和朴素凝聚算法进行实验,比较不同算法对控制流图划分结果中的模块度以及均衡性等指标,证明改进GN算法具有最佳的划分结果和执行效率。
Abstract
The accuracy and efficiency of program analyses are hindered by very large control flow graphs (CFG). This paper presents an improved GN (Girvan-Newman) algorithm for CFG division. The node weights are added as parameters to the betweenness calculation to better balance the subgraph sizes with the sizes controlled dynamically to terminate the algorithm at a suitable time to improve the execution efficiency. Then, the binary programs indicated by the CFGs are analyzed using the angr tool. The improved GN algorithm, K-means algorithm, spectral clustering algorithm and naive aggregation algorithm were all tested with the results showing the improved GN algorithm provided the best modularity and subgraph size balance.
关键词
程序分析 /
控制流图划分 /
聚类 /
GN算法
Key words
program analysis /
control flow graph division /
clustering /
GN algorithm
{{custom_sec.title}}
{{custom_sec.title}}
{{custom_sec.content}}
参考文献
[1] CASELDEN D, BAZHANYUK A, PAYER M, et al. Transformation-aware exploit generation using a HI-CFG:UCB/EECS-2013-85[R]. Berkeley, USA:University of California, 2013.
[2] QIU J, SU X H, MA P J. Identifying functions in binary code with reverse extended control flow graphs[J]. Journal of Software:Evolution and Process, 2015, 27(10):793-820.
[3] NGUYEN M H, NGUYEN D L, NGUYEN X M, et al. Auto-detection of sophisticated malware using lazy-binding control flow graph and deep learning[J]. Computers & Security, 2018, 76:128-155.
[4] WEI L. Segmented symbolic analysis[C]//Proceedings of the 35th International Conference on Software Engineering. San Francisco, USA:IEEE, 2013:212-221.
[5] 颜婷. 分段式分析方法在动态符号执行中的应用[D]. 上海:华东师范大学, 2015. YAN T. Dynamic symbolic execution with segmented analysis[D]. Shanghai:East China Normal University, 2015. (in Chinese)
[6] RODRIGUEZ A, LAIO A. Clustering by fast search and find of density peaks[J]. Science, 2014, 344(6191):1492-1496.
[7] KARGER D R. Global min-cuts in RNC, and other ramifications of a simple min-out algorithm[C]//Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms. Austin, USA:ACM, 1993:21-30.
[8] BOYKOV Y, KOLMOGOROV V. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(9):1124-1137.
[9] YU S X, SHI J B. Multiclass spectral clustering[C]//Proceedings of the 9th IEEE International Conference on Computer Vision. Nice, France:IEEE, 2003:313.
[10] GREGORY S. Finding overlapping communities in networks by label propagation[J]. New Journal of Physics, 2009, 12(10):103018.
[11] SHOSHITAISHVILI Y, WANG R Y, SALLS C, et al. SOK:(state of) the art of war:Offensive techniques in binary analysis[C]//Proceedings of the 37th IEEE Symposium on Security and Privacy. San Jose, USA:IEEE, 2016:138-157.
[12] NEWMAN M E J, GIRVAN M. Finding and evaluating community structure in networks[J]. Physical Review E, 2004, 69(2):026113.
[13] NEWMAN M E. Fast algorithm for detecting community structure in networks[J]. Physical Review E, 2004, 69(6 Pt 2), 066133.
基金
国家重点研发计划项目(2016QY07X1404)