Malware algorithm recognition based on offline instruction-flow analysis

doi:10.16511/j.cnki.qhdxxb.2016.25.005


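Summary: Identifying the algorithms inside binary programs has broad applications and considerable importance in malware detection, software analysis, network traffic analysis, and computer system security protection. This paper proposes a malware algorithm recognition technique based on offline analysis of the assembly instruction flow. It combines binary instrumentation, taint tracking, and loop identification to describe a program along two dimensions, behavior semantics and key constants, and extracts features from both. The recognition model uses machine learning algorithms to build a first-stage recognition model for each of the two feature dimensions, then improves the result through model fusion, achieving high-accuracy recognition of general program algorithms.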
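The key-constant dimension can be illustrated with a minimal sketch. It assumes the offline instruction flow is available as plain-text assembly lines with hexadecimal immediates; the `KNOWN_CONSTANTS` table, the `constant_features` helper, and the sample trace are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical lookup table: a few well-known 32-bit constants and the
# algorithm each one usually indicates. A real table would be far larger.
KNOWN_CONSTANTS = {
    0xD76AA478: "MD5",    # first entry of the MD5 sine table
    0x5A827999: "SHA-1",  # SHA-1 round constant for rounds 0-19
    0xC3D2E1F0: "SHA-1",  # fifth word of the SHA-1 initial state
    0x9E3779B9: "TEA",    # TEA delta
    0xEDB88320: "CRC32",  # reflected CRC-32 polynomial
}

# Matches hexadecimal immediates of at most 32 bits, e.g. "0x9e3779b9".
HEX_IMMEDIATE = re.compile(r"\b0x([0-9a-fA-F]{1,8})\b")


def constant_features(trace_lines):
    """Count, per algorithm label, how often a known constant appears."""
    counts = Counter()
    for line in trace_lines:
        for match in HEX_IMMEDIATE.finditer(line):
            label = KNOWN_CONSTANTS.get(int(match.group(1), 16))
            if label is not None:
                counts[label] += 1
    return counts


# Assumed trace format: one disassembled instruction per line.
trace = [
    "mov eax, 0x9e3779b9",
    "add ebx, eax",
    "xor ecx, 0xedb88320",
    "add edx, 0x9E3779B9",
]
features = constant_features(trace)
print(dict(features))
```

The resulting counts could serve as one feature vector for a first-stage classifier. Matching constants alone is easy to evade, which is presumably why the paper pairs it with behavior-semantic features.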

Authors: ZHAO Jingling, CHEN Shilei, CAO Mengchen, CUI Baojiang

Keywords: algorithm recognition, taint tracking, machine learning, malware detection

Abstract: Algorithm identification in binary programs is widely used in malware detection, software analysis, network encryption analysis, and computer system protection. This paper describes a malware algorithm recognition method based on offline instruction-flow analysis that combines binary instrumentation, taint tracking, and loop recognition. Algorithm features are described along two dimensions, behavior semantics and key constants, both extracted from the instruction flow. Two machine learning models trained on these features are merged into a single, more accurate recognition model.
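The merging of two models can be sketched as follows. The abstract does not specify the fusion rule, so this example substitutes a simple weighted average of per-class probabilities as a stand-in; `fuse_predictions`, `semantic_weight`, and the sample probabilities are assumptions made for illustration only.

```python
def fuse_predictions(semantic_probs, constant_probs, semantic_weight=0.5):
    """Merge two per-class probability dicts by weighted averaging.

    semantic_probs: output of a (hypothetical) behavior-semantics model.
    constant_probs: output of a (hypothetical) key-constant model.
    Returns the winning label and the fused score for every label.
    """
    labels = set(semantic_probs) | set(constant_probs)
    fused = {}
    for label in labels:
        p_sem = semantic_probs.get(label, 0.0)
        p_const = constant_probs.get(label, 0.0)
        fused[label] = semantic_weight * p_sem + (1.0 - semantic_weight) * p_const
    # Sort first so that ties are broken deterministically by label name.
    best = max(sorted(fused), key=fused.get)
    return best, fused


# Illustrative inputs: the two first-stage models disagree on the top label.
semantic = {"MD5": 0.6, "AES": 0.3, "other": 0.1}
constant = {"MD5": 0.2, "AES": 0.7, "other": 0.1}
label, scores = fuse_predictions(semantic, constant, semantic_weight=0.4)
print(label, scores)
```

A learned second-stage model trained on the outputs of the first-stage models (stacking) is a common alternative to fixed weights. Which approach the paper actually uses cannot be determined from the abstract alone.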

Key words: algorithm recognition, taint trace, machine learning, malware detection

Received: 2016-01-24 Published: 2016-05-15

CLC number (ZTFLH): TP301.6

Cite this article:

ZHAO Jingling, CHEN Shilei, CAO Mengchen, CUI Baojiang. Malware algorithm recognition based on offline instruction-flow analysis[J]. Journal of Tsinghua University (Science and Technology), 2016, 56(5): 484-492.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.25.005 or http://jst.tsinghuajournals.com/CN/Y2016/V56/I5/484

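Fig. 1 Workflow of the offline analysis framework

Fig. 2 Control flow of a loop structure

Table 1 Backward search algorithm for nested loops

Fig. 3 Fusion process of the algorithm recognition models

Table 2 Results of the first-stage recognition model based on behavior-semantic profiles

Table 3 Results of the first-stage recognition model based on key constants

Table 4 Function-level algorithm recognition results of the arbitration model

Fig. 4 Comparison of model test results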
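The loop-related figure and table above are not reproduced on this page, so the following is only a generic sketch of loop recognition on an executed-address trace and not the paper's backward search algorithm. It assumes the trace is a list of instruction addresses in execution order and treats every backward transition as a candidate loop back edge. `find_loops` and `nesting` are hypothetical helper names.

```python
from collections import Counter


def find_loops(addresses):
    """Count backward transitions in an executed-address trace.

    Each key is a (head, tail) pair: control moved from `tail` back to
    `head`. Calls and returns can also move backward, so a real tool
    would filter them out; this sketch does not.
    """
    back_edges = Counter()
    for prev, cur in zip(addresses, addresses[1:]):
        if cur <= prev:
            back_edges[(cur, prev)] += 1
    return back_edges


def nesting(loops):
    """Map each loop to the loops whose address range encloses it."""
    parents = {}
    for inner in loops:
        parents[inner] = [
            outer
            for outer in loops
            if outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]
        ]
    return parents


# Illustrative trace: an outer loop over 0x10..0x30 that contains an
# inner loop over 0x18..0x20.
trace = [0x10, 0x18, 0x20, 0x18, 0x20, 0x30, 0x10, 0x18, 0x20, 0x30]
loops = find_loops(trace)
parents = nesting(loops)
print(loops, parents)
```

Address-range containment is a rough proxy for nesting that works for straight-line compiled loops. Code with unusual layout would need control-flow-graph analysis instead.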

