Malware detection method based on enhanced code images

doi:10.16511/j.cnki.qhdxxb.2020.25.008

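Abstract: Cyberspace faces an increasingly severe malware threat, and traditional malware detection methods are showing their weaknesses in the contest between malware attack and defense. To address this, the paper proposes a malware detection method based on enhanced grayscale code images. It improves the traditional malware grayscale-image method by using the ASCII character information and PE structure information of the malware to build a three-dimensional RGB image as the raw input to the detection algorithm, and it uses a VGG16 neural network model with a spatial pyramid pooling structure to train on and predict malware images. The paper also proposes a method based on a normalized representation of multiple annotations to improve the reliability of sample labels. Experimental results show that the scheme copes effectively with adversarial techniques such as packing and obfuscation and detects new types of malware well.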
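The image construction described above can be illustrated with a minimal sketch. The grayscale step (one byte per pixel, fixed row width) is the common byte-to-image mapping. The two extra channels are only assumptions for illustration, since the abstract does not say how the ASCII and PE structure information are encoded: `ascii_mask` marks printable ASCII bytes, and `region_channel` paints caller-supplied byte ranges standing in for PE sections. All function names and the `width` parameter are hypothetical.

```python
import numpy as np

def bytes_to_gray(data: bytes, width: int = 256) -> np.ndarray:
    """Map each byte to one pixel (0-255), zero-padding the last row."""
    buf = np.frombuffer(data, dtype=np.uint8)
    rows = (len(buf) + width - 1) // width  # ceiling division
    padded = np.zeros(rows * width, dtype=np.uint8)
    padded[:len(buf)] = buf
    return padded.reshape(rows, width)

def ascii_mask(gray: np.ndarray) -> np.ndarray:
    """Assumed channel: 255 where the byte is printable ASCII, else 0."""
    printable = (gray >= 0x20) & (gray <= 0x7E)
    return printable.astype(np.uint8) * 255

def region_channel(shape: tuple, regions: list) -> np.ndarray:
    """Assumed channel: paint byte ranges given as (start, end, value)."""
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, end, value in regions:
        flat[start:end] = value
    return flat.reshape(shape)

def code_image(data: bytes, regions: list, width: int = 256) -> np.ndarray:
    """Stack the three channels into an (H, W, 3) uint8 image."""
    gray = bytes_to_gray(data, width)
    return np.dstack([gray, ascii_mask(gray), region_channel(gray.shape, regions)])

# Toy input: 9 bytes, row width 4, two made-up "section" ranges.
sample = b"MZ\x90\x00Hi!\xff\x01"
img = code_image(sample, regions=[(0, 2, 64), (4, 8, 128)], width=4)
print(img.shape)  # (3, 4, 3)
```

In a real pipeline the byte ranges would come from parsing the PE headers; here they are passed in by hand so the sketch stays self-contained.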

Keywords: computer viruses and prevention, malware, code-to-image conversion, convolutional neural network, spatial pyramid pooling

Abstract: The malware threat in cyberspace is becoming increasingly serious, and traditional malware detection methods are unable to deal with new types of malware. This paper presents a malware detection method based on enhanced code images. The traditional malware image method is improved by using ASCII character information and PE structure information. A three-dimensional RGB image is used as the raw input to the detection algorithm, and a VGG16 neural network model with spatial pyramid pooling is used to train on and predict the malware images. In addition, a multi-label normalized representation method is used to improve the reliability of the sample labels. The method was evaluated on real malware datasets.
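Spatial pyramid pooling matters here because code images differ in height with file size, while the fully connected layers of a VGG16-style network need a fixed-length input. The sketch below shows the idea in plain NumPy under stated assumptions: the pyramid levels `(1, 2, 4)` and the bin-edge rule are illustrative choices, not the configuration used in the paper, and `spatial_pyramid_pool` is a hypothetical name.

```python
import numpy as np

def spatial_pyramid_pool(fmap: np.ndarray, levels=(1, 2, 4)) -> np.ndarray:
    """Max-pool a (C, H, W) feature map into a vector of length C * sum(n*n)."""
    c, h, w = fmap.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Row range of bin i: floor(i*h/n) .. ceil((i+1)*h/n), never empty.
            r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                pooled.append(fmap[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

# Two feature maps with different spatial sizes give vectors of equal length.
rng = np.random.default_rng(0)
small = rng.random((8, 13, 7))
large = rng.random((8, 30, 22))
vec_small = spatial_pyramid_pool(small)
vec_large = spatial_pyramid_pool(large)
print(vec_small.shape, vec_large.shape)  # (168,) (168,)
```

With 8 channels and levels 1, 2 and 4, each map yields 8 × (1 + 4 + 16) = 168 values regardless of its height and width, which is what lets images of different sizes share one classifier head.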

Key words: computer virus and prevention; malware; code image; convolutional neural network; spatial pyramid pooling

Received: 2019-06-01; Published: 2020-04-26

Corresponding author: LI Qi, associate professor, E-mail: liqi2001@bupt.edu.cn

Cite this article:

SUN Bowen, ZHANG Peng, CHENG Mingyu, LI Xintong, LI Qi. Malware detection method based on enhanced code images. Journal of Tsinghua University (Science and Technology), 2020, 60(5): 386-392.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2020.25.008 or http://jst.tsinghuajournals.com/CN/Y2020/V60/I5/386

