Journal of Tsinghua University (Science and Technology)  2021, Vol. 61 Issue (11): 1267-1272    DOI: 10.16511/j.cnki.qhdxxb.2020.26.042
Vulnerability Analysis and Risk Assessment
Source code defect detection using deep convolutional neural networks
WANG Xiaomeng, GUAN Zhibin, XIN Wei, WANG Jiajie
China Information Technology Security Evaluation Center, Beijing 100085, China
Abstract: Source code defect detection methods based on deep neural networks usually treat source code as text. They either use a convolutional network to learn a single spatial feature of the code or use LSTM and BiLSTM networks to learn the sequential features of source code samples, and they have not studied the fusion of multiple source code features in depth. To explore how multiple features of source code perform in defect detection, this paper borrows the multi-channel learning strategy that convolutional neural networks use for images and fuses the word vectors produced by word embedding techniques such as word2vec and fasttext into a combined vector representation of the source code. A deep convolutional neural network then learns the defect patterns contained in the source code defect data, yielding a source code defect classifier that detects multiple classes of defects and identifies their corresponding CWE types. The method was compared with existing single-channel neural network methods on the SARD dataset and on open source software. On average it achieved a precision of 95.3%, a recall of 84.7%, and an F1 score of 89.7%, improving on the existing methods by varying margins.
Key words: deep convolutional neural network; feature fusion; multi-classification; source code; defect detection
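The multi-channel idea in the abstract can be illustrated with a toy forward pass. The sketch below is not the authors' implementation: the abstract gives no architecture, hyperparameters, or training details, so every name and size here (`w2v_like`, `ft_like`, `EMB_DIM`, `KERNEL`, the three classes) is a hypothetical stand-in. The weights and the two embedding tables are random rather than trained. It only shows how two embeddings of the same token sequence can be stacked as channels, in the way colour planes are stacked for an image, before convolution, pooling, and a multi-class softmax.

```python
import math
import random

random.seed(0)

VOCAB = ["strcpy", "(", "buf", ",", "src", ")", ";"]
EMB_DIM = 4       # embedding width per channel (toy value)
KERNEL = 3        # convolution window, in tokens
NUM_FILTERS = 5
NUM_CLASSES = 3   # hypothetical: "no defect" plus two CWE types

def random_table(vocab, dim):
    """Random stand-in for a trained embedding table (word2vec, fastText, ...)."""
    return {tok: [random.uniform(-1, 1) for _ in range(dim)] for tok in vocab}

def to_channels(tokens, tables):
    """Build a [channel][token][dim] tensor, one channel per embedding table."""
    return [[table[tok] for tok in tokens] for table in tables]

def conv_relu_maxpool(x, filters, biases):
    """Slide each filter over the token axis across all channels,
    apply ReLU, and keep the maximum response per filter."""
    n_tokens = len(x[0])
    pooled = []
    for w, b in zip(filters, biases):
        best = 0.0  # starting at 0 folds the ReLU into the running maximum
        for start in range(n_tokens - KERNEL + 1):
            s = b
            for c, channel in enumerate(x):
                for k in range(KERNEL):
                    for d in range(EMB_DIM):
                        s += w[c][k][d] * channel[start + k][d]
            best = max(best, s)
        pooled.append(best)
    return pooled

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def classify(tokens, tables, filters, biases, out_w, out_b):
    """Embed, convolve, pool, then map pooled features to class probabilities."""
    feats = conv_relu_maxpool(to_channels(tokens, tables), filters, biases)
    logits = [sum(w * f for w, f in zip(row, feats)) + b
              for row, b in zip(out_w, out_b)]
    return softmax(logits)

# Two hypothetical embedding tables act as the two input channels.
w2v_like = random_table(VOCAB, EMB_DIM)
ft_like = random_table(VOCAB, EMB_DIM)
tables = [w2v_like, ft_like]

# Untrained random weights, shaped [filter][channel][kernel][dim].
filters = [[[[random.uniform(-0.5, 0.5) for _ in range(EMB_DIM)]
             for _ in range(KERNEL)]
            for _ in range(len(tables))]
           for _ in range(NUM_FILTERS)]
biases = [0.0] * NUM_FILTERS
out_w = [[random.uniform(-0.5, 0.5) for _ in range(NUM_FILTERS)]
         for _ in range(NUM_CLASSES)]
out_b = [0.0] * NUM_CLASSES

tokens = ["strcpy", "(", "buf", ",", "src", ")", ";"]
probs = classify(tokens, tables, filters, biases, out_w, out_b)
print(probs)  # one probability per class; meaningless until the model is trained
```

Adding a third embedding would only require appending another table to `tables`, since each filter already spans every channel. A real system would replace the random tables with trained embeddings and learn the filter and output weights from labelled defect samples.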
Received: 2020-11-24      Published: 2021-10-19
Funding: National Natural Science Foundation of China (U1836209, U1736110, U1936211, U1936101, U1836113)
Cite this article:
WANG Xiaomeng, GUAN Zhibin, XIN Wei, WANG Jiajie. Source code defect detection using deep convolutional neural networks. Journal of Tsinghua University (Science and Technology), 2021, 61(11): 1267-1272.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2020.26.042  or  http://jst.tsinghuajournals.com/CN/Y2021/V61/I11/1267