Abstract: Deep learning-based source code defect detection treats source code as text data. Existing approaches then either use a one-dimensional convolutional network, which learns only the spatial features of the code, or use LSTM and BiLSTM networks, which learn only its sequential features. Neither approach takes the diverse features of source code into account. This article adopts the multi-channel learning strategy that convolutional neural networks use in image classification and applies deep convolutional neural networks to identify multi-class source code defects. First, word embedding algorithms such as word2vec and fastText are used to construct fused features. A deep convolutional neural network then learns the defect patterns contained in a source code defect dataset to form a source code defect classifier. The classifier is then used to recognize defective code and its corresponding CWE type. The method was evaluated on the SARD dataset and on open-source software. The results show that it outperforms existing methods, achieving a precision of 95.3%, a recall of 84.7%, and an F1 score of 89.7%.
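The abstract describes the pipeline only at a high level. Below is a minimal pure-Python sketch of one plausible reading of the multi-channel idea. It assumes that each embedding model (word2vec, fastText) contributes one input channel, much as an image has colour channels. A filter spanning the full embedding width slides along the token sequence, the per-channel responses are summed, and max-over-time pooling and a softmax classifier over CWE labels follow. All function names (`build_channels`, `conv_max_pool`, `predict`), the toy embedding tables, the filter weights, and the label names are hypothetical. Nothing here is trained, and the authors' actual depth, filter sizes, and fusion method are not specified by this sketch.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[Vector]  # one row per token, one column per embedding dimension


def build_channels(tokens: Sequence[str],
                   tables: Sequence[Dict[str, Vector]],
                   dim: int) -> List[Matrix]:
    """Embed the token sequence once per embedding table; each table is one channel.

    Tokens missing from a table get a zero vector (an assumption; a real
    pipeline might use fastText subword vectors or a trained <unk> vector).
    """
    zero = [0.0] * dim
    return [[table.get(tok, zero) for tok in tokens] for table in tables]


def conv_max_pool(channels: List[Matrix], kernel: List[Matrix], bias: float) -> float:
    """Apply one convolution filter over all channels, then max-over-time pooling.

    kernel[c] is an h x dim weight matrix for channel c. Because the filter
    spans the full embedding width, it slides only along the token axis, and
    the per-channel responses are summed into a single feature map.
    Returns max(ReLU(feature map)), or 0.0 if the input is shorter than h.
    """
    h = len(kernel[0])
    n_tokens = len(channels[0])
    best = 0.0  # ReLU outputs are non-negative, so 0.0 is a safe starting point
    for start in range(n_tokens - h + 1):
        s = bias
        for matrix, weights in zip(channels, kernel):
            for i in range(h):
                s += sum(w * x for w, x in zip(weights[i], matrix[start + i]))
        best = max(best, s)  # max(0, max_t s_t) equals max_t ReLU(s_t)
    return best


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(features: Vector, weights: Matrix, biases: Vector,
            labels: Sequence[str]) -> str:
    """Fully connected layer + softmax; returns the most probable label."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    return labels[probs.index(max(probs))]


# Toy, hand-picked values (hypothetical, not trained): two 2-D embedding
# tables stand in for word2vec and fastText models.
w2v = {"strcpy": [1.0, 0.0], "buf": [0.0, 1.0]}
fasttext = {"strcpy": [0.5, 0.5], "buf": [0.0, 1.0]}
tokens = ["strcpy", "(", "buf", ")"]

channels = build_channels(tokens, [w2v, fasttext], dim=2)
# Two filters of height 2; each reads the first token of its window,
# filter 0 on dimension 0 and filter 1 on dimension 1, in both channels.
filters = [
    [[[1.0, 0.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 0.0]]],
    [[[0.0, 1.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]],
]
features = [conv_max_pool(channels, k, bias=0.0) for k in filters]
labels = ["CWE-X", "CWE-Y"]  # hypothetical placeholder CWE classes
predicted = predict(features, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], labels)
print(features, predicted)  # [1.5, 2.0] CWE-Y
```

Summing the channel responses inside each filter, rather than concatenating the two embeddings into one wider vector, is what lets each filter see both views of the same token window at once. That is the analogy to colour channels in image CNNs. A concatenation-based fusion would be an equally plausible alternative reading of "fused features".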