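To address the poor detection rate and low efficiency of the basic K-means algorithm on rare attacks in the KDD 99 dataset, this paper statistically analyzes the correlation between each feature dimension and each attack type. The analysis shows that rare attacks are easily swamped by the large volume of common attacks, and that once the common attacks are removed, these more dangerous rare attacks can be identified much more readily. Based on this finding, the paper proposes an improved detection algorithm built on hierarchical iterative K-means. Targeted feature selection reduces the dimensionality of the data used for K-means clustering, and repeated K-means iterations over successively reduced attribute sets detect the different attack types more precisely. Experiments on the KDD 99 dataset show that the algorithm reaches a detection rate of nearly 99% for the rare U2R/R2L attack types, which the basic K-means detection algorithm struggles to detect. In addition, because each hierarchical iteration reduces the clustering dimensionality by nearly 50%, anomaly detection time is cut by about 90%.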
Abstract
Although the basic K-means algorithm has been used for anomaly detection on the KDD 99 attack dataset, its accuracy and efficiency in detecting rare attacks need improvement. Rare attacks, which usually pose greater threats, are easily hidden by common attacks, so they can be identified more easily once the common attacks are removed. Based on this finding, an improved hierarchical iterative K-means method was developed that detects all kinds of anomalies, using correlation-based feature reduction to decrease the clustering dimensionality. On the KDD 99 data, the algorithm detects rare attacks with a successful classification rate of almost 99% and approaches real-time detection, requiring about 90% less computation than the basic K-means algorithm.
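The layered idea in the abstract (cluster, set aside the dominant common traffic, shrink the feature set, and cluster again) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `kmeans`, `top_variance_features`, and `layered_kmeans` are hypothetical, the "largest cluster is common traffic" rule is an assumption, and ranking features by variance is only a stand-in for the correlation-based feature selection the paper describes. The toy records are invented for illustration and are not KDD 99 data.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def top_variance_features(records, features, keep):
    """Assumed stand-in for feature selection: keep the highest-variance features."""
    def variance(f):
        vals = [r[f] for r in records]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals)
    return sorted(sorted(features, key=variance, reverse=True)[:keep])

def layered_kmeans(records, k=2, layers=3):
    """Peel off the largest cluster at each layer, then halve the feature set."""
    remaining = list(range(len(records)))
    features = list(range(len(records[0])))
    peeled = []  # one list of record indices per completed layer
    for _ in range(layers):
        if len(remaining) <= k:
            break
        view = [[records[i][f] for f in features] for i in remaining]
        labels = kmeans(view, k)
        sizes = [labels.count(c) for c in range(k)]
        common = sizes.index(max(sizes))  # assumption: biggest cluster = common traffic
        peeled.append([i for i, lab in zip(remaining, labels) if lab == common])
        remaining = [i for i, lab in zip(remaining, labels) if lab != common]
        if remaining:
            keep = max(1, len(features) // 2)  # roughly 50% fewer dimensions per layer
            subset = [records[i] for i in remaining]
            features = top_variance_features(subset, features, keep)
    return peeled, remaining

# Toy data: eight "common" records and two far-away "rare" records.
toy = [[float(i), 0.0] for i in range(8)] + [[100.0, 100.0], [101.0, 100.0]]
peeled, rest = layered_kmeans(toy, k=2, layers=1)
print("peeled per layer:", peeled)
print("left for closer inspection:", rest)
```

With more layers, the records that survive each peel are re-clustered in a lower-dimensional space, which is where rare classes would no longer be drowned out by the bulk of the data under the assumptions above.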
Key words
anomaly detection /
K-means /
feature reduction /
U2R /
R2L