Abstract: Uncertainty makes it impossible to cluster uncertain data streams with traditional clustering algorithms. This paper presents a density-based clustering algorithm for uncertain data stream environments. An uncertainty metric is used to measure the distribution information in the uncertain data. The DENCLUE algorithm is then adapted to uncertain data streams (USDENCLUE) to minimize the impact of data uncertainty on the clustering results. A density-based clustering algorithm is then given for uncertain data streams over a sliding window, which rapidly prunes clusters using an exponential histogram of the cluster features. This algorithm efficiently handles noisy data in evolving data streams and generates clusters of arbitrary shape, which improves the clustering quality. Comparisons with the CluStream clustering algorithm on real and synthetic data sets show the efficiency and effectiveness of this algorithm.
HAN Donghong, SONG Ming, ZHANG Hongliang, WANG Jiaxi, WANG Jiaxing, WANG Guoren. Algorithm for clustering uncertain data streams based on density[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(8): 884-891.
[1]
Deshpande A, Guestrin C, Madden S, et al. Model-driven data acquisition in sensor networks[C]//Proceedings of the 30th International Conference on Very Large Data Bases. New York, USA:ACM Press, 2004:588-599.
[2]
GU Yu, YU Ge, ZHANG Tiancheng. RFID complex event processing techniques[J]. Journal of Frontiers of Computer Science and Technology, 2007, 1(3):255-267.
[3]
Jeffery S R, Garofalakis M N, Franklin M J. Adaptive cleaning for RFID data streams[C]//Proceedings of the 32nd International Conference on Very Large Data Bases. New York, USA:ACM Press, 2006:163-174.
[4]
ZHOU Aoying, JIN Cheqing, WANG Guoren, et al. A survey on the management of uncertain data[J]. Chinese Journal of Computers, 2009, 32(1):1-16.
[5]
Aggarwal C C, Han J, Wang J, et al. A framework for clustering evolving data streams[C]//Proceedings of the 29th International Conference on Very Large Data Bases. New York, USA:ACM Press, 2003:81-92.
[6]
Aggarwal C C, Yu P S. A framework for clustering uncertain data streams[C]//Proceedings of the 24th International Conference on Data Engineering. Cancún, México, 2008:150-159.
[7]
Aggarwal C C. On high dimensional projected clustering of uncertain data streams[C]//Proceedings of the 25th International Conference on Data Engineering. Shanghai, 2009:1152-1154.
[8]
Huang G Y, Liang D P, Ren J D, et al. An algorithm for clustering uncertain data streams over sliding windows[C]//Proceedings of the 6th International Conference on Digital Content, Multimedia Technology and Its Applications. Seoul, Korea, 2009:173-177.
[9]
ZHANG Chen, JIN Cheqing, ZHOU Aoying. Clustering algorithm over uncertain data streams[J]. Journal of Software, 2010, 21(9):2173-2182.
[10]
CAO Keyan, WANG Guoren, HAN Donghong, et al. A framework for high-quality clustering uncertain data stream over sliding windows[C]//Proceedings of the 13th International Conference on Web-Age Information Management. Harbin, 2012:308-313.
[11]
YANG Yue, LIU Zhuo, ZHANG Jianpei, et al. Dynamic density-based clustering algorithm over uncertain data streams[C]//Proceedings of the 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD). Chongqing, 2012:2664-2670.
[12]
JIN Cheqing, Yu J X, ZHOU Aoying, et al. Efficient clustering of uncertain data streams[J]. Knowledge and Information Systems, 2014, 40(3):509-539.
[13]
Dallachiesa M, Jacques-Silva G, Gedik B, et al. Sliding windows over uncertain data streams[J]. Knowledge and Information Systems, 2015, 45(1):159-190.
[14]
Hinneburg A, Keim D A. An efficient approach to clustering in large multimedia databases with noise[C]//Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, DC, USA, 2010:58-65.