COMPUTER SCIENCE AND TECHNOLOGY

Mining of constant conditional functional dependencies based on pruning free itemsets
ZHOU Jinling, DIAO Xingchun, CAO Jianjun
PLA University of Science and Technology, Nanjing 210007, China

Abstract A series of pruning strategies is applied to CFDMiner, a popular algorithm for mining constant conditional functional dependencies (CCFDs), to reduce its search space and improve its efficiency. Theoretical analysis shows that many of the free and closed itemsets the algorithm processes are invalid or redundant for outputting valid CCFDs. Pruning these free itemsets and selecting only the corresponding closed itemsets therefore yields results consistent with those of the original algorithm. Tests show that the optimized algorithm has a smaller search space and is 4 to 5 times more efficient on real data.
Keywords
conditional functional dependency
functional dependency
free itemset
closed itemset
pruning algorithm
Issue Date: 15 March 2016