Journal of Tsinghua University (Science and Technology)  2016, Vol. 56 Issue (3): 253-261    DOI: 10.16511/j.cnki.qhdxxb.2016.21.026
  Computer Science and Technology
Mining of constant conditional functional dependencies based on pruning free itemsets
ZHOU Jinling, DIAO Xingchun, CAO Jianjun
PLA University of Science and Technology, Nanjing 210007, China
Abstract: To reduce the search space for constant conditional functional dependencies (CCFDs) and improve mining efficiency, a series of pruning strategies is proposed for the CCFD mining algorithm CFDMiner. Theoretical analysis shows that the input of CFDMiner, namely all free itemsets and closed itemsets of the relational data, still contains many itemsets that are invalid or redundant for generating valid CCFDs. It is proved that, with appropriate pruning, a selected subset of the free itemsets together with the corresponding closed itemsets yields the same results as the original algorithm. Experiments show that, compared with the original CFDMiner, the optimized algorithm has a smaller search space and improves mining efficiency by a factor of 4 to 5 on average on real data sets.
Key words: conditional functional dependency; functional dependency; free itemset; closed itemset; pruning algorithm
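To make the terms in the abstract concrete, the following is a minimal brute-force sketch of how free itemsets and their closures can yield constant CFDs. It is not the authors' prCFDMiner, nor the CFDMiner implementation: the toy relation `ROWS`, the attribute names, and the helper functions `support_count`, `closure`, `free_itemsets`, and `constant_cfds` are all hypothetical, and the sketch assumes a non-empty relation and `min_sup >= 1`. An itemset is treated as free when every immediate subset has strictly larger support, and a rule `Y -> (A, a)` is emitted when `(A, a)` lies in the closure of a free itemset `Y` but in the closure of no free proper subset of `Y`.

```python
from itertools import combinations

# Hypothetical toy relation; attribute names and values are illustrative only.
ROWS = [
    {"CC": "44", "AC": "131", "CITY": "EDI"},
    {"CC": "44", "AC": "131", "CITY": "EDI"},
    {"CC": "01", "AC": "908", "CITY": "MH"},
    {"CC": "01", "AC": "212", "CITY": "NYC"},
]


def support_count(itemset, rows):
    """Number of rows matching every (attribute, value) pair in itemset."""
    return sum(all(r[a] == v for a, v in itemset) for r in rows)


def closure(itemset, rows):
    """All (attribute, value) pairs shared by every row that matches itemset.

    Assumes at least one row matches (true for any frequent itemset).
    """
    matching = [set(r.items()) for r in rows
                if all(r[a] == v for a, v in itemset)]
    return frozenset(set.intersection(*matching))


def free_itemsets(rows, min_sup):
    """Frequent itemsets with strictly smaller support than each immediate subset."""
    items = sorted({item for r in rows for item in r.items()})
    n_attrs = len(rows[0])
    free = []
    for size in range(n_attrs + 1):
        for combo in combinations(items, size):
            if len({a for a, _ in combo}) < size:
                continue  # two values of one attribute never co-occur in a row
            sup = support_count(combo, rows)
            if sup < min_sup:
                continue
            # Support is anti-monotone, so checking immediate subsets suffices.
            subsets = combinations(combo, size - 1) if size else []
            if all(support_count(s, rows) > sup for s in subsets):
                free.append(frozenset(combo))
    return free


def constant_cfds(rows, min_sup):
    """Left-reduced constant CFDs as (lhs_itemset, (attribute, value)) pairs."""
    free = free_itemsets(rows, min_sup)
    clo = {y: closure(y, rows) for y in free}
    rules = []
    for y in free:
        for item in clo[y] - y:
            # Skip the rule if a free proper subset of y already implies item.
            if not any(z < y and item in clo[z] for z in free):
                rules.append((y, item))
    return rules


if __name__ == "__main__":
    for lhs, (attr, val) in constant_cfds(ROWS, min_sup=2):
        print(sorted(lhs), "->", f"{attr}={val}")
```

The sketch enumerates every candidate itemset, so it only suits tiny relations; the pruning strategies described in the abstract aim precisely at shrinking the set of free itemsets that has to be examined.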
Received: 2015-09-28      Published: 2016-03-15
CLC number:  TP311.131
Corresponding author: DIAO Xingchun, Research Fellow, E-mail: diaoxch640222@163.com
Cite this article:
ZHOU Jinling, DIAO Xingchun, CAO Jianjun. Mining of constant conditional functional dependencies based on pruning free itemsets[J]. Journal of Tsinghua University (Science and Technology), 2016, 56(3): 253-261.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.21.026  or  http://jst.tsinghuajournals.com/CN/Y2016/V56/I3/253
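  Table 1 Examples of functional dependencies and conditional functional dependencies
  Fig. 1 The constant conditional functional dependency mining algorithm CFDMiner
  Fig. 2 Subset tree of the free itemset (a,b,c,d)
  Fig. 3 The optimized algorithm prCFDMiner based on pruning strategies
  Table 2 Data set attributes and numbers of free and closed itemsets under different supports
  Table 3 Numbers of CCFDs output under different supports
  Fig. 4 Performance comparison of different optimization strategies in search space, average storage space, and search time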