Mining Method for Data Quality Detection Rules

LIU Bo, GENG Yin-Rong

Pattern Recognition and Artificial Intelligence, 2012, Vol. 25, Issue 5: 835-844.
Research and Applications

Abstract

Data quality rules are key to detecting quality problems in databases. To discover data quality rules automatically from relational databases and use them to detect erroneous or abnormal data, this paper studies the representation of quality rules and the measures for evaluating them. It proposes criteria for computing minimal quality rules based on data item groups and their confidence, algorithms for mining such rules, and an approach to detecting erroneous data with them. The proposed rule form borrows the confidence-based evaluation mechanism of association rules and the expressive power of conditional functional dependencies, so that functional dependencies, conditional functional dependencies, and association rules can all be described in a single format. The rules are concise, objective, and comprehensive, and they detect abnormal data accurately. Compared with related work, the mining algorithms have lower time complexity and the discovered rules achieve a higher error detection rate. Experiments demonstrate the effectiveness and correctness of the method.
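
As a rough illustration of evaluating a rule by the confidence of grouped data items, the following Python sketch groups rows by the values of the left-hand attributes, finds the most frequent right-hand value in each group, and flags rows that disagree with it when the group's confidence reaches a threshold. It is not the algorithm from the paper, only a minimal sketch of the general idea; the function names (`group_confidence`, `detect_suspects`), the sample table, and the 0.6 threshold are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def group_confidence(rows, lhs, rhs):
    """Group rows by their lhs attribute values and, for each group,
    return (dominant rhs value, confidence of that value)."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][row[rhs]] += 1
    result = {}
    for key, counts in groups.items():
        value, freq = counts.most_common(1)[0]
        result[key] = (value, freq / sum(counts.values()))
    return result

def detect_suspects(rows, lhs, rhs, min_conf=0.6):
    """Return indices of rows whose rhs value differs from the dominant
    value of their group, for groups whose confidence >= min_conf."""
    conf = group_confidence(rows, lhs, rhs)
    suspects = []
    for i, row in enumerate(rows):
        value, c = conf[tuple(row[a] for a in lhs)]
        if c >= min_conf and row[rhs] != value:
            suspects.append(i)
    return suspects

# Hypothetical sample data (assumed for illustration only).
rows = [
    {"zip": "510632", "city": "Guangzhou"},
    {"zip": "510632", "city": "Guangzhou"},
    {"zip": "510632", "city": "Shenzhen"},
    {"zip": "518000", "city": "Shenzhen"},
]
suspects = detect_suspects(rows, lhs=["zip"], rhs="city")
print(suspects)
```

In this sketch a rule such as zip -> city is treated as holding approximately: a group with confidence below the assumed threshold produces no flags, which keeps weakly supported patterns from being used as detection rules.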

Key words

Data Quality Rule / Detection / Mining / Data Item Group

Cite this article

LIU Bo, GENG Yin-Rong. Mining Method for Data Quality Detection Rules. Pattern Recognition and Artificial Intelligence. 2012, 25(5): 835-844

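Funding

Supported by the National Natural Science Foundation of China (No. 61003056), the Natural Science Foundation of Guangdong Province (No. S2012010008831), and the Key Science and Technology Project of Guangdong Province (No. 2010B010600026).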