An Empirical Data Cleaning Technique for CFDs
International Journal of Engineering Trends and Technology (IJETT)  

© 2013 by IJETT Journal  
Volume 4 Issue 9

Year of Publication : 2013  
Authors : Satyanarayana Mummana, Ravi kiran Rompella
Citation
Satyanarayana Mummana, Ravi kiran Rompella. "An Empirical Data Cleaning Technique for CFDs". International Journal of Engineering Trends and Technology (IJETT). V4(9):3730-3735 Sep 2013. ISSN:2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.
Abstract
Data cleaning is a basic preprocessing step applied before data is passed to a data mining approach, and it has become an interesting research area within data mining. Data cleaning is the process of finding and deleting noisy data or records from a database. The simplest technique used for data cleaning is based on Functional Dependencies (FDs). Because FDs apply to the entire instance of a table, we introduce a new technique called Conditional Functional Dependencies (CFDs). CFDs are like if-then rules: the dependence between the columns of a table is represented as conditions on column values. For example, consider an employee table that stores the employee name, id, city, pincode, and so on. In this table, employees who belong to the same city may all have the same pincode, so we can generate the FD city -> pincode. A CFD attaches a specific condition to the FD, e.g. city=vizag -> pincode=531005. The main aim of our project is to find the rows in a table that violate the created CFDs. These violating rows are deleted to correct the data.
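The violation check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `violates` and `clean`, the tuple encoding of a CFD, and the sample employee rows are all hypothetical, and only constant CFDs of the form city=vizag -> pincode=531005 are covered.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
# Hypothetical encoding of a constant CFD:
# (condition column, condition value, dependent column, required value)
CFD = Tuple[str, str, str, str]


def violates(row: Row, cfd: CFD) -> bool:
    """Return True if the row matches the CFD's condition but not its consequence."""
    cond_col, cond_val, dep_col, dep_val = cfd
    return row.get(cond_col) == cond_val and row.get(dep_col) != dep_val


def clean(rows: List[Row], cfds: List[CFD]) -> Tuple[List[Row], List[Row]]:
    """Split rows into those that satisfy every CFD and those that violate at least one."""
    kept: List[Row] = []
    removed: List[Row] = []
    for row in rows:
        if any(violates(row, cfd) for cfd in cfds):
            removed.append(row)
        else:
            kept.append(row)
    return kept, removed


# Illustrative data only; these rows are not taken from the paper.
employees = [
    {"name": "Asha", "id": "1", "city": "vizag", "pincode": "531005"},
    {"name": "Ravi", "id": "2", "city": "vizag", "pincode": "530001"},
    {"name": "Mala", "id": "3", "city": "hyderabad", "pincode": "500001"},
]
cfds = [("city", "vizag", "pincode", "531005")]

kept, removed = clean(employees, cfds)
```

In this sketch the second row is flagged because its city matches the condition while its pincode differs from the required value. The third row is kept because the condition city=vizag does not apply to it.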