An Empirical Data Cleaning Technique for CFDs

International Journal of Engineering Trends and Technology (IJETT)
  
© 2013 by IJETT Journal
Volume-4 Issue-9                      
Year of Publication : 2013
Authors : Satyanarayana Mummana, Ravi kiran Rompella

Citation 

Satyanarayana Mummana, Ravi kiran Rompella. "An Empirical Data Cleaning Technique for CFDs". International Journal of Engineering Trends and Technology (IJETT). V4(9):3730-3735 Sep 2013. ISSN:2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.

Abstract

Data cleaning is a basic preprocessing step applied before data is passed to a data mining approach, and it has become an interesting research area in the field of data mining. Data cleaning is the process of finding and deleting noisy data or records from a database. The simplest data cleaning technique is based on Functional Dependencies (FDs). Because an FD must hold over the entire instance of a table, we apply a technique based on Conditional Functional Dependencies (CFDs), which restrict a dependency to the rows that satisfy a condition. CFDs are like if-then rules: the dependencies between the columns of a table are expressed as conditions. For example, consider an employee table that stores the employee name, id, city, pincode, and so on. In this table, employees who belong to the same city may all have the same pincode, so we can state the FD city → pincode. A CFD attaches a specific condition to the FD, for example city = vizag → pincode = 531005. The main aim of our project is to find the rows in a table that violate the CFDs we have created. These violating rows are then deleted to correct the data.
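
The employee example in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the sample rows, the helper names (`fd_violations`, `violates_constant_cfd`, `clean`), and the dictionary-based row format are all hypothetical and chosen only to mirror the city and pincode example.

```python
# Hypothetical employee rows; names and values are illustrative only.
employees = [
    {"id": 1, "name": "Anil", "city": "vizag", "pincode": "531005"},
    {"id": 2, "name": "Bhanu", "city": "vizag", "pincode": "531005"},
    {"id": 3, "name": "Chitra", "city": "vizag", "pincode": "530001"},
    {"id": 4, "name": "Deepa", "city": "guntur", "pincode": "522001"},
]


def fd_violations(rows, lhs, rhs):
    """Return the lhs values that map to more than one rhs value.

    This checks a plain FD (lhs -> rhs) over the whole table instance.
    """
    seen = {}
    for row in rows:
        seen.setdefault(row[lhs], set()).add(row[rhs])
    return {key for key, values in seen.items() if len(values) > 1}


def violates_constant_cfd(row, condition, consequence):
    """True if the row matches every condition but fails a consequence.

    condition and consequence are dicts of column -> required constant,
    e.g. {"city": "vizag"} and {"pincode": "531005"}.
    """
    matches = all(row.get(col) == val for col, val in condition.items())
    broken = any(row.get(col) != val for col, val in consequence.items())
    return matches and broken


def clean(rows, condition, consequence):
    """Split rows into (kept, violating) for one constant CFD."""
    kept, violating = [], []
    for row in rows:
        if violates_constant_cfd(row, condition, consequence):
            violating.append(row)
        else:
            kept.append(row)
    return kept, violating


# FD city -> pincode, checked over the entire table.
bad_cities = fd_violations(employees, "city", "pincode")

# CFD city = vizag -> pincode = 531005, checked row by row.
kept, violating = clean(employees, {"city": "vizag"}, {"pincode": "531005"})

print(sorted(bad_cities))
print([row["id"] for row in violating])
print([row["id"] for row in kept])
```

The plain FD check only reports that some city maps to more than one pincode, while the CFD check names the individual rows that break the stated condition. Those rows are the ones the abstract proposes to delete.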
