Parallel Eclat with Large Data Base Parallel Algorithm and Improve its Effectiveness
Citation
Ms. Shruti Ingle, Mr. Abhay Kothari, "Parallel Eclat with Large Data Base Parallel Algorithm and Improve its Effectiveness", International Journal of Engineering Trends and Technology (IJETT), V60(3), 180-183, June 2018. ISSN: 2231-5381. www.ijettjournal.org. Published by Seventh Sense Research Group.
Abstract
To better utilize the aggregate computing resources of parallel machines, a localized algorithm based on the parallelization of Eclat was proposed and exhibited excellent scalability. It uses a vertical data layout, transforming the horizontal database transactions into vertical tid-lists of itemsets. As the name suggests, the tid-list of an itemset is a sorted list of the IDs of all transactions that contain the itemset. Frequent k-itemsets are organized into disjoint equivalence classes by their common (k-1)-prefixes, so that candidate (k+1)-itemsets can be generated by joining pairs of frequent k-itemsets from the same class. The support of a candidate itemset can then be computed simply by intersecting the tid-lists of its two component subsets. Task parallelism is employed by dividing the mining tasks for different classes of itemsets among the available processes. The equivalence classes of all frequent 2-itemsets are assigned to processes, and the associated tid-lists are distributed accordingly. Each process then mines the frequent itemsets generated from its assigned equivalence classes independently, by scanning and intersecting its local tid-lists. The steps of the parallel Eclat algorithm are presented for distributed-memory multiprocessors; the first step divides the database evenly into horizontal partitions among all processes.
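The scheme described in the abstract can be illustrated with a small single-machine sketch in Python. It is not the authors' implementation: every function name here is hypothetical, the "processes" are simulated by an ordinary loop, the greedy assignment of classes to processes is an assumption (the abstract does not say how classes are scheduled), and the initial horizontal partitioning and distribution of tid-lists across a distributed-memory machine are not modelled.

```python
from collections import defaultdict
from itertools import combinations


def to_vertical(transactions):
    """Turn horizontal transactions {tid: items} into a sorted tid-list per item."""
    tidlists = defaultdict(list)
    for tid in sorted(transactions):
        for item in set(transactions[tid]):
            tidlists[item].append(tid)
    return dict(tidlists)


def intersect(a, b):
    """Intersect two sorted tid-lists with a linear merge."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def mine_class(members, min_support, found):
    """Recursively mine one equivalence class.

    members: (itemset tuple, tid-list) pairs that share a common prefix.
    """
    for idx, (itemset_a, tids_a) in enumerate(members):
        next_class = []
        for itemset_b, tids_b in members[idx + 1:]:
            tids = intersect(tids_a, tids_b)
            if len(tids) >= min_support:
                candidate = itemset_a + (itemset_b[-1],)
                found[candidate] = len(tids)
                next_class.append((candidate, tids))
        if next_class:
            mine_class(next_class, min_support, found)


def two_itemset_classes(tidlists, min_support):
    """Group frequent 2-itemsets into classes keyed by their 1-item prefix."""
    items = sorted(i for i, t in tidlists.items() if len(t) >= min_support)
    classes = defaultdict(list)
    for a, b in combinations(items, 2):
        tids = intersect(tidlists[a], tidlists[b])
        if len(tids) >= min_support:
            classes[a].append(((a, b), tids))
    return items, dict(classes)


def assign_classes(classes, n_procs):
    """Assumed greedy schedule: largest class first, to the least-loaded process."""
    loads = [0] * n_procs
    plan = [[] for _ in range(n_procs)]
    for prefix in sorted(classes, key=lambda p: -len(classes[p])):
        target = loads.index(min(loads))
        plan[target].append(prefix)
        loads[target] += len(classes[prefix])
    return plan


def parallel_eclat(transactions, min_support, n_procs=2):
    tidlists = to_vertical(transactions)
    items, classes = two_itemset_classes(tidlists, min_support)
    found = {(i,): len(tidlists[i]) for i in items}
    for proc_prefixes in assign_classes(classes, n_procs):
        # Each iteration stands in for one process mining only its own classes.
        local = {}
        for prefix in proc_prefixes:
            for itemset, tids in classes[prefix]:
                local[itemset] = len(tids)
            mine_class(classes[prefix], min_support, local)
        found.update(local)
    return found
```

Calling `parallel_eclat({1: ["a", "b", "c"], 2: ["a", "b"], 3: ["a", "c"], 4: ["b", "c"], 5: ["a", "b", "c"]}, 2)` returns a dictionary mapping each frequent itemset (as a tuple) to its support. Because each class is mined using only its own tid-lists, the loop body needs no data from other classes, which is the property that allows the classes to be handed to separate processes.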
Keywords
Eclat, Data Base Parallel Algorithm