Performance Comparison of Absolute High Utility Itemset Mining (AHUIM) Algorithm for Big Data

Sandeep Dalal; Vandna Dahiya

doi:https://doi.org/10.14445/22315381/IJETT-V69I1P203

Volume 69 | Issue 1 | Year 2021 | Article Id. IJETT-V69I1P203 | DOI : https://doi.org/10.14445/22315381/IJETT-V69I1P203

Citation :

Sandeep Dalal, Vandna Dahiya, "Performance Comparison of Absolute High Utility Itemset Mining (AHUIM) Algorithm for Big Data," International Journal of Engineering Trends and Technology (IJETT), vol. 69, no. 1, pp. 17-23, 2021. Crossref, https://doi.org/10.14445/22315381/IJETT-V69I1P203

Abstract

High utility itemset (HUI) mining targets the discovery of high utility itemsets from a database. Utility here is defined as a combination of an item's quantity and its importance. Although various studies have addressed HUI mining, they are mainly dedicated to centralized datasets and do not scale to big data. A novel technique, the Absolute High Utility Itemset Mining (AHUIM) algorithm, has been proposed for parallel mining of HUIs in a big data environment. The algorithm uses the Spark in-memory computing architecture, in which the whole mining task is divided into smaller, independent sub-tasks. Several pruning strategies are used in the algorithm to mine the dataset efficiently, reducing the need to traverse unpromising regions of the search space. The proposed algorithm inherits several of Spark's properties, such as fault tolerance, scalability, and low communication cost. In this research work, the performance of AHUIM is evaluated by comparing it with the most recent and fastest algorithms for mining HUIs from big data. Extensive experiments show that the novel algorithm outperforms other state-of-the-art algorithms on factors such as time complexity, storage, and scalability.
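To make the utility definition concrete, the following is a minimal sketch of how the utility of an itemset can be computed and compared against a minimum-utility threshold. It is not the AHUIM algorithm: it is a naive, single-machine enumeration with no pruning and no Spark parallelism. The toy transactions, the `unit_profit` table, and the function names `itemset_utility` and `brute_force_huis` are all hypothetical, and "importance" is assumed here to mean per-unit profit.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
]
# Hypothetical per-unit importance (assumed to be profit) of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for tx in transactions:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * unit_profit[item] for item in itemset)
    return total


def brute_force_huis(transactions, unit_profit, min_utility):
    """Enumerate every itemset and keep those whose utility reaches min_utility.

    Exponential in the number of items; shown only to illustrate the definition.
    """
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_profit)
            if utility >= min_utility:
                result[itemset] = utility
    return result


if __name__ == "__main__":
    print(brute_force_huis(transactions, unit_profit, min_utility=15))
```

With a threshold of 15, this toy data yields the itemsets {a}, {a, b}, and {a, c}, with utilities 20, 16, and 19. Note that {a, b} qualifies even though {b} alone (utility 6) does not: utility is not anti-monotone, which is why HUI mining algorithms rely on dedicated pruning strategies rather than simple support-style pruning.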

Keywords

big data mining, distributed computing, MapReduce, Spark platform, utility mining
