High Utility-Occupancy Sequential Pattern Mining Algorithm Based On Utility-Occupancy Framework
Citation
MLA Style: Saritha Vemulapalli, Shashi Mogalla. "High Utility-Occupancy Sequential Pattern Mining Algorithm Based On Utility-Occupancy Framework." International Journal of Engineering Trends and Technology 69.4 (2021): 228-235.
APA Style: Saritha Vemulapalli, Shashi Mogalla. High Utility-Occupancy Sequential Pattern Mining Algorithm Based On Utility-Occupancy Framework. International Journal of Engineering Trends and Technology, 69(4), 228-235.
Abstract
Classical sequential pattern mining (SPM) algorithms cannot generate interesting and potentially useful patterns in all real-world applications, because they give every item equal significance and use frequency as the only interestingness measure. Some real-world applications involve items of different natures, whose significance is measured using different criteria such as utility, risk, profit, weight, or time duration. Besides the utility of the items that constitute a pattern, the significance of a pattern is also influenced by its occupancy in its supporting sequences. To address these problems, we propose a variant of SPM called high utility-occupancy sequential pattern mining (HUOSPM), which discovers more interesting, potentially useful, and dominant patterns. We devise two compact data structures: seqlist, which represents information about each sequence of the quantitative sequence dataset, and uolist, which maintains information about candidate patterns. We propose a novel HUOSPM algorithm, based on the utility-occupancy framework, that discovers patterns using seqlist and uolist. We also propose three search space pruning strategies: pattern extension utility-occupancy, reduced sequence utility-occupancy, and extension upper bound utility-occupancy. Experiments were carried out on real datasets with varying support and utility-occupancy thresholds to evaluate the quality of the patterns. The results show that the patterns generated by the proposed HUOSPM algorithm are of higher quality than those produced by the baseline PrefixSpan algorithm, and that the algorithm discovers the complete set of high utility-occupancy sequential patterns (HUOSPs).
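The abstract does not state the formulas behind the utility-occupancy measure, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that the utility of a pattern in a sequence is the maximum utility over its occurrences, that utility-occupancy in one sequence is that utility divided by the total utility of the sequence, and that the utility-occupancy of a pattern is the average of this ratio over its supporting sequences. The names PROFIT, DATASET, pattern_utility, utility_occupancy, and is_huosp are hypothetical, and the sketch does not model seqlist, uolist, or the pruning strategies.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical unit profit (external utility) per item.
PROFIT: Dict[str, int] = {"a": 2, "b": 5, "c": 1}

# A quantitative sequence is a list of itemsets mapping item -> quantity.
QSequence = List[Dict[str, int]]
Pattern = List[Set[str]]

# Hypothetical toy dataset of three quantitative sequences.
DATASET: List[QSequence] = [
    [{"a": 1, "b": 2}, {"c": 3}],
    [{"a": 2}, {"b": 1}, {"a": 1, "c": 1}],
    [{"b": 1}, {"c": 4}],
]


def itemset_utility(itemset: Dict[str, int], items=None) -> int:
    """Utility of the chosen items (default: all items) in one itemset."""
    keys = itemset.keys() if items is None else items
    return sum(PROFIT[i] * itemset[i] for i in keys)


def sequence_utility(seq: QSequence) -> int:
    """Total utility of a quantitative sequence."""
    return sum(itemset_utility(element) for element in seq)


def pattern_utility(pattern: Pattern, seq: QSequence) -> Optional[int]:
    """Maximum utility over all occurrences of pattern in seq (assumed
    definition); None if the sequence does not contain the pattern."""
    # best[j] = best utility found so far for the first j pattern elements.
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for element in seq:
        # Descending j so one sequence element matches one pattern position.
        for j in range(len(pattern), 0, -1):
            prev = best[j - 1]
            if prev is None or not all(i in element for i in pattern[j - 1]):
                continue
            candidate = prev + itemset_utility(element, pattern[j - 1])
            if best[j] is None or candidate > best[j]:
                best[j] = candidate
    return best[len(pattern)]


def utility_occupancy(pattern: Pattern, dataset: List[QSequence]) -> Tuple[int, float]:
    """Return (support, average utility share over supporting sequences)."""
    ratios = []
    for seq in dataset:
        utility = pattern_utility(pattern, seq)
        if utility is not None:
            ratios.append(utility / sequence_utility(seq))
    support = len(ratios)
    return support, (sum(ratios) / support if support else 0.0)


def is_huosp(pattern: Pattern, dataset: List[QSequence],
             min_sup: int, min_uo: float) -> bool:
    """A pattern qualifies if it meets both thresholds (assumed criterion)."""
    support, uo = utility_occupancy(pattern, dataset)
    return support >= min_sup and uo >= min_uo


PATTERN: Pattern = [{"a"}, {"c"}]  # item a followed later by item c
```

For example, `utility_occupancy(PATTERN, DATASET)` reports the support of the pattern and its average utility share in the toy data, and `is_huosp(PATTERN, DATASET, min_sup=2, min_uo=0.3)` checks it against a support threshold and a utility-occupancy threshold, the two parameters varied in the experiments.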
Keywords
Data mining, Pattern discovery, Pattern-Growth, Utility-Occupancy, Variant sequential patterns.