High Utility-Occupancy Sequential Pattern Mining Algorithm Based On Utility-Occupancy Framework

Volume-69 Issue-4
Year of Publication : 2021
Authors : Saritha Vemulapalli, Shashi Mogalla
DOI : 10.14445/22315381/IJETT-V69I4P231


Classical sequential pattern mining (SPM) algorithms cannot generate interesting and potentially useful patterns for every real-world application, because they treat all items as equally significant and use frequency as the only interestingness measure. Some real-world applications involve items of different natures, whose significance is measured using different criteria such as utility, risk, profit, weight, or time duration. In addition to the utility of the items constituting a pattern, the significance of a pattern is also influenced by its occupancy in its supporting sequences. To deal with these problems, we propose a variant of SPM called high utility-occupancy sequential pattern mining (HUOSPM) to discover more interesting, potentially useful, and dominant patterns. In this paper, we devise two compact data structures: seqlist, which represents information about each sequence of the quantitative sequence dataset, and uolist, which maintains information about candidate patterns. We propose a novel HUOSPM algorithm based on a utility-occupancy framework, which discovers patterns using seqlist and uolist. We also propose three search-space pruning strategies, called pattern extension utility-occupancy, reduced sequence utility-occupancy, and extension upper bound utility-occupancy. Experiments were carried out on real datasets with varying support and utility-occupancy thresholds to evaluate the quality of the patterns. The results show that the patterns generated by the proposed HUOSPM algorithm are of higher quality than those generated by the baseline algorithm PrefixSpan, and that HUOSPM discovers the complete set of HUOSPs.
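The abstract does not give the formal definitions, so the following is only a minimal sketch of how a utility-occupancy check might look. It assumes the definition commonly used in high utility-occupancy itemset mining: in each supporting sequence s, a pattern p occupies the fraction u(p, s) / u(s) of that sequence's total utility, and the utility-occupancy of p is the average of these fractions over its supporting sequences. It further assumes that u(p, s) is already known for each supporting sequence (for example, recorded in a structure like uolist) and that the support threshold is relative. The names `SupportingSequence`, `utility_occupancy`, and `is_huosp` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class SupportingSequence:
    # Hypothetical record for one sequence that contains the pattern.
    pattern_utility: float   # u(p, s): utility of pattern p in sequence s (assumed precomputed)
    sequence_utility: float  # u(s): total utility of sequence s (assumed > 0)


def utility_occupancy(supports: list[SupportingSequence]) -> float:
    """Average share of each supporting sequence's utility occupied by the pattern (assumed definition)."""
    if not supports:
        return 0.0
    ratios = [s.pattern_utility / s.sequence_utility for s in supports]
    return sum(ratios) / len(ratios)


def is_huosp(supports: list[SupportingSequence], total_sequences: int,
             min_sup: float, min_uo: float) -> bool:
    """Assumed HUOSP test: relative support >= min_sup and utility-occupancy >= min_uo."""
    support = len(supports) / total_sequences
    return support >= min_sup and utility_occupancy(supports) >= min_uo


# Hypothetical toy data: a pattern supported by 3 of 5 sequences.
supports = [
    SupportingSequence(10, 20),  # occupies 0.5 of this sequence's utility
    SupportingSequence(6, 30),   # occupies 0.2
    SupportingSequence(9, 10),   # occupies 0.9
]
print(utility_occupancy(supports))        # mean of 0.5, 0.2, 0.9
print(is_huosp(supports, 5, 0.5, 0.5))
```

With these toy values the relative support is 0.6 and the utility-occupancy is about 0.533, so the pattern passes both thresholds of 0.5. A pattern with high frequency but low occupancy in its supporting sequences would fail the second condition, which is the kind of pattern a frequency-only method such as PrefixSpan cannot filter out.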

Data mining, Pattern discovery, Pattern-Growth, Utility-Occupancy, Variant sequential patterns.