Opportunistic mining of top-n high utility patterns
Document Type
Article
Source of Publication
Information Sciences
Publication Date
5-1-2018
Abstract
© 2018 Elsevier Inc. Mining high utility patterns is an important data mining problem, formulated as finding patterns whose utilities are no less than a threshold. Because the mining results are very sensitive to this threshold, it is difficult for users to specify an appropriate one. An alternative formulation is to find the top-n high utility patterns. This second formulation is more challenging, however, because the corresponding threshold is unknown in advance and the search space becomes even larger. When very long patterns are present, prior algorithms cannot mine top-n high utility patterns even for very small n. This paper proposes a novel algorithm for mining top-n high utility patterns that are long. The algorithm adopts an opportunistic pattern growth approach and introduces five opportunistic strategies for scalably maintaining shortlisted patterns, for efficiently computing utilities, and for estimating tight upper bounds to prune the search space. Extensive experiments show that the proposed algorithm is 1 to 3 orders of magnitude more efficient than state-of-the-art top-n high utility pattern mining algorithms, and up to 2 orders of magnitude faster than high utility pattern mining algorithms tuned with an optimal threshold.
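To make the top-n formulation in the abstract concrete, the following is a minimal brute-force sketch, not the algorithm proposed in the paper. It assumes a hypothetical toy database in which each transaction maps an item to its utility in that transaction, and it uses a min-heap so that the smallest shortlisted utility acts as the threshold that is unknown in advance and rises as better patterns are found. The names `transactions`, `utility`, and `top_n_patterns` are illustrative assumptions.

```python
import heapq
from itertools import combinations

# Hypothetical toy database (an assumption for illustration):
# each transaction maps item -> utility of that item in the transaction.
transactions = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 5, "c": 3},
    {"b": 4, "c": 2, "d": 6},
    {"a": 10, "b": 2, "d": 3},
]


def utility(pattern, db):
    """Utility of a pattern: for every transaction containing all of its
    items, add up the utilities of those items."""
    return sum(
        sum(t[i] for i in pattern)
        for t in db
        if all(i in t for i in pattern)
    )


def top_n_patterns(db, n):
    """Exhaustively enumerate itemsets and keep the n with highest utility.

    Returns the shortlist (best first) and the final threshold, i.e. the
    utility of the n-th best pattern. No pruning is done here; this only
    illustrates the problem statement.
    """
    items = sorted({i for t in db for i in t})
    heap = []  # min-heap of (utility, pattern); heap[0] is the weakest entry
    threshold = 0
    for size in range(1, len(items) + 1):
        for pattern in combinations(items, size):
            u = utility(pattern, db)
            if len(heap) < n:
                heapq.heappush(heap, (u, pattern))
            elif u > heap[0][0]:
                heapq.heapreplace(heap, (u, pattern))
            if len(heap) == n:
                threshold = heap[0][0]  # threshold rises as the shortlist improves
    return sorted(heap, reverse=True), threshold


top, threshold = top_n_patterns(transactions, 2)
print(top, threshold)
```

In this toy data the pattern ("a", "b") has a higher utility than ("b",) alone, which illustrates that utility is not anti-monotone. That is why practical algorithms rely on upper bounds to prune the search space rather than on the utility itself.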
Publisher
Elsevier Inc.
Volume
441
First Page
171
Last Page
186
Disciplines
Computer Sciences | Electrical and Computer Engineering
Keywords
Frequent patterns, High utility patterns, Pattern mining, Top-n interesting patterns, Utility mining
Recommended Citation
Liu, Junqiang; Zhang, Xingxing; Fung, Benjamin C.M.; Li, Jiuyong; and Iqbal, Farkhund, "Opportunistic mining of top-n high utility patterns" (2018). All Works. 2602.
https://zuscholars.zu.ac.ae/works/2602
Indexed in Scopus
yes
Open Access
yes
Open Access Type
Green: A manuscript of this publication is openly available in a repository