Date of Award
2017
Publication Type
Master Thesis
Degree Name
M.Sc.
Department
Computer Science
Keywords
Frequent Sequential Patterns, High Utility Mining, High Utility Sequential Mining, Tree-based mining, Tuple-based uncertainty, Uncertain data mining
Supervisor
Ezeife, Christie
Rights
info:eu-repo/semantics/openAccess
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Abstract
In general, the web access patterns are retrieved from the web access sequence databases using various sequential pattern algorithms such as GSP, WAP, and PLWAP tree. However, these algorithms do not consider sequential data with quantity (internal utility) (e.g., the amount of the time spent by the user on a web page) and quality (external utility) (e.g., the rating of a web page in a website) information. These algorithms also do not work on uncertain sequential items (e.g., purchased products) having probability (0, 1). Factoring in the utility and uncertainty of each sequence item provides more product information that can be beneficial in mining profitable patterns from company’s websites. For example, a customer can purchase a bottle of ink more frequently than a printer but the purchase of a single printer can yield more profit to the business owner than the purchase of multiple bottles of ink. Most existing traditional uncertain sequential pattern algorithms such as U-Apriori, UF-Growth, and U-PLWAP do not include the utility measures. In U-PLWAP, the web sequences are derived from web log data without including the time spent by the user and the web pages are not associated with any rating. By considering these two utilities, sometimes the items with lower existential probability can be more profitable to the website owner. In utility based traditional algorithms, the only algorithm related to both uncertain and high utility is the PHUI-UP algorithm which considers the probability and utility as different entities and the retrieved patterns are not dependent with both due to two different thresholds, and it does not mine uncertain web access database sequences. This thesis proposes the algorithm HUU-PLWAP miner for mining uncertain sequential patterns with internal and external utility information using PLWAP tree approach that cut down on several database scans of level-wise approaches. HUU-PLWAP uses uncertain internal utility values (derived from sequence uncertainty model) and the constant external utility values (predefined) to retrieve the high utility sequential patterns from uncertain web access sequence databases with the help of U-PLWAP methodology. Experiments show that HUU-PLWAP is at least 95% faster than U-PLWAP, and 75% faster than the PHUI-UP algorithm.
Recommended Citation
Vangala, Sravya, "Mining High Utility Sequential Patterns from Uncertain Web Access Sequences using the PL-WAP" (2017). Electronic Theses and Dissertations. 6022.
https://scholar.uwindsor.ca/etd/6022