Date of Award

2017

Publication Type

Master Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Frequent Sequential Patterns, High Utility Mining, High Utility Sequential Mining, Tree-based mining, Tuple-based uncertainty, Uncertain data mining

Supervisor

Ezeife, Christie

Rights

info:eu-repo/semantics/openAccess

Abstract

In general, the web access patterns are retrieved from the web access sequence databases using various sequential pattern algorithms such as GSP, WAP, and PLWAP tree. However, these algorithms do not consider sequential data with quantity (internal utility) (e.g., the amount of the time spent by the user on a web page) and quality (external utility) (e.g., the rating of a web page in a website) information. These algorithms also do not work on uncertain sequential items (e.g., purchased products) having probability (0, 1). Factoring in the utility and uncertainty of each sequence item provides more product information that can be beneficial in mining profitable patterns from company’s websites. For example, a customer can purchase a bottle of ink more frequently than a printer but the purchase of a single printer can yield more profit to the business owner than the purchase of multiple bottles of ink. Most existing traditional uncertain sequential pattern algorithms such as U-Apriori, UF-Growth, and U-PLWAP do not include the utility measures. In U-PLWAP, the web sequences are derived from web log data without including the time spent by the user and the web pages are not associated with any rating. By considering these two utilities, sometimes the items with lower existential probability can be more profitable to the website owner. In utility based traditional algorithms, the only algorithm related to both uncertain and high utility is the PHUI-UP algorithm which considers the probability and utility as different entities and the retrieved patterns are not dependent with both due to two different thresholds, and it does not mine uncertain web access database sequences. This thesis proposes the algorithm HUU-PLWAP miner for mining uncertain sequential patterns with internal and external utility information using PLWAP tree approach that cut down on several database scans of level-wise approaches. HUU-PLWAP uses uncertain internal utility values (derived from sequence uncertainty model) and the constant external utility values (predefined) to retrieve the high utility sequential patterns from uncertain web access sequence databases with the help of U-PLWAP methodology. Experiments show that HUU-PLWAP is at least 95% faster than U-PLWAP, and 75% faster than the PHUI-UP algorithm.

Share

COinS