Date of Award


Publication Type

Doctoral Thesis

Degree Name



Computer Science

First Advisor

Ezeife, Christie (School of Computer Science)


Computer Science.




Web Recommendation Systems (WRSs) are used to recommend items and future page views to World Wide Web users. Web usage mining provides the foundation for WRSs, as the results of mining user browsing patterns are used for recommendation and prediction. Existing WRSs are still limited by several problems, among them the difficulty of recommending items to a new user whose browsing history is not available (Cold Start), sparse data structures (Sparsity), and a lack of diversity in the set of recommended items (Content Overspecialization). Existing WRSs also fail to make full use of the semantic information about items and the relations (e.g., is-a, has-a, part-of) among them. A domain ontology, as advocated by the Semantic Web, provides a formal representation of domain knowledge with relations, concepts and axioms. This thesis proposes the SemAware system, which integrates domain ontology into web usage mining and web recommendation, and increases the effectiveness and efficiency of the system by addressing the problems of cold start, sparsity, content overspecialization and complexity-accuracy tradeoffs. The SemAware technique includes enriching the web log with semantic information through a proposed semantic distance measure based on the Jaccard coefficient. A matrix of semantic distances is then used in Semantics-aware Sequential Pattern Mining (SPM) of the web log, and is also integrated with the transition probability matrix of Markov models built from the web log. In the recommendation phase, the proposed SPM and Markov models are used to add interpretability.
The proposed recommendation engine uses a vector-space model to build an item-concept correlation matrix, in combination with user-provided tags, to generate top-n recommendations. Experimental studies show that SemAware outperforms popular recommendation algorithms, and that its proposed components are effective and efficient in addressing the contradicting predictions problem, the scalability and sparsity problems of SPM and top-n recommendation, and the content overspecialization problem.
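
The abstract does not give the formulas behind the semantic distance measure or its integration with the Markov model, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each item is annotated with a set of ontology concepts, uses the plain Jaccard distance between those sets, and blends the resulting distances with first-order transition probabilities through a hypothetical weight `alpha`. The function names, the blending rule and the sample data are illustrative and are not taken from the thesis.

```python
from collections import defaultdict


def jaccard_distance(a, b):
    """Plain Jaccard distance 1 - |A & B| / |A | B| (0.0 if both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def semantic_distance_matrix(concepts):
    """Pairwise distances between items, keyed by (item, item).

    `concepts` maps each item to its set of ontology concepts (assumed input).
    """
    items = sorted(concepts)
    return {(p, q): jaccard_distance(concepts[p], concepts[q])
            for p in items for q in items}


def transition_probabilities(sessions):
    """First-order Markov transition probabilities estimated from sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


def blend_with_semantics(probs, dist, alpha=0.5):
    """Hypothetical blend, not the thesis's rule:
    score = alpha * P(dst | src) + (1 - alpha) * (1 - distance(src, dst)),
    renormalized so each row sums to 1.
    """
    blended = {}
    for src, row in probs.items():
        scores = {dst: alpha * p + (1 - alpha) * (1.0 - dist[(src, dst)])
                  for dst, p in row.items()}
        total = sum(scores.values())
        blended[src] = ({dst: s / total for dst, s in scores.items()}
                        if total > 0 else dict(row))
    return blended


# Illustrative data only: three items annotated with made-up concepts.
concepts = {
    "camera": {"Device", "Camera"},
    "lens": {"Device", "Accessory"},
    "book": {"Media"},
}
sessions = [["camera", "lens"], ["camera", "book"], ["camera", "lens"]]

dist = semantic_distance_matrix(concepts)
probs = transition_probabilities(sessions)
blended = blend_with_semantics(probs, dist, alpha=0.5)
```

In this sketch a transition to a semantically closer item gains weight relative to its raw frequency, which is one simple way semantic information could help break ties between otherwise similar predictions. How SemAware actually combines the two matrices is described in the thesis itself, not here.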