Date of Award


Publication Type

Master Thesis

Degree Name



Computer Science

First Advisor

Ezeife, Christie


CF, clickstream history, collaborative filtering, data mining, E-commerce recommendation system, weighted frequent item




In E-commerce collaborative filtering recommendation systems, the main input, the user-item rating matrix, is binary purchase data showing only which items a user has purchased recently. This matrix is usually sparse and provides little information about customers' historical purchases or product clickstream behavior (e.g., clicks, basket placement, and purchase), which could improve the accuracy of product recommendations. Existing E-commerce recommendation systems that use clickstream data include those referred to in this thesis as Kim05Rec, Kim11Rec, and Chen13Rec. Kim05Rec builds a decision tree on click behavior attributes such as search type and visit times, derives the likelihood of a user putting products into the basket, and uses that information to enrich the user-item rating matrix. When a user clicks a product, Kim11Rec finds its associated products in three stages (click, basket, and purchase), calculates a score from the lift values of these stages, and uses that score to make recommendations. Chen13Rec measures the similarity of users based on their category click patterns, such as click sequences, click times, and visit duration, and uses that similarity to enhance the collaborative filtering algorithm. However, similarity between the click sequences of sessions can carry over to purchases to some extent; in particular, for sessions without purchases, it can be used to predict purchases for those session users. The existing systems have integrated neither this similarity nor historical purchase data, which shows more than whether or not a user has purchased a product before. In this thesis, we propose HPCRec (Historical Purchase with Clickstream based Recommendation System) to enrich the ratings matrix in both quantity and quality.
HPCRec first forms a normalized rating matrix with higher-quality ratings from historical purchases. It then mines the consequential bond between clicks and purchases using weighted frequencies, where the weights are similarities between sessions, and improves rating quantity by integrating this information. The experimental results show that HPCRec is more accurate than these existing methods and that it can also handle infrequent cases, which the existing methods cannot.
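
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the exact formulas, so the max-based normalization, the Jaccard similarity over clicked items, and the function names normalize_purchases, jaccard, and weighted_purchase_scores are all assumptions chosen for illustration.

```python
from collections import defaultdict


def normalize_purchases(purchase_counts):
    """Scale each user's purchase counts into (0, 1].

    Assumption: divide by the user's largest count. The thesis may
    normalize differently.
    """
    ratings = {}
    for user, counts in purchase_counts.items():
        peak = max(counts.values())
        ratings[user] = {item: n / peak for item, n in counts.items()}
    return ratings


def jaccard(a, b):
    """Set-overlap similarity between two click lists (an assumed stand-in
    for the session similarity used in the thesis)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_purchase_scores(target_clicks, history):
    """Score items for a session that has clicks but no purchases.

    history is a list of (clicks, purchases) pairs from past sessions.
    Each purchased item is counted with a weight equal to the similarity
    between that session's clicks and the target session's clicks, then
    divided by the total weight.
    """
    totals = defaultdict(float)
    weight_sum = 0.0
    for clicks, purchases in history:
        w = jaccard(target_clicks, clicks)
        weight_sum += w
        for item in set(purchases):
            totals[item] += w
    if weight_sum == 0:
        return {}
    return {item: t / weight_sum for item, t in totals.items()}


ratings = normalize_purchases({"u1": {"a": 4, "b": 1}})
history = [(["a", "b"], ["b"]), (["a", "c"], ["c"]), (["d"], ["d"])]
scores = weighted_purchase_scores(["a", "b"], history)
```

In this sketch, the scores for a purchase-free session could fill empty cells of the rating matrix (the quantity aspect), while the normalized purchase counts replace binary entries (the quality aspect).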