Date of Award
2024
Publication Type
Thesis
Degree Name
M.Sc.
Department
Computer Science
Keywords
Content-based filtering; Data mining; E-commerce recommendation systems; Natural language processing.
Supervisor
Christie Ezeife
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Abstract
The increase in e-commerce activity and research has led to growing demand for dataset recommendation systems that support data-driven decision-making, such as refining pricing strategies and improving customer satisfaction. For instance, a business analyst might need to examine how pricing changes affect customer feedback on electronic products in order to make future pricing decisions. However, existing dataset recommendation systems such as ZhangRec23, WangRec22, and GDS19 face significant limitations: they often lack a focus on e-commerce datasets, struggle to answer complex queries, and depend on metadata of inconsistent quality. Accurate metadata (information such as titles and descriptions) is essential for retrieving relevant datasets, but searches such as “the impact of seasonal sales on customer reviews for electronics” frequently yield incomplete results. For example, Dataset A may provide seasonal sales data without customer reviews, while Dataset B contains reviews but lacks seasonal data. Specifically, ZhangRec23 is primarily designed for biomedical datasets and, when adapted for e-commerce, relies heavily on the quality and completeness of metadata, which is often inconsistent or incomplete in e-commerce datasets. WangRec22 employs collaborative filtering but fails to handle complex queries adequately, often providing only partial results. GDS19, a keyword-based approach, does not capture the semantic meaning behind queries, resulting in mismatched recommendations. To address these gaps, this thesis proposes the E-commerce Datasets Mining Recommendation System (EDMRec), an enhancement of ZhangRec23 explicitly designed for e-commerce datasets. EDMRec combines content-based filtering, advanced data processing, and machine learning, structured into three layers: Data Collection, Data Processing, and Query Processing.
It uses Named Entity Recognition (NER) to enrich incomplete metadata by extracting contextual information, and applies Term Frequency-Inverse Document Frequency (TF-IDF) alongside BERT embeddings to capture both keyword relevance and semantic context. This approach enhances recommendation precision, making EDMRec especially suitable for complex e-commerce queries. Experimental evaluations confirm EDMRec’s effectiveness, with a 15% improvement in precision, recall, and F1 score over existing systems. Tested on over 4,373 metadata entries from Kaggle and Google Dataset Search, EDMRec consistently delivers more relevant, context-aware recommendations, demonstrating its capability to support more insightful analysis of e-commerce datasets and data-driven decision-making.
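The idea of scoring dataset metadata by both keyword relevance and semantic context can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the weighting parameter `alpha`, and the linear blend of the two scores are assumptions made for illustration, and the dense embeddings are taken as precomputed inputs (for example, vectors that a BERT encoder might produce) rather than computed here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (cnt / len(toks)) * idf[t] for t, cnt in tf.items()})
    return vectors


def cosine_sparse(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cosine_dense(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_rank(query, metadata, query_emb, doc_embs, alpha=0.5):
    """Rank metadata entries by a weighted blend of lexical and semantic scores.

    alpha (hypothetical parameter) weights the TF-IDF score; (1 - alpha)
    weights the embedding score. Embeddings are assumed to be precomputed.
    Returns (score, index) pairs, best match first.
    """
    # The query is appended to the corpus so it shares the same IDF weights.
    vecs = tfidf_vectors(metadata + [query])
    q_vec, doc_vecs = vecs[-1], vecs[:-1]
    scored = []
    for i, d_vec in enumerate(doc_vecs):
        lexical = cosine_sparse(q_vec, d_vec)
        semantic = cosine_dense(query_emb, doc_embs[i])
        scored.append((alpha * lexical + (1 - alpha) * semantic, i))
    return sorted(scored, reverse=True)
```

Calling `hybrid_rank` with a query string, a list of metadata descriptions, and matching embeddings returns the entries ordered by blended score. Setting `alpha=1.0` reduces the ranking to pure keyword matching, while `alpha=0.0` relies on the embeddings alone.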
Recommended Citation
Oduba, Ayomide Elijah, "Enhancing E-commerce Dataset recommendations using BERT and Named Entity Recognition" (2024). Electronic Theses and Dissertations. 9619.
https://scholar.uwindsor.ca/etd/9619