Date of Award
5-17-2024
Publication Type
Dissertation
Degree Name
Ph.D.
Department
Computer Science
Keywords
Bigdata;Block as a Value;Document-Oriented NoSQL Databases;E-Commerce;Recommendation Systems;Sequential Pattern Mining
Supervisor
Christie Ezeife
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
To date, the majority of large corporations, such as Amazon and Facebook, still run their core solutions (e.g., payments) on relational databases and use non-relational Big Data (i.e., NoSQL) database management systems only for non-core systems (e.g., shopping carts) that favor availability and scalability through partitioning while trading off consistency. NoSQL systems are built on the CAP (Consistency, Availability and Partition tolerance) theorem, under which a system satisfies two of these properties while trading off the third. The need for system availability and scalability drives the use of NoSQL models, while the lack of the consistency and robust query engines obtainable in relational databases impedes their usage. To mitigate these drawbacks, researchers and companies like Amazon, Google and Facebook developed 'SQL over NoSQL' systems such as Amazon's Dynamo, Google's Spanner, Facebook's Memcache, Zidian2019, Apache Hive and SparkSQL. These systems create an SQL-like query engine layer over NoSQL systems but suffer from data redundancy because they process unnormalized NoSQL databases (e.g., document stores), which lack the consistency obtainable in relational databases. Their query engines are also not relationally complete, because they cannot process all of the relational algebra-based queries that a relational database can. This thesis presents a 'NoSQL over SQL' system, the inverse of existing 'SQL over NoSQL' Big Data processing approaches such as Zidian2019, which transform data into a key-value format and then build an SQL query engine layer on the NoSQL data. The thesis approach is motivated by (i) the need for existing systems to fully deploy NoSQL data store functionalities without the limitation of building an extra SQL layer for querying, and (ii) the ability to integrate image similarities into the e-commerce mining process by taking advantage of the ease of storing and retrieving images as text on document-oriented NoSQL databases.
To allow appropriate storage and retrieval of data on document-based NoSQL databases without data redundancy and inconsistency, while encouraging both horizontal and vertical partitioning, this work proposes a NoSQL over SQL Block as a Value (BaaV) data storage strategy. In the relational database model, a relation is represented as R(k,A_1,A_2,…,A_n), with a key attribute k = k_1,k_2,…,k_n, where k_i is the primary key to the set of attributes A_i, i=1,2,…,n, of the relation. In contrast, our NoSQL BaaV model is represented as a tuple (K,B), where K is the key attribute and B is a block of relations. NoSQL BaaV represents a relation as R(K,r_1,r_2,…,r_m), with a key attribute K and a set of m relations r_1,r_2,…,r_m that together form the block B; each r∈B contains its own set of attributes and is denoted as r(k,A_1,A_2,…,A_p), with a key attribute k and a set of p attributes, as is typical of a relational model. The relations r_1,r_2,…,r_m in R of a NoSQL BaaV database are related through foreign key relationships. The thesis also solves the data inconsistency problem of existing NoSQL-based stores by using the NoSQL BaaV model together with a leader node strategy in the NoSQL store cluster for read/write operations, while retaining an in-sync replica node, similar to the Apache Kafka data replication strategy. Additionally, we vectorize item images and integrate item-item image similarity scores into the e-commerce customer historical purchase database to enhance sequential pattern recommendation on e-commerce with a proposed Image Enhanced Historical Sequential Pattern Recommendation (iHSPRec) system.
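As a rough illustration of the (K,B) structure described above, the following Python sketch models one BaaV document as a block key plus a block of relations linked by a foreign key. It is not the thesis implementation: the class names (`Relation`, `BaaVDocument`), the `join` helper, and the customer/order sample data are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import Any

Row = dict[str, Any]  # one tuple of a relation, as attribute -> value


@dataclass
class Relation:
    """One relation r(k, A_1, ..., A_p) stored inside a block (hypothetical name)."""
    name: str
    key: str                                   # name of the key attribute k
    rows: list[Row] = field(default_factory=list)


@dataclass
class BaaVDocument:
    """A (K, B) pair: block key K and a block B of related relations (hypothetical name)."""
    K: str
    B: dict[str, Relation] = field(default_factory=dict)

    def add_relation(self, rel: Relation) -> None:
        self.B[rel.name] = rel

    def join(self, left: str, right: str, fk: str) -> list[Row]:
        """Follow foreign key `fk` from rows of `left` to rows of `right` within the block."""
        right_rel = self.B[right]
        index = {row[right_rel.key]: row for row in right_rel.rows}
        return [
            {**index[row[fk]], **row}
            for row in self.B[left].rows
            if row.get(fk) in index
        ]


# Illustrative data only: one customer's block holding two related relations.
doc = BaaVDocument(K="customer:42")
doc.add_relation(Relation("orders", key="order_id", rows=[
    {"order_id": 1, "date": "2024-01-05"},
    {"order_id": 2, "date": "2024-02-11"},
]))
doc.add_relation(Relation("order_items", key="item_id", rows=[
    {"item_id": 10, "order_id": 1, "product": "lamp"},
    {"item_id": 11, "order_id": 1, "product": "bulb"},
    {"item_id": 12, "order_id": 2, "product": "desk"},
]))

# Resolve the foreign key order_items.order_id -> orders.order_id inside the block.
joined = doc.join("order_items", "orders", fk="order_id")
```

Because every relation reachable through the foreign keys lives under the same key K, a lookup by K retrieves the whole block in one read, and the join is resolved locally rather than across documents. Whether the thesis lays out its blocks this way is an assumption of this sketch.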
To enhance accurate pattern mining on NoSQL databases for adequate recommendation, and to allow existing corporations with large relational databases to take advantage of NoSQL databases, this thesis (i) proposes a Block as a Value (BaaV) framework for extracting data and mapping from NoSQL into a relational schema to enable faster data retrieval for existing large relational databases, (ii) integrates item-item image similarity scores into customers' purchase histories for enhanced sequential pattern recommendation by using item images stored on a document-based NoSQL database, and (iii) proposes a sequential pattern mining technique on a NoSQL BaaV document-oriented database. Using existing benchmark 'SQL over NoSQL' systems, relational databases and real-life datasets for our experiments, we demonstrate that our NoSQL over SQL system outperforms existing relational databases and SQL over NoSQL systems, and is novel in ensuring data consistency, scalability and query execution while improving data storage and retrieval in large database systems without data loss and improving performance on NoSQL databases.
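The abstract does not state how image similarity scores are combined with purchase sequences, so the sketch below only illustrates the two ingredients it names: an item-item similarity computed from image vectors (cosine similarity is assumed here) and the support of ordered item pairs in customers' purchase histories. The `blended_score` function, its `alpha` weight, and all sample data are hypothetical and are not taken from iHSPRec.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two image feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pair_support(sequences):
    """Fraction of purchase sequences in which item a is bought before item b."""
    counts = Counter()
    for seq in sequences:
        # combinations() keeps positional order, so (a, b) means a precedes b.
        counts.update({(a, b) for a, b in combinations(seq, 2) if a != b})
    return {pair: c / len(sequences) for pair, c in counts.items()}


def blended_score(a, b, support, vectors, alpha=0.3):
    """Hypothetical blend of sequential support and image similarity."""
    return (1 - alpha) * support.get((a, b), 0.0) + alpha * cosine(vectors[a], vectors[b])


# Toy image vectors and purchase histories, for illustration only.
image_vectors = {
    "lamp": [1.0, 0.0],
    "desk_lamp": [0.8, 0.6],
    "desk": [0.0, 1.0],
    "chair": [0.6, 0.8],
}
sequences = [
    ["lamp", "desk"],
    ["lamp", "desk", "chair"],
    ["desk", "chair"],
    ["lamp", "chair"],
]

support = pair_support(sequences)
candidates = ["desk", "desk_lamp", "chair"]
ranked = sorted(
    candidates,
    key=lambda item: blended_score("lamp", item, support, image_vectors),
    reverse=True,
)
```

In this toy setup, an item that never follows "lamp" in any purchase history (such as "desk_lamp") can still receive a non-zero score through its image similarity, which is the kind of effect an image-enhanced recommender aims for.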
Recommended Citation
Gidado, Abdulrauf Aremu, "Mining for Product Recommendation on Document-Based NoSQL Big Data" (2024). Electronic Theses and Dissertations. 9416.
https://scholar.uwindsor.ca/etd/9416