Date of Award: 2002
CC BY-NC-ND 4.0
A data warehousing system is a single data repository that integrates existing information from an enterprise's different data sources over a long period of time. One of the main tasks in building a data warehouse is to ensure that data drawn from several data sources contain no structural or semantic conflicts before being loaded into the warehouse. Representing the same real-world object in numerous ways is just one form of data disparity (dirt) that must be resolved in a data warehouse. Data cleaning is a complex process that uses multidisciplinary techniques to remove the conflicts inherent in warehouse data. This thesis proposes two data cleaning algorithms. The first algorithm, designed for initial data warehouse cleaning, compares records using token keys composed from record fields. The second algorithm is designed to subsequently clean an existing data warehouse in a timely fashion. The algorithms achieve optimal cleaning correctness in good time.
Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2002 .O33. Source: Masters Abstracts International, Volume: 41-04, page: 1116. Thesis (M.Sc.)--University of Windsor (Canada), 2002.
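The idea of comparing records through token keys built from their fields can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, whose tokenization rules are not given in the abstract. The noise-word list, the function names (`token_key`, `record_key`, `find_duplicates`), and the sample records are all hypothetical, and exact key equality stands in for whatever comparison the thesis actually uses.

```python
import re

# Hypothetical noise words; the actual token rules are not stated in the abstract.
NOISE_WORDS = {"mr", "mrs", "ms", "dr", "prof"}


def token_key(field_value: str) -> str:
    """Build an order-insensitive key from the tokens of one field."""
    tokens = re.findall(r"[a-z0-9]+", field_value.lower())
    kept = sorted(t for t in tokens if t not in NOISE_WORDS)
    return " ".join(kept)


def record_key(record: dict, fields: list) -> tuple:
    """Combine the token keys of the chosen fields into one comparable key."""
    return tuple(token_key(str(record.get(f, ""))) for f in fields)


def find_duplicates(records: list, fields: list) -> list:
    """Return (first_index, duplicate_index) pairs for records sharing a key."""
    seen = {}
    duplicates = []
    for index, record in enumerate(records):
        key = record_key(record, fields)
        if key in seen:
            duplicates.append((seen[key], index))
        else:
            seen[key] = index
    return duplicates


# Made-up sample records: the first two describe the same person.
records = [
    {"name": "Dr. John Smith", "city": "Windsor"},
    {"name": "Smith, John", "city": "windsor"},
    {"name": "Jane Doe", "city": "Toronto"},
]
print(find_duplicates(records, ["name", "city"]))  # [(0, 1)]
```

Sorting the tokens makes the key insensitive to word order, so "Dr. John Smith" and "Smith, John" collapse to the same key. A real cleaning pass would likely also need approximate matching to tolerate misspellings, which this sketch does not attempt.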
Ohanekwu, Timothy Emenike, "A pre and post data warehouse cleaning technique." (2002). Electronic Theses and Dissertations. 705.