Date of Award


Publication Type

Master Thesis

Degree Name



Computer Science





Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License


A data warehousing system is a single data repository that integrates existing information from an enterprise's different data sources over a long time period. One of the main tasks in building a data warehouse is ensuring that data drawn from several data sources contain no structural or semantic conflicts before being loaded into the warehouse. Representing the same real-world object in numerous ways is just one form of data disparity (dirt) to be resolved in a data warehouse. Data cleaning is a complex process that uses multidisciplinary techniques to remove the conflicts inherent in warehouse data. This thesis proposes two data cleaning algorithms. The first algorithm, designed for initial data warehouse cleaning, compares records using token keys composed from record fields. The second algorithm is designed to subsequently clean an existing data warehouse in a timely fashion. The algorithms achieve optimal cleaning correctness in good time.

Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2002 .O33. Source: Masters Abstracts International, Volume: 41-04, page: 1116. Thesis (M.Sc.)--University of Windsor (Canada), 2002.
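The abstract does not say how the token keys are built, so the following is only a minimal sketch of the general idea of comparing records through keys composed from their fields. It assumes a key is formed by lowercasing a field, splitting it into alphanumeric tokens, and sorting them. The names `token_key`, `record_key`, and `group_duplicates`, the sample records, and the exact-match comparison are all hypothetical and are not taken from the thesis.

```python
import re
from collections import defaultdict


def token_key(value):
    """Assumed key scheme: lowercase the field, split it into alphanumeric
    tokens, sort the tokens, and join them with single spaces."""
    tokens = re.findall(r"[a-z0-9]+", value.lower())
    return " ".join(sorted(tokens))


def record_key(record, fields):
    """Compose a record-level key from the token keys of the chosen fields."""
    return tuple(token_key(record.get(field, "")) for field in fields)


def group_duplicates(records, fields):
    """Group records whose composed keys are identical; return only the
    groups that contain more than one record (candidate duplicates)."""
    groups = defaultdict(list)
    for record in records:
        groups[record_key(record, fields)].append(record)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data: the first two records describe the same person
# with different word order, punctuation, and letter case.
records = [
    {"name": "Smith, John", "city": "Windsor"},
    {"name": "john smith", "city": "WINDSOR"},
    {"name": "Jane Doe", "city": "Windsor"},
]

duplicates = group_duplicates(records, ["name", "city"])
```

Under these assumptions, the two "John Smith" records receive the same composed key and fall into one group, while "Jane Doe" stays separate. Exact key equality is the simplest possible comparison; a real cleaning pass would likely also need to tolerate misspellings and abbreviations, which this sketch does not attempt.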