Computer Science Publications

Data position and profiling in domain-independent warehouse cleaning

C. I. Ezeife, University of Windsor
A. O. Udechukwu

Document Type

Conference Paper

Publication Date

2003

Publication Title

5th International Conference on Enterprise Information Systems, ICEIS 2003

Volume

1

First Page

232

Last Page

238

Abstract

A major problem that arises from integrating different databases is the existence of duplicates. Data cleaning is the process of identifying two or more records within a database that represent the same real-world object (duplicates), so that a unique representation can be adopted for each object. Existing data cleaning techniques rely heavily on full or partial domain knowledge. This paper proposes a positional algorithm that achieves domain-independent de-duplication at the attribute level. The paper also proposes a technique for field weighting through data profiling, which, when used with the positional algorithm, achieves domain-independent cleaning at the record level. Experiments show that the positional algorithm achieves more accurate de-duplication than existing algorithms.

Recommended Citation

Ezeife, C. I. and Udechukwu, A. O. (2003). Data position and profiling in domain-independent warehouse cleaning. 5th International Conference on Enterprise Information Systems, ICEIS 2003, 1, 232-238.
https://scholar.uwindsor.ca/computersciencepub/23
