COLI: Collaborative clustering missing data imputation

Document Type

Article

Publication Date

12-1-2021

Publication Title

Pattern Recognition Letters

Volume

152

First Page

420

Keywords

Collaborative clustering, Data amputation, Missing data imputation

Last Page

427

Abstract

Missing data imputation plays an important role in the data cleansing process. Clustering algorithms have been widely used for missing data imputation, yet there is little research on the use of clustering ensembles, which aggregate multiple clustering results, for this task. This paper proposes a novel collaborative clustering-based imputation method, called COLI, which uses the imputation quality as a key criterion for the exchange of information between different clustering results. To the best of our knowledge, this is the first study on the impact of collaborative clustering on imputation performance. The main contributions of this paper are three-fold. First, a novel missing value imputation method based on collaborative clustering is proposed. Second, three amputation strategies are used to induce missingness in various complete, publicly available numerical datasets, which allows evaluating the imputation quality of the proposed method under different missingness mechanisms, distributions, and ratios. Third, the proposed method is compared to several state-of-the-art imputation methods, and the results demonstrate that it is effective for handling missing data.
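
The amputation-based evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the COLI method and does not reproduce the paper's three amputation strategies: it assumes a single missing-completely-at-random (MCAR) mechanism and uses column-mean imputation as a stand-in imputer. The function names `ampute_mcar`, `impute_column_mean`, and `rmse_on_masked` are hypothetical and chosen only for this illustration.

```python
import math
import random


def ampute_mcar(data, ratio, seed=0):
    """Return a copy of `data` with a fraction `ratio` of cells set to None.

    Assumption: cells are removed completely at random (MCAR); the paper's
    own amputation strategies are not reproduced here.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    n_missing = round(ratio * len(cells))
    masked = set(rng.sample(cells, n_missing))
    amputed = [
        [None if (i, j) in masked else value for j, value in enumerate(row)]
        for i, row in enumerate(data)
    ]
    return amputed, masked


def impute_column_mean(amputed):
    """Fill each missing cell with the mean of the observed values in its column.

    This is a simple baseline imputer, used here only as a placeholder.
    """
    n_cols = len(amputed[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in amputed if row[j] is not None]
        # Fall back to 0.0 if a column has no observed value at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in amputed
    ]


def rmse_on_masked(complete, imputed, masked):
    """Root mean squared error computed only over the amputated cells."""
    if not masked:
        return 0.0
    squared = sum((complete[i][j] - imputed[i][j]) ** 2 for i, j in masked)
    return math.sqrt(squared / len(masked))


if __name__ == "__main__":
    complete = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    amputed, masked = ampute_mcar(complete, ratio=0.25, seed=1)
    imputed = impute_column_mean(amputed)
    print("RMSE on amputated cells:", rmse_on_masked(complete, imputed, masked))
```

Because the complete dataset is known before amputation, the error can be measured exactly on the cells that were removed. Repeating the procedure over several ratios and seeds would give a rough picture of how an imputer degrades as missingness increases.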

DOI

10.1016/j.patrec.2021.11.011

ISSN

0167-8655
