CC BY-NC-ND 4.0
In data mining, clustering analysis is an important research area. The goal of clustering is to group the objects in a data set into meaningful subclasses. Many algorithms have been designed for clustering numerical data and for clustering categorical data. However, little attention has been paid to clustering mixed-type data sets, whose data objects have both numerical and categorical attributes. This thesis proposes an approach to this problem, called CCEM-KNN, which stands for Categorical data Clustering approach with Expectation Maximization and K-Nearest Neighbour. First, we apply a categorical clustering method to the categorical attributes of all data objects to obtain an initial partition. Then, starting from this partition, we apply the Expectation-Maximization (EM) classification algorithm to the numerical attributes of each cluster to create a sample data set. Finally, we apply a second classification algorithm, K-Nearest Neighbour (KNN), to classify the data objects based on this sample data set. In this way, we solve the mixed-type clustering problem. Experiments show that CCEM-KNN performs better than previous work and also handles large data sets well.
Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .L58. Source: Masters Abstracts International, Volume: 42-03, page: 0968. Adviser: Alioune Ngom. Thesis (M.Sc.)--University of Windsor (Canada), 2003.
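The three CCEM-KNN steps in the abstract can be illustrated with a small sketch. The abstract does not say which categorical clustering method is used, how the EM output becomes the "sample data set", or which distance and neighbour count the KNN step uses. The Python sketch below fills those gaps with assumptions, not with the thesis's own choices. It uses basic k-modes (hypothetical `k_modes`) for the initial partition and a diagonal Gaussian mixture fitted by EM and initialised from that partition (hypothetical `em_posteriors`). The sample consists of objects whose EM posterior for their own cluster is at least a hypothetical `threshold`. The final step is a KNN vote using squared Euclidean distance plus a categorical mismatch count.

```python
import math
from collections import Counter


def mismatches(a, b):
    """Number of categorical attributes on which two objects differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(cat_rows, k, iters=10):
    """Assumed stand-in for step 1: basic k-modes on the categorical attributes."""
    # Farthest-first initialisation so the k starting modes are spread out.
    modes = [list(cat_rows[0])]
    while len(modes) < k:
        far = max(cat_rows, key=lambda row: min(mismatches(row, m) for m in modes))
        modes.append(list(far))
    labels = [0] * len(cat_rows)
    for _ in range(iters):
        # Assign each object to the mode with the fewest mismatching attributes.
        labels = [min(range(k), key=lambda c: mismatches(row, modes[c])) for row in cat_rows]
        # Each mode becomes the most frequent value of each attribute in its cluster.
        for c in range(k):
            members = [row for row, lab in zip(cat_rows, labels) if lab == c]
            if members:
                modes[c] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    return labels


def log_gauss(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def em_posteriors(num_rows, labels, k, iters=5, min_var=1e-3):
    """Assumed reading of step 2: EM for a diagonal Gaussian mixture on the
    numerical attributes, initialised from the categorical partition."""
    n, d = len(num_rows), len(num_rows[0])
    resp = [[1.0 if labels[i] == c else 0.0 for c in range(k)] for i in range(n)]
    for _ in range(iters):
        # M-step: mixing weight, mean and per-dimension variance of each component.
        params = []
        for c in range(k):
            w = sum(r[c] for r in resp) + 1e-9
            mean = [sum(r[c] * x[j] for r, x in zip(resp, num_rows)) / w for j in range(d)]
            var = [max(min_var, sum(r[c] * (x[j] - mean[j]) ** 2
                                    for r, x in zip(resp, num_rows)) / w) for j in range(d)]
            params.append((w / n, mean, var))
        # E-step: posterior probabilities, normalised in log space for stability.
        resp = []
        for x in num_rows:
            logs = [math.log(p) + log_gauss(x, m, v) for p, m, v in params]
            top = max(logs)
            exps = [math.exp(lg - top) for lg in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
    return resp


def knn_label(num_rows, cat_rows, labels, sample, n_neighbors=3):
    """Step 3: relabel every object by majority vote of its nearest sample objects."""
    def dist(i, j):
        # Squared Euclidean distance on numbers plus one unit per categorical mismatch.
        num = sum((a - b) ** 2 for a, b in zip(num_rows[i], num_rows[j]))
        return num + mismatches(cat_rows[i], cat_rows[j])
    result = []
    for i in range(len(num_rows)):
        nearest = sorted(sample, key=lambda j: dist(i, j))[:n_neighbors]
        result.append(Counter(labels[j] for j in nearest).most_common(1)[0][0])
    return result


def ccem_knn(num_rows, cat_rows, k, threshold=0.9, n_neighbors=3):
    labels = k_modes(cat_rows, k)
    resp = em_posteriors(num_rows, labels, k)
    # Assumed "sample data set": objects whose EM posterior confirms their label.
    sample = [i for i, r in enumerate(resp) if r[labels[i]] >= threshold]
    if not sample:
        return labels
    return knn_label(num_rows, cat_rows, labels, sample, n_neighbors)


num = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
cat = [["red", "s"], ["red", "s"], ["red", "m"], ["blue", "l"], ["blue", "l"], ["blue", "m"]]
print(ccem_knn(num, cat, k=2))  # [0, 0, 0, 1, 1, 1]
```

On this toy data the categorical step already separates the two groups. EM confirms every object, so the KNN vote keeps the same labels. The sketch's design choice is that EM only filters which objects are trusted, while KNN assigns the final labels using both attribute types.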
Liu, Yu, "A categorical data clustering approach with expectation maximization and K-nearest neighbour." (2003). Electronic Theses and Dissertations. 523.