Date of Award

3-10-2021

Publication Type

Master Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Dimensionality Reduction, Dynamic Programming, k-means clustering, Kernel Principal Component Analysis, Optimal Clustering

Supervisor

Luis Rueda

Rights

info:eu-repo/semantics/openAccess

Abstract

Several techniques exist for clustering high-dimensional data. Traditionally, such approaches first reduce the dimensionality of the data and then apply a classical clustering algorithm, such as k-means, in the lower-dimensional space. However, k-means does not guarantee an optimal solution. Moreover, its result depends heavily on the initialization of the cluster centers, so it is not repeatable. To overcome these drawbacks, an optimal one-dimensional clustering approach based on dimensionality reduction is proposed. A one-dimensional representation of the high-dimensional data is obtained using Kernel Principal Component Analysis, and this representation is then clustered optimally by a dynamic programming algorithm in polynomial time. Clusters are obtained by minimizing the sum of within-class variances while maximizing the sum of between-class variances. The advantages of the proposed approach over standard k-means, in terms of optimality and repeatability, are demonstrated on synthetic and real-life datasets.
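The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not specify. The kernel is assumed to be an RBF kernel with a hypothetical width parameter `gamma`. The dynamic-programming objective is assumed to be the within-cluster sum of squares. In one dimension, the total scatter is fixed and equals the within-class scatter plus the between-class scatter, so minimizing the first is the same as maximizing the second. The function names `kernel_pca_1d` and `optimal_1d_clustering` are illustrative and are not taken from the thesis.

```python
import numpy as np


def kernel_pca_1d(X, gamma=1.0):
    """Project rows of X onto the first kernel principal component.

    Assumption: RBF kernel k(a, b) = exp(-gamma * ||a - b||^2).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    # Centre the kernel matrix (equivalent to centring the data in feature space).
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh returns eigenvalues in ascending order; take the largest.
    vals, vecs = np.linalg.eigh(Kc)
    lam, v = vals[-1], vecs[:, -1]
    # Projection of the training points onto the first component: sqrt(lambda) * v.
    return v * np.sqrt(max(lam, 0.0))


def optimal_1d_clustering(x, k):
    """Exact k-clustering of 1-D data minimising the within-cluster sum of squares.

    O(k * n^2) dynamic programme over the sorted values. Optimal clusters in
    1-D are contiguous runs of the sorted data.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)
    xs = x[order]
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of points")
    # Prefix sums give the cost of any contiguous segment in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(xs)))
    s2 = np.concatenate(([0.0], np.cumsum(xs ** 2)))

    def sse(i, j):  # sum of squared deviations of xs[i..j], inclusive
        m = j - i + 1
        total = s1[j + 1] - s1[i]
        return (s2[j + 1] - s2[i]) - total * total / m

    # D[c, j]: minimal cost of splitting the first j sorted points into c clusters.
    D = np.full((k + 1, n + 1), np.inf)
    B = np.zeros((k + 1, n + 1), dtype=int)  # start index (1-based) of last cluster
    D[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c, j + 1):  # last cluster = xs[i-1 .. j-1]
                cost = D[c - 1, i - 1] + sse(i - 1, j - 1)
                if cost < D[c, j]:
                    D[c, j] = cost
                    B[c, j] = i
    # Backtrack to recover the cluster boundaries.
    labels_sorted = np.empty(n, dtype=int)
    j = n
    for c in range(k, 0, -1):
        i = B[c, j]
        labels_sorted[i - 1:j] = c - 1
        j = i - 1
    labels = np.empty(n, dtype=int)
    labels[order] = labels_sorted  # map back to the input order
    return labels, float(D[k, n])


# Usage on synthetic data: two Gaussian blobs in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 5)), rng.normal(3.0, 0.3, (20, 5))])
y = kernel_pca_1d(X, gamma=0.1)
labels, cost = optimal_1d_clustering(y, 2)
print(labels, cost)
```

Unlike k-means, the dynamic programme has no random initialization, so repeated runs on the same input return the same partition. The cubic-in-`n` inner loop keeps the sketch simple. Faster exact 1-D algorithms exist, but the version above is enough to show the idea.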
