Date of Award
Dimensionality Reduction, Dynamic Programming, k-means clustering, Kernel Principal Component Analysis, Optimal Clustering
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Several techniques are used for clustering of high-dimensional data. Traditionally, clustering approaches are based on performing dimensionality reduction of high-dimensional data followed by classical clustering such as k-means in lower dimensions. However, this approach based on k-means does not guarantee optimality. Moreover, the result of k-means is highly dependent on initialization of cluster centers and hence not repeatable, while not being optimal. To overcome this drawback, an optimal clustering approach in one dimension based on dimensionality reduction is proposed. The one-dimensional representation of high dimensional data is obtained using Kernel Principal Component Analysis. The one-dimensional representation of the data is then clustered optimally using a dynamic programming algorithm in polynomial time. Clusters in the one-dimensional data are obtained by minimizing the sum of within-class variance while maximizing the sum of between-class variance. The advantage of the proposed approach is demonstrated on synthetic and real-life datasets over standard k-means in terms of optimality and repeatability.
Bhide, Nachiket, "Efficient Clustering via Kernel Principal Component Analysis and Optimal One-dimensional Thresholding" (2021). Electronic Theses and Dissertations. 8548.