A new clustering method using wavelet based probability density functions for identifying patterns in time-series data

Document Type

Conference Proceeding

Publication Date


Publication Title

2016 IEEE EMBS International Student Conference: Expanding the Boundaries of Biomedical Engineering and Healthcare, ISC 2016 - Proceedings


clustering, multilevel thresholding, probability density function, prostate cancer


Clustering is a prominent method for identifying similar patterns in large groups of data, which makes it beneficial in bioinformatics studies. Classical methods such as k-means and maximum likelihood model the data as a mixture of Gaussian probability density functions (PDFs) and find clusters by maximizing the PDF. However, correlation among different groups of data and the presence of noise make it difficult to detect the correct number of clusters. Furthermore, the assumption of a Gaussian distribution for the PDF does not necessarily hold in real applications. This paper presents a new clustering method based on wavelet-based probability density functions. First, a mixture of PDFs is estimated for each feature using wavelets. Next, a multilevel thresholding method is applied to the mixture of PDFs of each feature to obtain the clusters. Finally, forward feature selection with memory is used to cluster the dataset based on combinations of the features. The profile alignment and agglomerative clustering (PAAC) index is used to evaluate the number of clusters and features. Transcript expression throughout the various stages of prostate cancer serves as a case study for identifying patterns. The experimental results show that the proposed method can detect patterns of similar transcripts throughout disease progression, and the results are promising in comparison with other methods.
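The per-feature step summarized above (estimate a wavelet density for one feature, then cut that density into clusters) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a Haar wavelet, hard thresholding of the detail coefficients with a hypothetical default threshold of sqrt(ln n / n), and a fixed resolution `levels`. It also swaps in a simple valley rule (cut at the centre of every interior local minimum of the estimated density) in place of the paper's multilevel thresholding method. The function names `haar_density`, `density_cuts` and `cluster_feature` are hypothetical, and the forward feature selection and PAAC evaluation steps are not shown.

```python
import bisect
import math


def haar_density(x, levels=4, lam=None):
    """Hard-thresholded Haar wavelet density estimate for one feature.

    Returns (lo, hi, dens), where dens[k] is the estimated density on the
    k-th of 2**levels equal-width bins spanning [lo, hi].
    """
    n = len(x)
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("feature is constant; no density to threshold")
    m = 2 ** levels
    counts = [0] * m
    for v in x:
        # Map [lo, hi] onto bins 0..m-1; the maximum goes into the last bin
        counts[min(int((v - lo) / (hi - lo) * m), m - 1)] += 1
    # Empirical Haar scaling coefficients at the finest level on [0, 1)
    scale = 2 ** (levels / 2)
    c = [scale * cnt / n for cnt in counts]
    if lam is None:
        lam = math.sqrt(math.log(n) / n)  # assumed threshold rule
    s = math.sqrt(2)
    details = []
    # Forward Haar transform, hard-thresholding the detail coefficients
    while len(c) > 1:
        half = len(c) // 2
        approx = [(c[2 * i] + c[2 * i + 1]) / s for i in range(half)]
        detail = [(c[2 * i] - c[2 * i + 1]) / s for i in range(half)]
        details.append([d if abs(d) >= lam else 0.0 for d in detail])
        c = approx
    # Inverse transform, from the coarsest level back to the finest
    for detail in reversed(details):
        finer = []
        for a, d in zip(c, detail):
            finer.extend([(a + d) / s, (a - d) / s])
        c = finer
    # Convert to a density on the original scale; clip negatives caused by
    # thresholding and round away floating-point noise
    dens = [round(max(0.0, scale * ck / (hi - lo)), 12) for ck in c]
    return lo, hi, dens


def density_cuts(lo, hi, dens):
    """Cut points at the centres of interior valleys of the density.

    Swapped-in rule standing in for the paper's multilevel thresholding.
    """
    width = (hi - lo) / len(dens)
    runs = []  # [value, first_bin, last_bin] for runs of equal density
    for k, v in enumerate(dens):
        if runs and runs[-1][0] == v:
            runs[-1][2] = k
        else:
            runs.append([v, k, k])
    cuts = []
    for prev, cur, nxt in zip(runs, runs[1:], runs[2:]):
        # A run lower than both neighbouring runs is a valley
        if cur[0] < prev[0] and cur[0] < nxt[0]:
            centre = (cur[1] + cur[2] + 1) / 2  # midpoint, in bin units
            cuts.append(lo + centre * width)
    return cuts


def cluster_feature(x, levels=4, lam=None):
    """Label each value by the interval between cuts that it falls in."""
    lo, hi, dens = haar_density(x, levels, lam)
    cuts = density_cuts(lo, hi, dens)
    labels = [bisect.bisect_right(cuts, v) for v in x]
    return labels, cuts
```

For example, `cluster_feature([1.0 + 0.01 * i for i in range(50)] + [5.0 + 0.01 * i for i in range(50)])` places a single cut near 3.245, in the empty gap between the two groups, so the first 50 values get label 0 and the last 50 get label 1. A larger `levels` gives a finer density that can expose more (possibly spurious) valleys, while a larger threshold smooths the estimate and merges them.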