Date of Award
2018
Publication Type
Master Thesis
Degree Name
M.Sc.
Department
Computer Science
Supervisor
Rueda, Luis
Supervisor
Ngom, Alioune
Rights
info:eu-repo/semantics/openAccess
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Abstract
Studying gene expression across time intervals of breast cancer survival may provide insights into the recovery of patients. In this work, we propose a hierarchical clustering method that separates dissimilar groups of genes in time-series data, namely those that lie furthest from the rest of the genes throughout different time intervals. The isolated outliers (genes that trend differently from other genes) can serve as potential biomarkers of breast cancer survivability. We partition the time axis (time points) into six-month bins, from months 1-6 up to months 337-342, and, for each gene, we average its expression level over all patients who fall in a survival bin. The gene expression values at those time points are interpolated with cubic splines to create a trend profile for each gene. First, we universally align the gene expression profiles to minimize the total area between them. Then, we cluster them using a sliding window approach and hierarchical clustering based on minimum vertical distances. To the best of our knowledge, this work is the first time-series model built on the survival time of patients after treatment. With this approach, we identified 46 genes (including 24 oncogenes and 18 tumor suppressor genes) as potential biomarkers of breast cancer survivability.
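The binning and averaging step described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes survival times are whole months from 1 to 342, and the function names (`bin_index`, `bin_label`, `average_by_bin`), the input layout, and the gene name `G1` are hypothetical. The later steps (spline interpolation, alignment, clustering) are not shown.

```python
from collections import defaultdict

BIN_MONTHS = 6    # bin length stated in the abstract
LAST_MONTH = 342  # the last bin covers months 337-342


def bin_index(survival_month):
    """Map a survival time in whole months (1..342) to a zero-based bin.

    Months 1-6 map to bin 0, months 7-12 to bin 1, and so on.
    """
    if not 1 <= survival_month <= LAST_MONTH:
        raise ValueError(f"survival month out of range: {survival_month}")
    return (survival_month - 1) // BIN_MONTHS


def bin_label(index):
    """Return the month range covered by a bin, e.g. '1-6'."""
    start = index * BIN_MONTHS + 1
    return f"{start}-{start + BIN_MONTHS - 1}"


def average_by_bin(patients):
    """Average each gene's expression over the patients in each bin.

    `patients` is an iterable of (survival_month, {gene: expression}) pairs
    (an assumed layout). Returns {gene: {bin_index: mean_expression}},
    containing only the bins in which at least one patient appears.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for month, expression in patients:
        b = bin_index(month)
        for gene, value in expression.items():
            sums[gene][b] += value
            counts[gene][b] += 1
    return {
        gene: {b: sums[gene][b] / counts[gene][b] for b in sorted(sums[gene])}
        for gene in sums
    }


# Illustrative data only: two patients in bin 1-6, one in bin 7-12.
patients = [(1, {"G1": 2.0}), (6, {"G1": 4.0}), (7, {"G1": 10.0})]
profile = average_by_bin(patients)
```

Each gene's per-bin means would then serve as the time points through which a cubic spline is fitted to obtain the trend profile.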
Recommended Citation
Mangalakumar, Naveen, "An Adaptive Clustering Algorithm for Gene Expression Time-Series Data Analysis" (2018). Electronic Theses and Dissertations. 7380.
https://scholar.uwindsor.ca/etd/7380