Date of Award
2-5-2025
Publication Type
Thesis
Degree Name
M.Sc.
Department
Computer Science
Keywords
clinical score prediction; Deep learning; Longitudinal dataset; machine learning; Parkinson's disease; supervised task
Supervisor
Ziad Kobti
Rights
info:eu-repo/semantics/openAccess
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Abstract
The complexity of longitudinal clinical datasets arises from their temporal nature and high dimensionality across different time points. This poses a significant challenge in developing effective predictive models. While these datasets offer rich temporal information, their potential remains largely unexplored due to the lack of systematic approaches to data representation and temporal pattern extraction. With the advancements in machine and deep learning, many studies now leverage longitudinal datasets. However, current approaches often fail to fully exploit the temporal relationships in longitudinal health records, leading to suboptimal utilization of available information. To address these limitations, we propose a systematic framework that first explores different data representation strategies: wide format preserving temporal features, cross-sectional format capturing independent time points, and long format maintaining temporal sequences. We develop parallel processing pipelines in which wide and cross-sectional formats undergo dimensionality reduction through Principal Component Analysis and Nonnegative Matrix Factorization before traditional machine learning models are applied, while long-format data is processed through Long Short-Term Memory networks. Evaluated on the Parkinson’s disease progression dataset, our approach yielded improved prediction of patients’ clinical scores for immediate future visits. We achieved a mean absolute error of 1.91 points and an R² value of 0.83, with wide-format data combined with Nonnegative Matrix Factorization and Support Vector Regression yielding optimal results. While validated on Parkinson’s disease data, our comprehensive methodology extends beyond disease-specific applications, offering a generalizable framework for maximizing the utility of longitudinal clinical datasets.
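The three data representations named in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the field names (`patient`, `visit`, `score`), the toy values, and the three helper functions are hypothetical, and the reading of "cross-sectional" as dropping the patient linkage between visits is an assumption.

```python
from collections import defaultdict

# Hypothetical records, one per (patient, visit); the values are made up.
long_rows = [
    {"patient": "P1", "visit": 0, "score": 10.0},
    {"patient": "P1", "visit": 1, "score": 12.0},
    {"patient": "P2", "visit": 0, "score": 20.0},
    {"patient": "P2", "visit": 1, "score": 19.0},
]


def to_sequences(rows):
    """Long format: one visit-ordered sequence per patient (sequence-model input)."""
    seqs = defaultdict(list)
    for r in sorted(rows, key=lambda r: (r["patient"], r["visit"])):
        seqs[r["patient"]].append(r["score"])
    return dict(seqs)


def to_wide(rows):
    """Wide format: one row per patient, one column per visit."""
    wide = defaultdict(dict)
    for r in rows:
        wide[r["patient"]][f"score_v{r['visit']}"] = r["score"]
    return dict(wide)


def to_cross_sectional(rows):
    """Cross-sectional format (assumed reading): every visit is an
    independent sample, so the patient linkage is dropped."""
    return [{"visit": r["visit"], "score": r["score"]} for r in rows]
```

In the wide form each patient becomes a single fixed-length feature vector, which is the kind of input that dimensionality reduction and a conventional regressor expect, while the sequence form keeps the visit order that a recurrent model consumes.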
Recommended Citation
Ahmed, Nourin, "Optimizing Longitudinal Data Representation to Predict Clinical Scores Using an End-to-End Machine Learning Pipeline: A Case Study in Parkinson’s Disease" (2025). Electronic Theses and Dissertations. 9661.
https://scholar.uwindsor.ca/etd/9661