Date of Award

2-5-2025

Publication Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

clinical score prediction; deep learning; longitudinal dataset; machine learning; Parkinson's disease; supervised task

Supervisor

Ziad Kobti

Rights

info:eu-repo/semantics/openAccess

Abstract

The complexity of longitudinal clinical datasets arises from their temporal nature and their high dimensionality across time points, which makes it difficult to develop effective predictive models. Although these datasets offer rich temporal information, their potential remains largely unexplored because systematic approaches to data representation and temporal pattern extraction are lacking. With advances in machine learning and deep learning, a growing number of studies leverage longitudinal datasets. However, current approaches often fail to fully exploit the temporal relationships in longitudinal health records, leading to suboptimal use of the available information. To address these limitations, we propose a systematic framework that first explores three data representation strategies: a wide format that preserves temporal features, a cross-sectional format that treats time points independently, and a long format that maintains temporal sequences. We develop parallel processing pipelines in which wide and cross-sectional data undergo dimensionality reduction through Principal Component Analysis and Nonnegative Matrix Factorization before traditional machine learning models are applied, while long-format data is processed through Long Short-Term Memory networks. Evaluated on the Parkinson’s disease progression dataset, our approach yielded an improved model for predicting patients’ clinical scores at their immediate next visit. We achieved a mean absolute error of 1.91 points and an R² value of 0.83, with the best results obtained from wide-format data combined with Nonnegative Matrix Factorization and Support Vector Regression. Although validated on Parkinson’s disease data, our methodology extends beyond disease-specific applications, offering a generalizable framework for maximizing the utility of longitudinal clinical datasets.
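The three data representations named in the abstract can be illustrated with a minimal sketch. This is one reading of those terms, not the thesis's actual preprocessing code: the records, the field names (`patient`, `visit`, `tremor`, `score`), and the helper functions are all hypothetical, and only the Python standard library is used.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per patient visit.
# Field names and values are illustrative, not taken from the thesis dataset.
long_rows = [
    {"patient": "P1", "visit": 0, "tremor": 2.0, "score": 10.0},
    {"patient": "P1", "visit": 1, "tremor": 3.0, "score": 12.0},
    {"patient": "P2", "visit": 0, "tremor": 1.0, "score": 7.0},
    {"patient": "P2", "visit": 1, "tremor": 1.5, "score": 8.0},
]
FEATURES = ["tremor", "score"]


def to_sequences(rows):
    """Long format: one time-ordered sequence of feature vectors per patient."""
    by_patient = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["patient"], r["visit"])):
        by_patient[row["patient"]].append([row[f] for f in FEATURES])
    return dict(by_patient)


def to_wide(rows):
    """Wide format: one row per patient, one column per (feature, visit) pair."""
    wide = defaultdict(dict)
    for row in rows:
        for f in FEATURES:
            wide[row["patient"]][f"{f}_v{row['visit']}"] = row[f]
    return dict(wide)


def to_cross_sectional(rows):
    """Cross-sectional format: each visit is an independent sample."""
    return [[row[f] for f in FEATURES] for row in rows]


sequences = to_sequences(long_rows)
wide = to_wide(long_rows)
cross = to_cross_sectional(long_rows)
```

Under this reading, `sequences` is the shape a recurrent model such as an LSTM would consume, while `wide` and `cross` are flat tables to which dimensionality reduction and a conventional regressor could be applied.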

Included in

Biology Commons
