Date of Award

2-1-2022

Publication Type

Thesis

Degree Name

M.A.Sc.

Department

Computer Science

Keywords

Classification, Data integration, Multi-omics data, Residual neural network

Supervisor

A. Alkhateeb

Supervisor

L. Rueda

Rights

info:eu-repo/semantics/openAccess

Abstract

This work builds a model that predicts breast cancer Nottingham Prognostic Index (NPI) classes from multi-omics data. Rapid development in next-generation sequencing has made it possible to measure several kinds of biological indicators, collectively called multi-omics data. The availability of multi-omics data has raised the challenge of integrating and analyzing these different biological measures to understand disease progression. Embedding techniques are used to represent the high-dimensional features in a lower dimension, namely as a 2-dimensional map. This thesis presents a supervised learning method for predicting breast cancer NPI. The objectives of this research are to (i) build a diagnosis system for breast cancer NPI based on multi-omics data; (ii) find gene biomarkers for each NPI class; and (iii) build a novel prediction model based on t-distributed stochastic neighbor embedding (t-SNE) and a residual neural network (ResNet) that integrates multi-omics data in the classification mechanism.

The dataset consists of three omics: gene expression, CNA and mRNA. We evaluated four models, each pairing one of two embedding techniques (t-SNE and SOM) with one of two deep learning models (VGG and ResNet). The results showed that t-SNE combined with ResNet in the concatenated approach outperformed the other methods, with an accuracy of 98.48%. The set of genes extracted from the three omics can serve as potential NPI-associated biomarkers. Findings in the literature confirm the associations between some of these genes and breast cancer prognosis and survival.
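As a rough illustration of the idea of presenting features as a 2-dimensional map, the following sketch places each feature value of one sample at the pixel given by that feature's 2-D coordinate and stacks one map per omic as image channels. It is only one possible reading of the "concatenated approach", not the thesis's implementation: the coordinates are assumed to come from an embedding such as t-SNE computed beforehand, and the names `rasterize`, `stack_omics`, `layouts` and `sample`, the grid size, and the averaging of colliding features are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float]
Image = List[List[float]]


def rasterize(values: List[float], coords: List[Coord], size: int) -> Image:
    """Place each feature value at the pixel given by its 2-D coordinate.

    `coords` holds one (x, y) pair per feature; in the setting of the
    abstract these would come from an embedding such as t-SNE (assumption).
    """
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    x_min, y_min = min(xs), min(ys)
    # Guard against a zero span when all features share one coordinate.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0

    sums = [[0.0] * size for _ in range(size)]
    counts = [[0] * size for _ in range(size)]
    for value, (x, y) in zip(values, coords):
        col = min(size - 1, int((x - x_min) / x_span * size))
        row = min(size - 1, int((y - y_min) / y_span * size))
        sums[row][col] += value
        counts[row][col] += 1

    # Average features that land on the same pixel; empty pixels stay 0.
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(size)]
        for r in range(size)
    ]


def stack_omics(
    sample: Dict[str, List[float]],
    layouts: Dict[str, List[Coord]],
    size: int,
) -> List[Image]:
    """Build one map per omic and stack them as channels (sorted by name)."""
    return [rasterize(sample[name], layouts[name], size) for name in sorted(layouts)]


# Toy data: two omics with hand-picked coordinates (not real embeddings).
layouts = {
    "cna": [(0.0, 0.0), (1.0, 1.0)],
    "expression": [(0.0, 1.0), (1.0, 0.0), (1.0, 0.0)],
}
sample = {"cna": [2.0, 4.0], "expression": [1.0, 3.0, 5.0]}
image = stack_omics(sample, layouts, size=2)
```

In a full pipeline, a multi-channel image like `image` would be the input to an image classifier such as a ResNet or VGG network; that step is omitted here.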
