Date of Award
5-16-2024
Publication Type
Thesis
Degree Name
M.Sc.
Department
Computer Science
Keywords
Biomarker; Graph Convolutional Network; Natural Language Processing
Supervisor
Ziad Kobti
Abstract
A biomarker identification model that integrates natural language processing (NLP) with a graph convolutional network (GCN) offers a novel approach to addressing the limitations of simple neural networks in capturing the contextual semantics of genes, extracting spatial feature information, and modelling the complex nonlinear semantic relations among genes. First, we explore microarray datasets to identify differentially expressed genes (DEGs) and construct a high-confidence protein-protein interaction (PPI) network. By using Word2Vec, an NLP algorithm, to preprocess and vectorize gene ontology (GO) annotations, our model reveals complex biological relationships among genes, enriching our understanding of disease pathogenesis. GO annotations are crucial because they provide comprehensive information about gene functions, biological processes, and cellular components, and so clarify how genes interact within the network. Integrating multi-layered GCNs enables effective learning of complex semantic relations and spatial feature information within the PPI network. Experiments on publicly available datasets of Glioblastoma Multiforme (GBM), the most aggressive form of brain tumour, demonstrate that our model significantly improves biomarker identification compared to existing state-of-the-art methods, showing its potential for advancing disease research and clinical decision-making. A survival analysis relating the expression levels of the identified biomarkers to GBM patient outcomes further validates our findings. Our study underscores the importance of integrating advanced computational techniques to analyze complex diseases like GBM comprehensively, offering promising avenues for biomarker discovery and therapeutic development.
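To illustrate the multi-layered GCN step described in the abstract, the sketch below propagates per-gene feature vectors over a small PPI graph. It is only an illustration under stated assumptions: the abstract does not specify the propagation rule, so the commonly used symmetric normalization with self-loops is assumed here, and the four-gene graph, the feature values (standing in for Word2Vec embeddings of GO annotations), the weight matrices, and the names `normalize_adjacency`, `matmul`, and `gcn_layer` are all hypothetical.

```python
import math

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric adjacency matrix (assumed rule)."""
    n = len(adj)
    # Add self-loops so each gene keeps its own features during aggregation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(a_norm, features, weights, activate=True):
    """One graph convolution: aggregate neighbour features, project, optionally apply ReLU."""
    out = matmul(matmul(a_norm, features), weights)
    if activate:
        out = [[max(v, 0.0) for v in row] for row in out]
    return out

# Hypothetical PPI graph: four genes connected in a chain (0-1, 1-2, 2-3).
adj = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]

# Hypothetical 3-dimensional gene features (placeholders for GO-annotation embeddings).
features = [
    [0.2, -0.1, 0.5],
    [0.0, 0.3, -0.4],
    [0.7, 0.1, 0.0],
    [-0.3, 0.6, 0.2],
]

# Hypothetical, untrained weights: 3 -> 2 hidden units, then 2 -> 1 score per gene.
w1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.6]]
w2 = [[0.8], [-0.5]]

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, w1)                   # first layer with ReLU
scores = gcn_layer(a_norm, hidden, w2, activate=False)     # second layer, raw scores
```

Stacking two layers lets each gene's score depend on neighbours up to two interactions away in the PPI network. In a real model the weights would be learned from labelled data rather than fixed by hand.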
Recommended Citation
Ferdoush, Zannatul, "A biomarker identification model from protein protein interaction network using natural language processing and graph convolutional network" (2024). Electronic Theses and Dissertations. 9469.
https://scholar.uwindsor.ca/etd/9469