Date of Award
2017
Publication Type
Master Thesis
Degree Name
M.Sc.
Department
Computer Science
Keywords
breast cancer, gene-expression, microarray, network-based
Supervisor
Rueda, Luis
Supervisor
Ngom, Alioune
Rights
info:eu-repo/semantics/openAccess
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Abstract
One of the key challenges of breast cancer research is to predict whether a patient diagnosed with a specific subtype, or treated with a specific therapy, will survive. Current studies find small subsets of gene biomarkers that can accurately predict the response to therapy. In these studies, the selected genes are not necessarily functionally related, and hence they may not correctly indicate the molecular mechanism behind breast cancer survivability. Also, several studies have shown that there is very low overlap between the biomarker subsets reported for the same cancer. To improve the robustness of classification performance and the stability of detected biomarkers, recent methods take existing knowledge of relations between genes into account in the classifier by aggregating functionally related genes to produce discriminative gene subnetworks called network-biomarkers. In this work, given a breast cancer dataset of patients with different subtypes treated with a given therapy drug, we devised a network-based machine learning approach that integrates a protein-protein interaction (PPI) network with gene expression data (1) to identify the network-biomarkers of breast cancer survivability (a) based on subtype and (b) based on therapy, and (2) to predict the survivability of breast cancer patients (a) based on subtype and (b) treated with a therapy drug. We used the concept of a seed gene to identify network-biomarkers at distances 2, 3 and 4 from the seed gene's protein, and found that distances 3 and 4 give the best results for identifying the survivability of breast cancer patients based on subtype and therapy, respectively. To address the class imbalance problem in some subtypes, we applied ADASYN.
We obtained the best classification performance using random forest, where the geometric mean, F1-measure and accuracy are 0.867, 0.850 and 87.00%, respectively, for the subtype-specific study, and 0.829, 0.807 and 83.77% for the therapy-specific study.
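As a rough illustration of two ideas mentioned in the abstract, the sketch below collects all genes within a given distance of a seed gene in a PPI network and computes the three reported evaluation measures from confusion-matrix counts. It is not the thesis's implementation: the function names, the toy edge list, and the definition of the geometric mean as the square root of sensitivity times specificity are assumptions made here for illustration only.

```python
from collections import deque
from math import sqrt


def genes_within_distance(ppi_edges, seed, max_dist):
    """Return {gene: distance} for genes at most max_dist hops from seed.

    ppi_edges is an iterable of undirected (gene_a, gene_b) pairs.
    This is a plain breadth-first search; the thesis may build its
    subnetworks differently (assumption).
    """
    adjacency = {}
    for a, b in ppi_edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the distance limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def binary_metrics(tp, fn, tn, fp):
    """Return (geometric mean, F1-measure, accuracy) for a binary task.

    Geometric mean is taken as sqrt(sensitivity * specificity), a common
    choice for imbalanced data (assumption). Denominators are assumed
    to be non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    g_mean = sqrt(sensitivity * specificity)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return g_mean, f1, accuracy


# Hypothetical toy network; gene names are placeholders, not real data.
TOY_EDGES = [("G0", "G1"), ("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G0", "G5")]
toy_subnetwork = genes_within_distance(TOY_EDGES, seed="G0", max_dist=3)
```

With the toy edges above, `toy_subnetwork` contains every gene up to three hops from `G0` and leaves out `G4`, which is four hops away. Varying `max_dist` over 2, 3 and 4 mirrors the distances compared in the abstract.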
Recommended Citation
Jubair, Sheikh Abdullah Al, "Identifying Network-Biomarkers of Breast Cancer Survivability" (2017). Electronic Theses and Dissertations. 5992.
https://scholar.uwindsor.ca/etd/5992