Date of Award

2014

Degree Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

First Advisor

Luis Rueda

Second Advisor

Lisa Porter

Keywords

breast cancer, gene expression, subtype

Rights

CC-BY-NC-ND

Abstract

Worldwide, one in nine women is diagnosed with breast cancer in her lifetime, and breast cancer is the second leading cause of cancer death among women. Accurate diagnosis of the specific subtype of this disease is vital to ensure that patients have the best possible response to therapy. In this thesis, we use different machine learning techniques to select the most informative biomarkers for the ten recently proposed subtypes of breast cancer. Unlike existing gene selection approaches, we use a hierarchical classification approach that selects genes and builds the classifier concurrently in a top-down fashion. We also propose a new bottom-up hierarchical approach that obtains the most informative genes for the different subtypes while identifying the level of similarity between them. Our results indicate that this modified approach to gene selection yields a small subset of genes that can predict each of the ten subtypes with very high accuracy. The bottom-up approach, in turn, provides an insightful structure for further analysis of these subtypes.