Date of Award


Publication Type

Master Thesis

Degree Name



Computer Science

First Advisor

Alioune Ngom


Keywords

breast cancer, clustering, drug repositioning, drug repurposing, machine learning, pharmacology




Breast cancer makes up 25 percent of all new cancer diagnoses globally, according to the American Cancer Society (ACS). Developing a highly effective drug can be time-consuming and expensive. Drug repurposing avoids some of the disadvantages of traditional drug development, making it both time- and cost-effective. In this thesis, we are interested in finding good drug candidates for each of the ten subtypes of breast cancer. Repurposing involves identifying new indications for pre-approved drugs by observing the anti-correlation between drug perturbation data and disease data. If anti-correlation is detected, whether the drug up-regulates or down-regulates the genes involved, it indicates that the drug acts against the disease signature, making it a suitable candidate for repurposing. Gene expression data and discrete copy number variation data are used to compute z-scores and normalize the data for the ten disease subtypes. Gene expression data for the ten subtypes was extracted from the METABRIC dataset. From the pharmacogenomic perturbation data, the National Institutes of Health's (NIH) Library of Integrated Network-Based Cellular Signatures (LINCS) dataset, we extracted the values corresponding to the MCF7 cell line. We used our proposed clustering methods to select the best-suited drug candidates per subtype, obtaining a ranked list of suitable drug repurposing and repositioning candidates for each of the ten breast cancer subtypes.
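
The two computational ideas named in the abstract, z-score normalization and anti-correlation between disease and drug signatures, can be illustrated with a minimal sketch. This is not the thesis's method: the abstract does not say how anti-correlation is measured or how the proposed clustering works, so the sketch assumes Pearson correlation as the score and ranks drugs directly by it. The function names (`zscore_rows`, `rank_by_anticorrelation`), the drug labels, and all numeric values are hypothetical and are not taken from METABRIC or LINCS.

```python
import numpy as np


def zscore_rows(expr):
    """Z-score each gene (row) across samples (columns)."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant genes map to 0 instead of dividing by zero
    return (expr - mean) / std


def rank_by_anticorrelation(disease_signature, drug_signatures):
    """Return (drug, Pearson r) pairs, most anti-correlated first.

    Assumption: Pearson correlation stands in for whatever score the
    thesis actually uses; both signatures cover the same genes in the
    same order.
    """
    disease = np.asarray(disease_signature, dtype=float)
    scores = []
    for name, signature in drug_signatures.items():
        r = np.corrcoef(disease, np.asarray(signature, dtype=float))[0, 1]
        scores.append((name, float(r)))
    return sorted(scores, key=lambda pair: pair[1])


# Toy values invented for illustration only.
disease_signature = [2.0, -1.0, 0.5, -2.0]
drug_signatures = {
    "drug_a": [-2.0, 1.0, -0.5, 2.0],   # exact reversal of the disease signature
    "drug_b": [2.0, -1.0, 0.5, -2.0],   # mimics the disease signature
    "drug_c": [1.0, 1.0, -1.0, -1.0],   # partially aligned with the disease
}
ranking = rank_by_anticorrelation(disease_signature, drug_signatures)
```

In this toy setup, the drug whose signature reverses the disease signature ranks first and the drug that mimics it ranks last. In practice, one disease signature per subtype would be scored against the MCF7 perturbation signatures, and the ranking step would be replaced by the clustering methods the thesis proposes.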