Keywords
SuperLearner, Ensemble, Prostate Cancer
Abstract
The objective of this major paper is to apply an ensemble method known as the SuperLearner algorithm to find the best ten gene expressions that can predict high PSA (>7) versus low PSA in prostate cancer patients. We try to formulate techniques such as penalized logistic regression, random forests, and cross-validation in a way that is consistent with the formulation of the SuperLearner algorithm. To discover a patch of ten genes that predicts the PSA level well, we sample random patches of ten genes from a pool of almost 47,000 genes and apply the SuperLearner algorithm to each patch to compute a cross-validated AUC. We then select the groups of ten genes whose AUC exceeds 0.65. This exercise shows that many unclassified genes are correlated with high PSA.
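The random-patch screening described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the names `auc`, `screen_patches`, and `score_patch` are hypothetical, and `score_patch` is a placeholder for whatever routine fits the SuperLearner on a patch and returns its cross-validated AUC. The patch size of ten and the 0.65 cutoff are taken from the abstract; everything else is an assumption.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive case receives the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case from each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def screen_patches(n_genes, score_patch, n_patches,
                   patch_size=10, threshold=0.65, seed=0):
    """Draw random gene patches and keep those whose score exceeds threshold.

    score_patch is a hypothetical callable: it takes a tuple of gene indices
    and returns a cross-validated AUC for a model fitted on those genes.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_patches):
        # Sample patch_size distinct gene indices without replacement.
        patch = tuple(sorted(rng.sample(range(n_genes), patch_size)))
        value = score_patch(patch)
        if value > threshold:
            kept.append((patch, value))
    return kept
```

In use, `score_patch` would wrap the ensemble fit and return the AUC of its out-of-fold predictions, for example by calling `auc` on held-out predicted probabilities and the high/low PSA labels.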
Primary Advisor
Dr. Hussein
Program Reader
Dr. Hlynka
Degree Name
Master of Science
Department
Mathematics and Statistics
Document Type
Major Research Paper
Convocation Year
2019