Major Papers


SuperLearner, Ensemble, Prostate Cancer


The objective of this major paper is to apply an ensemble method known as the SuperLearner algorithm to find the ten gene expressions that best predict a high PSA (>7) versus a low PSA in prostate cancer patients. We formulate techniques such as penalized logistic regression, random forests, and cross-validation in a way that is consistent with the formulation of the SuperLearner algorithm. To discover a patch of ten genes that predicts the PSA level well, we sample random patches of 10 genes from a pool of almost 47,000 genes and apply the SuperLearner algorithm to each patch to compute a cross-validated AUC. We then retain the groups of ten genes whose AUC exceeds 0.65. This exercise shows that many unclassified genes are correlated with high PSA.
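
The random-patch search described in the abstract can be sketched roughly as follows. This is only a minimal illustration, not the paper's implementation: a simple nearest-centroid scorer stands in for the SuperLearner ensemble, the AUC is computed on pooled out-of-fold scores, and the names `auc`, `centroid_scores`, `cv_auc`, `search_patches`, and `toy_data`, as well as the synthetic data, are all hypothetical.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scores(X_train, y_train, X_test, genes):
    """Placeholder learner (NOT SuperLearner): higher score = closer to the high-PSA centroid."""
    def centroid(label):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        return [mean(r[g] for r in rows) for g in genes]

    def dist2(x, c):
        return sum((x[g] - cj) ** 2 for g, cj in zip(genes, c))

    c_high, c_low = centroid(1), centroid(0)
    return [dist2(x, c_low) - dist2(x, c_high) for x in X_test]


def cv_auc(X, y, genes, k=5, seed=0):
    """K-fold cross-validated AUC computed on pooled out-of-fold scores."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(y)
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = centroid_scores([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in fold], genes)
        for i, s in zip(fold, preds):
            scores[i] = s
    return auc(scores, y)


def search_patches(X, y, n_genes, patch_size=10, n_patches=200,
                   threshold=0.65, seed=0):
    """Sample random gene patches and keep those whose CV AUC exceeds the threshold."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_patches):
        genes = rng.sample(range(n_genes), patch_size)
        score = cv_auc(X, y, genes)
        if score > threshold:
            kept.append((score, sorted(genes)))
    return sorted(kept, reverse=True)


def toy_data(n_samples=40, n_genes=50, seed=1):
    """Synthetic stand-in for expression data: only gene 0 carries signal."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]  # 1 = high PSA, 0 = low PSA
    X = [[rng.gauss(4.0 * yi if g == 0 else 0.0, 1.0) for g in range(n_genes)]
         for yi in y]
    return X, y
```

Calling `search_patches(X, y, n_genes=len(X[0]))` on the output of `toy_data()` returns the retained patches sorted by AUC. In a real analysis the placeholder scorer would be replaced by cross-validated SuperLearner predictions built from the candidate learners named above.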

Primary Advisor

Dr. Hussein

Program Reader

Dr. Hlynka

Degree Name

Master of Science


Mathematics and Statistics

Document Type

Major Research Paper

Convocation Year