SuperLearner, Ensemble, Prostate Cancer
The objective of this major research paper is to apply an ensemble method known as the SuperLearner algorithm to find the best ten gene expressions that can predict a high PSA (>7) versus a low PSA in prostate cancer patients. We formulate techniques such as penalized logistic regression, random forests, and cross-validation in a way that is consistent with the formulation of the SuperLearner algorithm. To discover a patch of ten genes that predicts the PSA level well, we sample random patches of ten genes from a pool of almost 47,000 genes and apply the SuperLearner algorithm to each patch to compute a cross-validated AUC. We then select the groups of ten genes whose cross-validated AUC exceeds 0.65. This exercise shows that many unclassified genes are correlated with high PSA.
Master of Science
Mathematics and Statistics
Major Research Paper