Date of Award

2014

Degree Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

First Advisor

Rueda, Luis

Second Advisor

Ngom, Alioune

Keywords

Biological sciences, DNA microarrays, Constrained multi-level thresholding

Rights

CC BY-NC-ND 4.0

Abstract

We propose a family of methods for transcriptomics and genomics data analysis based on a multi-level thresholding approach, including OMTG for sub-grid and spot detection in DNA microarrays and OMT for detecting significant regions in next-generation sequencing data. Extensive experiments on real-life datasets and comparisons with other methods show that the proposed methods perform these tasks fully automatically and with a very high degree of accuracy. Moreover, unlike previous methods, the proposed approaches apply to various transcriptome analysis problems without any parameter adjustment, such as gridding microarray images with different resolutions and spot sizes and finding the regions of DNA that interact with a protein of interest using ChIP-Seq data. We also develop constrained multi-level thresholding (CMT), an algorithm that detects enriched regions in ChIP-Seq data and can target regions within a specific range. An objective assessment against previously proposed peak finders shows that CMT detects enriched regions (peaks) with higher accuracy. The assessment tests three algorithms on the well-known FoxA1 dataset, on four transcription factors (with a total of six antibodies) for Drosophila melanogaster, and on the H3K4ac antibody dataset. Finally, we propose a tree-based approach that conducts gene selection and builds a classifier simultaneously, in order to select the minimal number of genes that reliably predicts a given breast cancer subtype. Our results indicate that this modified approach to gene selection yields a small subset of genes that predicts subtypes with greater than 95% overall accuracy. In addition to providing a valuable list of targets for diagnostic purposes, these methods appear, based on the gene ontologies of the selected genes, to have isolated a number of genes potentially involved in breast cancer biology, etiology, and potentially novel therapeutics.
