Date of Award

10-30-2020

Publication Type

Master Thesis

Degree Name

M.Sc.

Department

Computer Science

First Advisor

Luis Rueda

Keywords

ChIP-Seq, Cluster Validity Indices, Optimal Multi-level Thresholding, Pattern Recognition, Peak Calling, Protein Binding Sites

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

Abstract

Chromatin immunoprecipitation (ChIP-Seq) has emerged as a superior alternative to microarray technology as it provides higher resolution, less noise, greater coverage and wider dynamic range. While ChIP-Seq enables probing of DNA-protein interaction over the entire genome, it requires the use of sophisticated tools to recognize hidden patterns and extract meaningful data. Over the years, various attempts have resulted in several algorithms making use of different heuristics to accurately determine individual peaks corresponding to unique DNA-protein binding sites. However, finding all the binding sites with high accuracy in a reasonable time is still a challenge. In this work, we propose the use of Multi-level thresholding algorithm, which we call LinMLTBS, used to identify the enriched regions on ChIP-Seq data. Although various suboptimal heuristics have been proposed for multi-level thresholding, we emphasize on the use of an algorithm capable of obtaining an optimal solution, while maintaining linear-time complexity. Testing various algorithm on various ENCODE project datasets shows that our approach attains higher accuracy relative to previously proposed peak finders while retaining a reasonable processing speed.

Share

COinS