Date of Award
10-30-2020
Publication Type
Master Thesis
Degree Name
M.Sc.
Department
Computer Science
Keywords
ChIP-Seq, Cluster Validity Indices, Optimal Multi-level Thresholding, Pattern Recognition, Peak Calling, Protein Binding Sites
Supervisor
Luis Rueda
Rights
info:eu-repo/semantics/openAccess
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Abstract
Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has emerged as a superior alternative to microarray-based technology, as it provides higher resolution, less noise, greater coverage and a wider dynamic range. While ChIP-Seq enables genome-wide probing of DNA-protein interactions, it requires sophisticated tools to recognize hidden patterns and extract meaningful information. Over the years, several algorithms have been proposed that use different heuristics to accurately determine individual peaks corresponding to unique DNA-protein binding sites. However, finding all binding sites with high accuracy in a reasonable time remains a challenge. In this work, we propose a multi-level thresholding algorithm, which we call LinMLTBS, to identify enriched regions in ChIP-Seq data. Although various suboptimal heuristics have been proposed for multi-level thresholding, we emphasize the use of an algorithm that obtains an optimal solution while maintaining linear-time complexity. Testing our approach against other peak finders on various ENCODE project datasets shows that it attains higher accuracy than previously proposed peak finders while retaining a reasonable processing speed.
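The abstract names optimal multi-level thresholding without detailing it. As a point of reference only, the Python sketch below solves that problem exactly by dynamic programming under one common criterion, Otsu's between-class variance, which is assumed here as the objective. It is not LinMLTBS: its O(k·n²) running time is well above the linear-time bound the abstract reports for that method. The function name `optimal_thresholds` and the input format (a histogram, for example of read-coverage values) are hypothetical.

```python
from itertools import accumulate


def optimal_thresholds(hist, k):
    """Return k-1 thresholds splitting histogram bins into k contiguous classes.

    Class c covers bins [t[c-1], t[c]), with implicit outer bounds 0 and
    len(hist). The partition maximizes Otsu's between-class variance, which
    is equivalent to maximizing sum_c S_c**2 / W_c, where W_c is the total
    count and S_c the count-weighted sum of bin indices in class c.
    """
    n = len(hist)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of bins")

    # Prefix sums give O(1) access to W and S for any bin range [a, b).
    W = [0.0] + list(accumulate(float(h) for h in hist))
    S = [0.0] + list(accumulate(float(i * h) for i, h in enumerate(hist)))

    def score(a, b):
        w = W[b] - W[a]
        return 0.0 if w == 0 else (S[b] - S[a]) ** 2 / w

    neg = float("-inf")
    # best[j][b]: best total score for splitting bins [0, b) into j classes.
    best = [[neg] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for j in range(1, k + 1):
        for b in range(j, n + 1):
            # The j-th class is [a, b); every class holds at least one bin.
            for a in range(j - 1, b):
                if best[j - 1][a] == neg:
                    continue
                value = best[j - 1][a] + score(a, b)
                if value > best[j][b]:
                    best[j][b] = value
                    start[j][b] = a

    # Walk back from the full range to recover each class's start bin.
    cuts = []
    b = n
    for j in range(k, 0, -1):
        b = start[j][b]
        cuts.append(b)
    return sorted(cuts)[1:]  # drop the leading 0, keep the k-1 thresholds
```

For instance, on a histogram with three well-separated modes, `optimal_thresholds(hist, 3)` places one threshold in each gap between the modes. The triple loop is the straightforward exact formulation; reaching linear time, as the abstract reports for LinMLTBS, requires additional algorithmic machinery not shown here.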
Recommended Citation
Naik, Musab Mushtaque, "Finding Binding Sites in ChIP-Seq Data via a Linear-time Multi-level Thresholding Algorithm" (2020). Electronic Theses and Dissertations. 8463.
https://scholar.uwindsor.ca/etd/8463