Publication Type

Master Thesis

Department

Computer Science

First Advisor

Ngom, Alioune

Second Advisor

Rueda, Luis

Keywords

Calmodulin Binding Proteins, MEME, Prediction, Protein-protein interactions, Short-linear motifs, WEKA

Prediction of protein-protein interactions (PPIs) is a difficult and important problem in biology. Although high-throughput technologies have made remarkable progress, the interactions they identify are often inaccurate, with high rates of both false positives and false negatives. Similarly, Calmodulin (CaM)-binding proteins have been studied extensively, yet computational approaches for predicting them remain underdeveloped. Short-linear motifs (SLiMs), on the other hand, have been used effectively as features for analyzing PPIs, although their properties have not yet been exploited for high-throughput interactions. We propose a new method for predicting high-throughput PPIs and CaM-binding proteins based on counting SLiMs in protein sequences with specific scoring functions. The method was tested on a positive dataset of 50 protein pairs obtained from the PrePPI database, a negative dataset of 38 protein pairs obtained from the Negatome-PDB 2.0 database, and 387 proteins from the CaM database. We used Multiple EM for Motif Elucidation (MEME) to discover motifs in each of the positive and negative datasets. Our method shows promising results and demonstrates that the information contained in SLiMs is highly relevant for accurate prediction of high-throughput PPIs and CaM-binding proteins. Beyond efficient prediction, individual SLiMs provide additional information about patterns that may be linked to specific roles in protein function.
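The abstract does not specify the scoring functions, so the following is only a rough illustration of the general idea of counting SLiMs in sequences. It assumes that motifs are written as Python regular expressions, and it uses a hypothetical score (hits from positive-set motifs minus hits from negative-set motifs, summed over both proteins of a pair). The function names, motif patterns, and scoring rule are illustrative assumptions, not the method described in the thesis.

```python
import re
from typing import Dict, List


def count_motif(motif: str, sequence: str) -> int:
    """Count possibly overlapping occurrences of a regex motif in a sequence."""
    # A zero-width lookahead lets re.finditer report overlapping matches.
    return sum(1 for _ in re.finditer(f"(?=({motif}))", sequence))


def motif_counts(motifs: List[str], sequence: str) -> Dict[str, int]:
    """Per-motif occurrence counts, usable as a feature vector."""
    return {m: count_motif(m, sequence) for m in motifs}


def pair_score(seq_a: str, seq_b: str,
               pos_motifs: List[str], neg_motifs: List[str]) -> int:
    """Hypothetical score: positive-motif hits minus negative-motif hits,
    summed over both proteins of the pair."""
    total = 0
    for seq in (seq_a, seq_b):
        total += sum(count_motif(m, seq) for m in pos_motifs)
        total -= sum(count_motif(m, seq) for m in neg_motifs)
    return total


if __name__ == "__main__":
    # Toy motifs and sequences, made up for illustration only.
    positive = ["R.{2}L", "KK"]
    negative = ["PP"]
    print(pair_score("MKKKRAALP", "GRSTLPPP", positive, negative))  # 2
```

In a classifier setting such as WEKA, the per-motif counts returned by `motif_counts` could serve as a feature vector, which keeps the information carried by individual SLiMs rather than collapsing it into a single score.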