Date of Award


Publication Type

Master Thesis

Degree Name



Computer Science

Keywords

Biological sciences, Applied sciences, SVM, LDR, K-NN, Cross dataset validation, Classification, Feature generation, Leave-one-out, Protein-protein interaction, Sequence information


Luis Rueda



Abstract

Protein-protein interactions (PPIs) play a key role in many biological processes and functions in living cells. Hence, the identification, prediction, and analysis of PPIs are important problems in molecular biology. Traditional solutions to these problems, based on laboratory experiments, are labor intensive and time consuming. As a result, the demand for computational models that address them is growing. In this thesis, I propose a computational model to predict biological PPI types using short linear motifs (SLiMs). The information contained in a protein sequence is retrieved using the profiles of SLiMs. I use sequence information as a distinguishing property between interaction types, mainly obligate and non-obligate. I also propose another model to predict PPIs using desolvation and electrostatic energies. These computational models use the information contained in the sequence, and the desolvation and electrostatic energies of the protein complex, as properties. After all the properties are computed, the well-known classifiers k-nearest neighbor (k-NN), support vector machine (SVM), and linear dimensionality reduction (LDR) are implemented. Results on two well-known datasets show that the models achieve accuracy above 99%. Analysis and comparison of the results show that the information contained in the sequence is very important for the prediction and analysis of protein-protein interactions.
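To make the evaluation setting concrete, the following is a minimal sketch of leave-one-out validation with a k-NN classifier, two of the techniques named in the abstract and keywords. It is not the thesis implementation: the function names, the two-dimensional feature vectors, and the choice of k = 3 are hypothetical and stand in for the SLiM-profile or energy features described above.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=3):
    """Hold out each sample once, train on the rest, and report accuracy."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical toy feature vectors (NOT data from the thesis): each row
# stands in for the properties computed for one protein complex.
X = [
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.25],   # labelled obligate
    [0.90, 0.80], [0.80, 0.90], [0.85, 0.75],   # labelled non-obligate
]
y = ["obligate"] * 3 + ["non-obligate"] * 3

accuracy = leave_one_out_accuracy(X, y, k=3)
print(f"Leave-one-out accuracy: {accuracy:.2f}")
```

Leave-one-out uses every complex as a test case exactly once, which suits small datasets, at the cost of one training round per sample.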