
Computer Science

First Advisor

Ngom, Alioune

Second Advisor

Rueda, Luis


Keywords

chemical fingerprints, drug compound, drug-target interaction network, linear motifs, machine learning, target protein




Abstract

Drug-target interaction (DTI) prediction is a fundamental step in drug discovery and genomic research, and it contributes to medical treatment. Various computational methods have been developed to find potential DTIs, and machine learning (ML) is currently used to identify new DTIs from existing DTI networks. ML-based approaches to DTI network prediction fall mainly into two groups: similarity-based methods and feature-based methods. In this thesis, we propose a feature-based approach and, for the first time, use short linear motifs (SLiMs) as protein descriptors. Chemical substructure fingerprints are used as drug features. Another challenge in this field is the lack of negative data for training, because most of the data available in public databases consists of interacting pairs. Many researchers regard unknown drug-target pairs as non-interacting, which is incorrect, because an unknown pair may be an interaction that has simply not been discovered yet, and this assumption may have serious consequences. To solve this problem, we introduce a strategy that selects reliable negative samples based on the features of the positive data. To allow direct comparison with previous research, we use the same benchmark datasets. We evaluated three classifiers, k-nearest neighbours (k-NN), random forest (RF), and support vector machine (SVM), and found that k-NN gives satisfactory results but does not perform as well as RF and SVM. Compared with existing approaches evaluated on the same datasets, our method performs best in most cases.
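To make the feature-based pipeline more concrete, here is a minimal, self-contained sketch of how a drug-target pair could be encoded and classified. It is not the thesis's implementation. Several details are assumptions: the motif list `SLIM_PATTERNS` is hypothetical, drug fingerprints are assumed to be precomputed binary vectors, and the negative-selection rule (keep the unlabelled pairs whose Tanimoto similarity to every positive pair is lowest) is a stand-in heuristic, because the abstract does not specify the actual strategy. The classifier is a plain k-NN with Tanimoto similarity, one of the three classifiers mentioned above.

```python
import re
from typing import List, Sequence

# Hypothetical SLiM patterns in regular-expression form ("." matches any residue).
# They are illustrative only and are NOT the motif set used in the thesis.
SLIM_PATTERNS: List[str] = ["[ST]P", "R..[ST]", "P..P", "[KR][KR].[KR]"]


def protein_features(sequence: str) -> List[int]:
    """Binary SLiM vector: 1 if a pattern occurs anywhere in the sequence, else 0."""
    return [1 if re.search(pattern, sequence) else 0 for pattern in SLIM_PATTERNS]


def pair_vector(sequence: str, drug_fingerprint: Sequence[int]) -> List[int]:
    """Concatenate the protein's SLiM vector with a precomputed binary drug fingerprint."""
    return protein_features(sequence) + list(drug_fingerprint)


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto (Jaccard) similarity of two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero vectors are identical, so treat them as fully similar.
    return 1.0 if either == 0 else both / either


def select_negatives(
    positives: Sequence[Sequence[int]], candidates: Sequence[Sequence[int]], k: int
) -> List[int]:
    """Hypothetical heuristic: indices of the k unlabelled pairs least similar to any positive."""
    def closest_positive(vec: Sequence[int]) -> float:
        return max((tanimoto(vec, p) for p in positives), default=0.0)

    return sorted(range(len(candidates)), key=lambda i: closest_positive(candidates[i]))[:k]


def knn_predict(
    train_x: Sequence[Sequence[int]], train_y: Sequence[int], query: Sequence[int], k: int = 3
) -> int:
    """Majority vote (1 = interaction) among the k training pairs most similar to the query."""
    ranked = sorted(range(len(train_x)), key=lambda i: tanimoto(train_x[i], query), reverse=True)
    votes = sum(train_y[i] for i in ranked[:k])
    return 1 if 2 * votes > k else 0


if __name__ == "__main__":
    # Toy sequences and 4-bit "fingerprints" are made up for illustration.
    positives = [
        pair_vector("MSPRAASKKAK", [1, 1, 0, 0]),
        pair_vector("MTPGRLPSAAP", [1, 1, 1, 0]),
    ]
    unlabelled = [
        pair_vector("MSPRAATKKRK", [1, 1, 0, 0]),
        pair_vector("GGGGAAAA", [0, 0, 0, 1]),
        pair_vector("AAAAAAAA", [0, 0, 1, 1]),
    ]
    neg_idx = select_negatives(positives, unlabelled, k=2)
    print(neg_idx)  # [1, 2]
    negatives = [unlabelled[i] for i in neg_idx]
    train_x = positives + negatives
    train_y = [1] * len(positives) + [0] * len(negatives)
    query = pair_vector("MSPKAASKKAK", [1, 1, 0, 0])
    print(knn_predict(train_x, train_y, query, k=3))  # 1
```

In the toy run, the first unlabelled pair is identical to a known positive, so the heuristic refuses to treat it as a negative; this illustrates why labelling every unknown pair as non-interacting is risky. With real data, RF and SVM would typically come from a library such as scikit-learn rather than being written by hand.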