Electronic Theses and Dissertations

DeePSLiM: A Deep Learning Approach to Identify Predictive Short-linear Motifs for Protein Sequence Classification

Alexandru Filip, University of Windsor

Date of Award

3-10-2021

Publication Type

Master Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Machine Learning, Motif Discovery, Neural Network, Proteins, Short Linear Motif

Supervisor

Luis Rueda

Supervisor

Alioune Ngom

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Abstract

With the increasing quantity of biological data, it is important to develop algorithms that can quickly find patterns in large databases of DNA, RNA and protein sequences. Previous research has been very successful at applying deep learning methods to motif detection and to the classification of biological sequences. These approaches, however, have limitations. Most can only find motifs of a single length. In addition, most research has focused on DNA and RNA, both of which use a four-letter alphabet; only a few studies have attempted to apply deep learning methods to the larger, twenty-letter alphabet of proteins. We present an enhanced deep learning model, called DeePSLiM, capable of detecting predictive short linear motifs (SLiMs) in protein sequences. The model is a shallow network that can be trained quickly on large amounts of data. The SLiMs are predictive because they can be used to classify the sequences into their respective families. The model reached scores of 94.5% on accuracy, precision, recall, F1-score and Matthews correlation coefficient, as well as 99.9% area under the receiver operating characteristic curve (AUROC).

Recommended Citation

Filip, Alexandru, "DeePSLiM: A Deep Learning Approach to Identify Predictive Short-linear Motifs for Protein Sequence Classification" (2021). Electronic Theses and Dissertations. 8553.
https://scholar.uwindsor.ca/etd/8553

Supplementary Material.pdf (2518 kB)
Scholarship at UWindsor