Date of Award

2024

Publication Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

Supervisor

Alioune Ngom

Supervisor

Jianguo Lu

Abstract

Representation learning is a key step in bridging machine learning and drug discovery, where understanding the interactions between drugs and various biological entities is critical. In this research, we explore advanced representation learning methods to improve the accuracy of predicting such associations. We study and devise embedding methods for molecular representation using Large Language Models (LLMs) to capture intricate chemical properties and relationships between drugs and other entities. We also use graph representation learning (GRL) methods, which can process complex graph-structured data, to model and predict interactions between drugs and other biomolecular entities. By integrating these techniques, our research aims to provide a comprehensive and robust framework for association prediction tasks, together with a thorough examination of molecular string embedding techniques. For drug repurposing on heterogeneous graphs, we developed NMF-DR, a non-negative matrix factorization method, and DR-HGNN, a heterogeneous graph neural network method, to predict candidate disease indications for existing drugs. For homogeneous graphs, we created DDI-Pred, which uses molecular embeddings and graph convolutional networks to predict new drug-drug interactions (DDIs). Finally, following the success of integrating molecular embeddings into association prediction tasks, we explored the performance of LLMs for molecular representation and developed an evaluation toolkit that assesses LLM-based molecular embedding methods on molecular property prediction and DDI prediction.
