Date of Award

9-25-2024

Publication Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Drug Combination; Large Language Models; Polypharmacy Side Effect; SMILES

Supervisor

Alioune Ngom

Abstract

Polypharmacy, the concurrent use of multiple drugs, is a common strategy for treating patients with complex diseases or multiple coexisting conditions. Although combining drugs can be beneficial, it can also lead to unintended drug-drug interactions (DDIs) and an increased risk of adverse side effects. Predicting these side effects can significantly assist clinicians. In this study, we assess the impact of different language models on generating embeddings from a text representation of drugs, specifically the Simplified Molecular Input Line-Entry System (SMILES), to predict polypharmacy side effects. We first retrieve the SMILES strings of drugs from the PubChem database and then encode them with various models, such as ChemBERTa, GPT, BERT, and Mol2vec, to obtain a representation for each drug. These drug representations are then fused to create a representation for each drug pair. The drug-pair representations are fed separately into two distinct models, a Multilayer Perceptron (MLP) and a Graph Neural Network (GNN), to predict polypharmacy side effects. Our evaluation shows that using these language-model embeddings with the MLP and the GNN improves performance over our baseline studies. Notably, combining the embeddings of a fine-tuned ChemBERTa with the GNN architecture yields better results than the other methods. This study highlights the effectiveness of using complex models such as language models to generate feature representations based solely on the chemical structures of drugs, even without incorporating other entities such as proteins or cell lines.
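The abstract does not say how two drug embeddings are fused into a pair representation, so the following is only a minimal sketch of the MLP branch under stated assumptions. The fusion used here (element-wise product concatenated with absolute difference, which makes the pair vector independent of drug order) is a hypothetical choice, not necessarily the thesis's method. The random vectors stand in for per-drug embeddings such as ChemBERTa or Mol2vec outputs, the weights are untrained, and the multi-label output (one independent probability per side-effect type) is likewise an assumption. All function names are illustrative.

```python
import math
import random

def fuse_pair(emb_a, emb_b):
    """Hypothetical symmetric fusion: element-wise product followed by
    absolute difference, so fuse_pair(a, b) == fuse_pair(b, a)."""
    prod = [x * y for x, y in zip(emb_a, emb_b)]
    diff = [abs(x - y) for x, y in zip(emb_a, emb_b)]
    return prod + diff

def init_layer(n_in, n_out, rng):
    # Small random weights and zero biases for one dense layer (untrained).
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, layer):
    # Affine transform: W x + b.
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, t) for t in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-t)) for t in v]

def mlp_predict(pair_vec, hidden, output):
    """Assumed multi-label head: one independent probability per side effect."""
    return sigmoid(dense(relu(dense(pair_vec, hidden)), output))

# Usage with random stand-ins for two drug embeddings.
rng = random.Random(0)
dim, n_side_effects = 8, 5
drug_a = [rng.uniform(-1, 1) for _ in range(dim)]
drug_b = [rng.uniform(-1, 1) for _ in range(dim)]
hidden = init_layer(2 * dim, 16, rng)
output = init_layer(16, n_side_effects, rng)
probs = mlp_predict(fuse_pair(drug_a, drug_b), hidden, output)
```

A symmetric fusion is chosen in this sketch because a drug pair is unordered; plain concatenation would work too but would give different inputs for (A, B) and (B, A) unless the training data covers both orders.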
