Date of Award

6-19-2024

Publication Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Graph;Natural Language Processing;Semantic similarity;Word Embedding;Word Similarity;Word vector

Supervisor

Ziad Kobti

Abstract

Text plays a central role in information storage, which demands streamlined and effective methods for fast retrieval. Among the various text representations, the vector form stands out for its efficiency, especially on large datasets. Placing words with similar meanings close to each other in the vector space improves system performance across a range of Natural Language Processing (NLP) tasks. Previous methods, which primarily capture word context through neural language models, have fallen short on word similarity problems. This thesis investigates the connection between vector representations of words and the improved performance and accuracy observed in NLP tasks. It introduces a method that represents words as a graph so that their first-order and second-order proximity are preserved, aiming to strengthen semantic representation. Experiments on diverse text corpora show that this technique outperforms conventional word-embedding approaches by 2.7% on multiple intrinsic and extrinsic tasks. The findings not only contribute to the evolving landscape of semantic representation learning but also illuminate implications for text classification, especially in the context of dynamic embedding models.
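The abstract does not spell out the training objective, but the idea of preserving first-order proximity (directly connected words should be close) and second-order proximity (words with similar neighbourhoods should be close) can be illustrated with a minimal LINE-style sketch. Everything below is an assumption for illustration only: the toy `edges` graph, the hyperparameters, and the use of separate context vectors are not taken from the thesis, and negative sampling is omitted for brevity.

```python
# Minimal sketch (not the thesis's exact method): embed words from a
# weighted co-occurrence graph while preserving first- and second-order
# proximity. All names and values here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy co-occurrence graph: (word_u, word_v, weight).
edges = [("cat", "dog", 3.0), ("cat", "pet", 2.0),
         ("dog", "pet", 2.0), ("car", "road", 4.0)]
vocab = sorted({w for u, v, _ in edges for w in (u, v)})
idx = {w: i for i, w in enumerate(vocab)}

dim, lr, epochs = 8, 0.05, 200
emb = rng.normal(scale=0.1, size=(len(vocab), dim))  # word vectors
ctx = rng.normal(scale=0.1, size=(len(vocab), dim))  # context vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(epochs):
    for u, v, w in edges:
        i, j = idx[u], idx[v]
        ei, ej = emb[i].copy(), emb[j].copy()
        # First-order proximity: pull directly connected word vectors
        # together, weighted by the edge weight w.
        g1 = w * (1.0 - sigmoid(ei @ ej))
        emb[i] += lr * g1 * ej
        emb[j] += lr * g1 * ei
        # Second-order proximity: model each word's neighbourhood with
        # separate context vectors, so words sharing neighbours end up
        # with similar embeddings.
        cj = ctx[j].copy()
        g2 = w * (1.0 - sigmoid(emb[i] @ cj))
        emb[i] += lr * g2 * cj
        ctx[j] += lr * g2 * emb[i]

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Connected/related words ("cat", "dog") should score higher than
# unrelated ones ("cat", "car").
print(cos(emb[idx["cat"]], emb[idx["dog"]]),
      cos(emb[idx["cat"]], emb[idx["car"]]))
```

In a full implementation, negative sampling (or an equivalent repulsive term) would be needed so that unrelated words are actively pushed apart rather than merely left un-attracted.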
