Date of Award

Summer 2021

Publication Type

Thesis

Degree Name

M.A.Sc.

Department

Computer Science

Keywords

GloVe, Word co-occurrence, Word embedding, Word2Vec

Supervisor

J. Lu

Supervisor

J. Chen

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

One of the trends in Natural Language Processing (NLP) is the use of word embedding. Its aim is to build a low-dimensional vector representation of words from text corpora. Global Vectors for Word Representation (GloVe) and Skip-Gram with Negative Sampling (SGNS) are two representative word embedding methods. Existing papers reach different conclusions on the performance of these two methods. This thesis focuses on GloVe and studies its commonalities and differences with SGNS.

Word co-occurrence is the cornerstone of all word embedding algorithms. One difference between GloVe and SGNS is the definition of co-occurrence. The weight of co-occurring words tapers off with the distance between them, and GloVe and SGNS adopt different weighting schemes. In SGNS, the weight decreases linearly with the distance. In GloVe, the weight decreases harmonically, giving less weight to the words in the center of the window. We propose GloVe-L (GloVe Linear), which replaces GloVe's harmonic weighting with linear weighting. We find that GloVe-L consistently outperforms GloVe in word similarity tasks. This conclusion is supported by extensive experiments on eight word evaluation benchmarks using a Wikipedia training corpus. The thesis also explores the impact of hyper-parameters on the results, including window size and x_max in GloVe. Another interesting observation is that GloVe-L does not work well for word analogy tasks.
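
The two weighting schemes described in the abstract can be illustrated with a minimal sketch. It assumes the harmonic weight is 1/d and the linear weight is (L - d + 1)/L for a context word at distance d in a window of size L, which is the usual reading of GloVe's counting and of the dynamic window in SGNS. The exact formulation used for GloVe-L in the thesis may differ, and the function names below are hypothetical, not taken from the thesis or from any library.

```python
from collections import defaultdict


def harmonic_weight(distance, window):
    """Assumed GloVe-style weight: 1/d (the window size is unused)."""
    return 1.0 / distance


def linear_weight(distance, window):
    """Assumed SGNS-style weight: (L - d + 1) / L, falling linearly with d."""
    return (window - distance + 1) / window


def cooccurrence_counts(tokens, window, weight_fn):
    """Accumulate symmetric, distance-weighted co-occurrence counts.

    Illustrative helper only: every pair of tokens at distance
    1..window contributes weight_fn(distance, window) in both directions.
    """
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            w = weight_fn(d, window)
            counts[(word, tokens[j])] += w
            counts[(tokens[j], word)] += w
    return dict(counts)


if __name__ == "__main__":
    window = 5
    print("distance  harmonic  linear")
    for d in range(1, window + 1):
        h = harmonic_weight(d, window)
        l = linear_weight(d, window)
        print(f"{d:>8}  {h:>8.3f}  {l:>6.3f}")
```

Under these assumptions, with a window of 5 a word at distance 3 receives a weight of about 0.33 under the harmonic scheme and 0.6 under the linear one, so the harmonic scheme discounts mid-window words more heavily. Both schemes assign weight 1 at distance 1 and 0.2 at distance 5.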

Recommended Citation

Lu, Quinlan, "Improved GloVe Word Embedding Using Linear Weighting Scheme for Word Similarity Tasks" (2021). Electronic Theses and Dissertations. 8845.
https://scholar.uwindsor.ca/etd/8845

Included in

Computer Sciences Commons
