Date of Award

2019

Publication Type

Master's Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

data mining, document embedding, multi-view learning, natural language processing

Supervisor

Jianguo Lu

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Abstract

Learning document vectors that capture the links between documents is an important problem in natural language processing (NLP). These vectors can, in turn, be used for tasks such as information retrieval, document classification, and clustering. Documents are inherently linked to one another: web pages through hyperlinks and academic papers through citations. Methods like PV-DM or PV-DBOW try to capture the semantic representation of a document using only its text, ignoring the network information altogether. Conversely, network representation learning methods like node2vec or DeepWalk capture the linkage between documents but ignore the text altogether. In this thesis, we propose a method based on retrofitting, a technique for learning word embeddings using a semantic lexicon, that incorporates both text and network information when learning the document representation. We also analyze the weight given to the network information to find the value that yields the best embedding. Our experimental results show that the method improves the classification score by 4%. We also introduce a new dataset containing both network and content information.
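
To illustrate the general idea of retrofitting text-based document vectors to a link graph, here is a minimal sketch. It is not the thesis's implementation: it applies the standard iterative retrofitting update used for word vectors and semantic lexicons to documents and their links, and the function name `retrofit`, the parameters `alpha` (weight on the original text vector) and `beta` (total weight on linked neighbors), and the toy data are all assumptions made for illustration.

```python
import numpy as np

def retrofit(text_vecs, neighbors, alpha=1.0, beta=1.0, iterations=10):
    """Pull each document vector toward the vectors of its linked documents.

    text_vecs: (n_docs, dim) array of text-only embeddings (e.g. from PV-DBOW).
    neighbors: dict mapping a document index to a list of linked document indices.
    alpha:     weight keeping a vector close to its original text embedding (assumed name).
    beta:      total weight spread evenly over a document's neighbors (assumed name).
    """
    original = np.asarray(text_vecs, dtype=float)
    retrofitted = original.copy()
    for _ in range(iterations):
        for i, linked in neighbors.items():
            if not linked:
                continue  # documents without links keep their text vector
            per_neighbor = beta / len(linked)
            neighbor_sum = per_neighbor * retrofitted[linked].sum(axis=0)
            # Weighted average of the neighbor vectors and the original text vector.
            retrofitted[i] = (neighbor_sum + alpha * original[i]) / (beta + alpha)
    return retrofitted

# Toy example: documents 0 and 1 cite each other, document 2 has no links.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
links = {0: [1], 1: [0], 2: []}
out = retrofit(docs, links, alpha=1.0, beta=1.0)
```

In this sketch, raising `beta` relative to `alpha` gives the network information more influence, and `beta=0` leaves the text embeddings unchanged, which is one simple way to think about the weighting question the abstract raises.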

Recommended Citation

Mansoor, Zeeshan, "Improving Document Representation Using Retrofitting" (2019). Electronic Theses and Dissertations. 7721.
https://scholar.uwindsor.ca/etd/7721
