Date of Award

9-6-2019

Publication Type

Doctoral Thesis

Degree Name

Ph.D.

Department

Computer Science

Keywords

Academic papers, Data mining, Embeddings, Networks, Skip-gram

Supervisor

Lu, J.

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Abstract

Academic papers contain both text and citation links. Representing such data is crucial for many downstream tasks, such as classification, disambiguation, duplicates detection, recommendation and influence prediction. The success of Skip-gram with Negative Sampling model (hereafter SGNS) has inspired many algorithms to learn embeddings for words, documents, and networks. However, there is limited research on learning the representation of linked documents such as academic papers. This dissertation first studies the norm convergence issue in SGNS and propose to use an L2 regularization to fix the problem. Our experiments show that our method improves SGNS and its variants on different types of data. We observe improvements upto 17.47% for word embeddings, 1.85% for document embeddings, and 46.41% for network embeddings. To learn the embeddings for academic papers, we propose several neural network based algorithms that can learn high-quality embeddings from different types of data. The algorithms we proposed are N2V (network2vector) for networks, D2V (document2vector) for documents, and P2V (paper2vector) for academic papers. Experiments show that our models outperform traditional algorithms and the state-of-the-art neural network methods on various datasets under different machine learning tasks. With the high quality embeddings, we design and present four applications on real-world datasets, i.e., academic paper and author search engines, author name disambiguation, and paper influence prediction.

Recommended Citation

Zhang, Yi, "Learning Embeddings for Academic Papers" (2019). Electronic Theses and Dissertations. 7855.
https://scholar.uwindsor.ca/etd/7855

Download

COinS

Scholarship at UWindsor

Electronic Theses and Dissertations

Learning Embeddings for Academic Papers

Date of Award

Publication Type

Degree Name

Department

Keywords

Supervisor

Rights

Creative Commons License

Abstract

Recommended Citation

Search

Browse

Author Corner

Scholarship at UWindsor

Electronic Theses and Dissertations

Learning Embeddings for Academic Papers

Author

Date of Award

Publication Type

Degree Name

Department

Keywords

Supervisor

Rights

Creative Commons License

Abstract

Recommended Citation

Share

Search

Browse

Author Corner