Date of Award

10-30-2020

Publication Type

Master Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Author Name Disambiguation, Co-training, doc2vec, multi-view learning

Supervisor

Jianguo Lu

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Abstract

In bibliometrics, author name ambiguity means that an author's name is not a reliable identifier for associating academic papers with their authors. It has been a persistent problem for bibliometrics and for service providers such as Google Scholar, giving rise to a field of study called Author Name Disambiguation (AND). Author name ambiguity is often tackled with classification techniques: given a set of labeled papers, the remaining papers are assigned to the correct authors according to paper text and paper citations. When classification methods are applied to author name disambiguation, two issues stand out. The first is that a paper has multiple views (paper text and citation network). The second is the lack of training data: few papers are labeled. To cope with these two issues, we propose to use the co-training algorithm for AND. The co-training algorithm uses the two views to classify papers iteratively and adds the top selected papers to the training pool. We demonstrate that the co-training algorithm outperforms the baseline multi-view classification algorithm. We also experiment with the hyper-parameters of the co-training algorithm. The experiments are conducted on the PubMed dataset, where authors are labeled with ORCID. Papers are represented by two embeddings that are learned separately from the paper content and the paper citation network. The baseline classifiers for comparison are logistic regression and SVM.
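The co-training loop described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes a toy nearest-centroid classifier in place of logistic regression or SVM, small synthetic 2-D vectors in place of the learned text and citation embeddings, and hypothetical names throughout (`co_train`, `text_view`, `cite_view`, `rounds`, `top_k`). In each round, a classifier trained on one view labels the unlabeled papers, and its `top_k` most confident predictions are added to the shared training pool before the other view takes its turn.

```python
import math
import random

def centroids(X, y):
    """Fit step of a toy nearest-centroid classifier (stand-in for LR/SVM)."""
    sums, counts = {}, {}
    for vec, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def predict_proba(model, vec):
    """Softmax over negative squared distances to each class centroid."""
    scores = {lab: -sum((a - b) ** 2 for a, b in zip(vec, c))
              for lab, c in model.items()}
    top = max(scores.values())
    exp = {lab: math.exp(s - top) for lab, s in scores.items()}
    total = sum(exp.values())
    return {lab: e / total for lab, e in exp.items()}

def co_train(view_a, view_b, labels, rounds=5, top_k=2):
    """labels[i] is an author id, or None if paper i is unlabeled.

    Returns a copy of labels in which confident predictions have been filled in.
    """
    labels = list(labels)
    for _ in range(rounds):
        for view in (view_a, view_b):
            unlabeled = [i for i, lab in enumerate(labels) if lab is None]
            if not unlabeled:
                return labels
            train = [i for i, lab in enumerate(labels) if lab is not None]
            model = centroids([view[i] for i in train],
                              [labels[i] for i in train])
            scored = []
            for i in unlabeled:
                proba = predict_proba(model, view[i])
                best = max(proba, key=proba.get)
                scored.append((proba[best], i, best))
            # Move the most confident predictions into the training pool.
            scored.sort(reverse=True)
            for _, i, best in scored[:top_k]:
                labels[i] = best
    return labels

# Synthetic data (an assumption): two authors sharing one name, ten papers each.
random.seed(0)

def blob(center, n):
    return [[c + random.uniform(-0.5, 0.5) for c in center] for _ in range(n)]

truth = ["author_1"] * 10 + ["author_2"] * 10
text_view = blob([0.0, 0.0], 10) + blob([3.0, 3.0], 10)
cite_view = blob([1.0, -1.0], 10) + blob([-2.0, 2.0], 10)

# Only one labeled paper per author, mimicking scarce training data.
seed_labels = [None] * 20
seed_labels[0] = "author_1"
seed_labels[10] = "author_2"

predicted = co_train(text_view, cite_view, seed_labels, rounds=5, top_k=2)
```

The sketch selects the `top_k` most confident papers overall in each step. Selecting a fixed number per author, or thresholding on confidence, are other plausible choices, and the abstract does not say which one the thesis uses.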

Recommended Citation

Gao, Yan, "Author Name Disambiguation Using Co-training" (2020). Electronic Theses and Dissertations. 8447.
https://scholar.uwindsor.ca/etd/8447
