Date of Award

2016

Publication Type

Master Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

academic papers, feature selection, feature weight normalization, language models, text classification

Supervisor

Lu, Jianguo

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Abstract

The rapid growth of scholarly data has made it necessary to find efficient machine learning methods to categorize the data automatically. This thesis aims to build a classifier that can automatically categorize Computer Science (CS) papers based on their text content. To find the best method for CS papers, we collect and prepare two large labeled data sets, CiteSeerX and arXiv, and experiment with different classification approaches, including Naive Bayes and Logistic Regression, different feature selection schemes, different language models, and different feature weighting schemes. We found that, with a large training set, bi-gram modeling with normalized feature weights performs best on both data sets. Surprisingly, the arXiv data set can be classified with an F1 value of up to 0.95, while CiteSeerX reaches a lower F1 (0.764). This is probably because the labeling of CiteSeerX is less accurate than that of the arXiv data set.
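
To make the combination named in the abstract more concrete, the following is a minimal sketch of bi-gram features with normalized weights fed into a Naive Bayes classifier. It is not the thesis implementation. The names `bigram_features` and `NaiveBayes`, the use of L2 normalization, the inclusion of unigrams alongside bi-grams, the Laplace smoothing constant, and the toy training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_features(text):
    """Return L2-normalized unigram and bi-gram weights for one document.

    Assumption: "normalized feature weight" is illustrated here as
    L2 normalization of raw counts; the thesis may use another scheme.
    """
    tokens = text.lower().split()
    grams = tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    counts = Counter(grams)
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {g: c / norm for g, c in counts.items()}


class NaiveBayes:
    """Multinomial-style Naive Bayes over real-valued feature weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.weights = {c: {} for c in self.class_counts}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            for gram, w in bigram_features(doc).items():
                self.weights[label][gram] = self.weights[label].get(gram, 0.0) + w
                self.vocab.add(gram)
        self.totals = {c: sum(ws.values()) for c, ws in self.weights.items()}
        return self

    def predict(self, doc):
        feats = bigram_features(doc)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # class prior
            denom = self.totals[c] + self.alpha * len(self.vocab)
            for gram, w in feats.items():
                if gram in self.vocab:  # ignore features unseen in training
                    p = (self.weights[c].get(gram, 0.0) + self.alpha) / denom
                    score += w * math.log(p)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy data, invented for this sketch only.
train_docs = [
    "we train a neural network classifier on text",
    "support vector machine for text classification",
    "protein folding in yeast cells",
    "gene expression in yeast cells",
]
train_labels = ["cs", "cs", "other", "other"]

model = NaiveBayes().fit(train_docs, train_labels)
cs_guess = model.predict("text classification with a neural network")
other_guess = model.predict("yeast cells gene expression")
```

With real data, the same pipeline would be evaluated by F1 on a held-out set, as the abstract reports for CiteSeerX and arXiv; the toy sentences here only show the mechanics.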

Recommended Citation

Zhou, Tong, "Automated Identification of Computer Science Research Papers" (2016). Electronic Theses and Dissertations. 5776.
https://scholar.uwindsor.ca/etd/5776
