Metadata Extraction and Citation Automation (MECA)

Submitter and Co-author information

Sharjeel Mustafa, Odette School of Business

Standing

Undergraduate

Type of Proposal

Oral Research Presentation

Challenges Theme

Open Challenge

Faculty Sponsor

Dr. Kyle Brykman

Proposal

Zotero is a tool that researchers typically use to manage their articles and import citations. A problem arises when migrating a local database to Zotero. If a bulk import is used, Zotero neither validates nor formats the metadata, which leads to inconsistencies (e.g., inconsistent title capitalization and field entries); otherwise, the data has to be reviewed and entered manually, which can be particularly time-consuming. To address these issues, MECA (Metadata Extraction and Citation Automation) is being developed to automate the process and enforce consistency. The program uses optical character recognition or a PDF reader to extract a DOI from each article, which is then validated using a Crossref API. Upon validation, a predefined set of fields is requested and formatted by identifying nouns through part-of-speech tagging. The data is then packaged with the corresponding file and imported into Zotero through an API. Articles that fail any step are skipped and copied into a folder for manual review. The advantage of such automation is that it can enforce consistency, increase efficiency, and operate in the background. The user only needs to review the small subset of the original files that were too ambiguous for the program. The program performs best on newer articles that are text-based PDFs and carry a numerical DOI. We are excited by the potential for this project to save scholars in Canada and abroad substantial time citing their research, allowing them to devote their efforts to the pursuit of important scholarly objectives outlined in the Sustainable Development Goals (SDGs).
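
The extraction and validation steps described above can be illustrated with a minimal Python sketch. It is not MECA's actual code: the function names (extract_doi, crossref_url, route) are hypothetical, the regular expression is a common heuristic for modern numerical DOIs rather than the program's own rule, and the text is assumed to have already been pulled from the PDF by OCR or a PDF reader. The sketch only builds the address of the public Crossref REST API "works" endpoint and does not send the request.

```python
import re
from typing import Optional, Tuple
from urllib.parse import quote

# Heuristic for modern DOIs: "10.", a 4-9 digit registrant code, "/", then a suffix.
# This is an assumption for illustration; older or unusual DOIs may not match.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string found in the extracted text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trim punctuation that usually belongs to the sentence, not the DOI.
    # This is a heuristic and could clip a DOI that really ends in one of these.
    return match.group(0).rstrip(".,;:)]}")


def crossref_url(doi: str) -> str:
    """Build the Crossref REST API address used to look up a DOI."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/")


def route(text: str) -> Tuple[str, Optional[str]]:
    """Decide whether an article can be looked up or needs manual review."""
    doi = extract_doi(text)
    if doi is None:
        # No DOI found: mirror the proposal's "skip and set aside" behaviour.
        return ("manual_review", None)
    return ("lookup", crossref_url(doi))


if __name__ == "__main__":
    print(route("Available at https://doi.org/10.1000/xyz123."))
    print(route("A scanned page with no identifier"))
```

In a full pipeline, a failed lookup at the Crossref address would send the article to the same manual-review folder as a missing DOI, so that only ambiguous files reach the user.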

Grand Challenges

Viable, Healthy and Safe Communities
