Date of Award
2-5-2025
Publication Type
Thesis
Degree Name
M.Sc.
Department
Computer Science
Supervisor
Imran Ahmad
Supervisor
Muhammad Asaduzzaman
Rights
info:eu-repo/semantics/embargoedAccess
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Abstract
Efficient issue management is crucial for the success of open-source software (OSS) projects, particularly in resolving unresolved issues and fostering community contributions. This thesis explores two significant aspects of GitHub issue reports—“help wanted” (HW) and “wontfix”—to provide actionable insights into issue classification and management. The first study focuses on HW issues (HWIs), a key mechanism for encouraging community participation in OSS projects. Analyzing 36,212 HWIs from 100 popular repositories, we conducted a detailed manual analysis of 414 randomly selected reports, uncovering 23 distinct reasons for assigning the help wanted label, grouped into three categories: suitability for first-time contributors, lack of time or resources, and low priority. Promptly assigning the help wanted label and engaging expert developers in discussions significantly improve issue resolution rates. To support automated classification, we developed a machine learning model using XGBoost, achieving an accuracy of 0.71, an AUC of 0.87, and a recall of 0.87. The second study investigates “wontfix” issues, often used to indicate that no further action will be taken on a report. Using a dataset of 300 repositories across three popular languages—Java, JavaScript, and Python—we manually analyzed 420 randomly selected reports and identified 45 reasons for marking issues as “wontfix”, including 25 newly discovered ones. Quantitative analysis revealed that reporters with limited prior contributions were more likely to submit “wontfix” issues. Using machine learning algorithms, we developed models with 43 features—including reporter experience, text attributes, collaboration networks, readability, completeness, and developer activity—to classify “wontfix” and “non-wontfix” issues. Our models achieved significant improvements, classifying up to three times more true positives while maintaining a false positive rate comparable to existing studies.
These studies provide insights into GitHub issue management and novel contributions to automating issue classification, while the models are extensible to other GitHub issue types, offering practical tools for optimizing resource allocation and enhancing community engagement in OSS projects.
Recommended Citation
Ali, Rumman, "Supporting Management of Issue Reports" (2025). Electronic Theses and Dissertations. 9638.
https://scholar.uwindsor.ca/etd/9638