Date of Award
Classification, Data imbalance, Deep learning, Machine learning, Natural language processing
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Online sexual abuse is a concerning yet severely overlooked vice of modern society. With more children on the Internet and the continued growth of web applications such as online chatrooms and multiplayer games, preying on vulnerable users has become easier for predators. In recent years, there has been work on detecting online sexual predators using machine learning and deep learning techniques. Such work has trained on severely imbalanced datasets, with the imbalance handled by manually trimming over-represented labels. In this work, we propose an approach that first tackles the problem of imbalance and then improves the effectiveness of the underlying classifiers. Our evaluation of the proposed sampling approach on the PAN benchmark dataset shows performance improvements on several classification metrics compared to prior methods that require hand-crafted sampling of the data.
Khalid, Muhammad, "Online Sexual Predator Detection" (2023). Electronic Theses and Dissertations. 8956.