Date of Award

2023

Publication Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Classification, Data imbalance, Deep learning, Machine learning, Natural language processing

Supervisor

H.Fani

Supervisor

A.Ngom

Rights

info:eu-repo/semantics/openAccess

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Online sexual abuse is a concerning yet severely overlooked vice of modern society. With more children on the Internet and the ever-growing number of web applications such as online chatrooms and multiplayer games, preying on vulnerable users has become easier for predators. In recent years, there has been work on detecting online sexual predators using machine learning and deep learning techniques. Such work has trained on severely imbalanced datasets, with the imbalance handled by manually trimming over-represented labels. In this work, we propose an approach that first tackles the problem of imbalance and then improves the effectiveness of the underlying classifiers. Our evaluation of the proposed sampling approach on the PAN benchmark dataset shows performance improvements on several classification metrics compared to prior methods, which require hand-crafted sampling of the data.
