Electronic Theses and Dissertations

Osprey: Early Detection of Online Grooming via Conversational Features and Backtranslation Data Augmentation

Hamed Waezi, University of Windsor

Date of Award

9-20-2024

Publication Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Conversation Classification; Machine Learning; Natural Language Processing; Online Grooming Detection

Supervisor

Hossein Fani

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Abstract

Grooming minors for sexual exploitation has become an increasingly significant concern on online conversation platforms. For a safer online experience for minors, researchers have proposed machine learning models that analyze explicit textual content to automate the detection of predatory conversations, enabling warnings to minors, parents, or law enforcement while preserving minors’ privacy. However, the proposed models often fall short of real-world applications due to the short, noisy, and informal nature of messages and, more importantly, the sparse distribution of predatory conversations. In this research, we introduce Osprey, an open-source benchmark facilitating standardized pipelines for online grooming detection. Osprey implements canonical neural models, vector representation learning, and novel features, including one-on-one interactions, message exchange patterns, and temporal signals. We extend Osprey to support backtranslation augmentation, a round-trip translation of original conversations via intermediary natural languages, aimed at augmenting training datasets with additional predatory conversations. The modular design of Osprey allows seamless incorporation of further features to address evolving research needs. In addition to this framework, we propose the use of recurrent models, where the input to the model is a sequence of feature vectors, each representing a message. This approach allows the model to incorporate not only the text embedding of a message but also other relevant information, such as the timestamp of the message, the number of participants, and the identity of the message sender. This formulation is particularly useful for the early detection of grooming conversations, as it enables the model to process messages incrementally. In this research, we evaluate the efficiency and effectiveness of our models using various metrics and present the results through comprehensive data visualization techniques.
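
To illustrate the per-message formulation described in the abstract, here is a minimal sketch, not the thesis's implementation. It turns each message into one feature vector (a text embedding plus the time gap, the number of participants seen so far, and a sender flag) and emits a score after every message, so a decision can be made before the conversation ends. The names `Message`, `toy_embed`, `featurize`, and `incremental_scores` are hypothetical. The embedding is a toy stand-in for a learned one, and the recurrent weights are untrained placeholders.

```python
import math
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass
class Message:
    sender: str
    timestamp: float  # seconds since the conversation started
    text: str


def toy_embed(text: str, dim: int = 4) -> List[float]:
    """Hypothetical stand-in for a learned text embedding (bucketed token counts)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[sum(map(ord, token)) % dim] += 1.0
    return vec


def featurize(messages: Sequence[Message]) -> List[List[float]]:
    """Build one feature vector per message, using only information seen so far."""
    if not messages:
        return []
    first_sender = messages[0].sender
    seen_senders = set()
    prev_ts = messages[0].timestamp
    rows = []
    for m in messages:
        seen_senders.add(m.sender)
        rows.append(
            toy_embed(m.text)
            + [
                m.timestamp - prev_ts,                     # time gap to previous message
                float(len(seen_senders)),                  # participants seen so far
                1.0 if m.sender == first_sender else 0.0,  # sender identity flag
            ]
        )
        prev_ts = m.timestamp
    return rows


def incremental_scores(
    rows: Sequence[Sequence[float]], w_in: float = 0.1, w_rec: float = 0.5
) -> Iterator[float]:
    """Elman-style recurrence with a scalar state; yields one score per message.

    w_in and w_rec are untrained placeholder weights (an assumption for illustration).
    """
    h = 0.0
    for x in rows:
        h = math.tanh(w_in * sum(x) + w_rec * h)  # update state from the new message
        yield 1.0 / (1.0 + math.exp(-h))          # squash state to a score in (0, 1)
```

Because `incremental_scores` is a generator, a caller can stop consuming it as soon as a score crosses a chosen threshold, which is the sense in which the formulation supports early detection.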

Recommended Citation

Waezi, Hamed, "Osprey: Early Detection of Online Grooming via Conversational Features and Backtranslation Data Augmentation" (2024). Electronic Theses and Dissertations. 9392.
https://scholar.uwindsor.ca/etd/9392

Scholarship at UWindsor
