Date of Award
2016
Publication Type
Master Thesis
Degree Name
M.Sc.
Department
Computer Science
Keywords
bad words, feature selection, machine learning, spammer, Twitter
Supervisor
Lu, Jianguo
Rights
info:eu-repo/semantics/openAccess
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Abstract
A large number of Twitter accounts are suspended. Over a five-year period, about 14% of accounts were terminated for reasons not specified explicitly by the service provider. We collected about 120,000 suspended users, along with their tweets and social relations. This thesis studies these suspended users and compares them with normal users in terms of their tweets. We train classifiers to automatically predict whether a user will be suspended. Three different kinds of features are used. We experimented with the Naïve Bayes method, including Bernoulli (BNB) and multinomial (MNB) variants, plus various feature selection mechanisms (mutual information, chi-square, and pointwise mutual information), and achieved F1=78%. To reduce the high dimensionality, in our second approach we use word2vec and doc2vec to represent each user with a vector of short, fixed length, and achieved F1=73% using an SVM with an RBF kernel. Random forest works best with this approach, with F1=74%.
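The first approach described in the abstract (feature selection followed by Bernoulli Naïve Bayes, scored with F1) can be illustrated with a minimal sketch. This is not the thesis code: the function names, the toy token sets, and the choice of chi-square with k=3 are all assumptions made for illustration, and the data is invented rather than taken from the collected accounts.

```python
import math

def chi_square(docs, labels, term):
    """Chi-square statistic of the 2x2 table: term presence vs. class."""
    n11 = n10 = n01 = n00 = 0
    for doc, y in zip(docs, labels):
        present = term in doc
        if present and y == 1:
            n11 += 1
        elif present:
            n10 += 1
        elif y == 1:
            n01 += 1
        else:
            n00 += 1
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n10 * n01) ** 2 / denom

def select_top_k(docs, labels, k):
    """Keep the k terms with the highest chi-square score (ties: alphabetical)."""
    vocab = set().union(*docs)
    scores = {t: chi_square(docs, labels, t) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]

def train_bnb(docs, labels, vocab):
    """Bernoulli Naive Bayes with add-one smoothing; assumes both classes occur."""
    model = {}
    for c in (0, 1):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        prior = len(class_docs) / len(docs)
        probs = {t: (sum(t in d for d in class_docs) + 1) / (len(class_docs) + 2)
                 for t in vocab}
        model[c] = (prior, probs)
    return model

def predict(model, doc):
    """Return the class with the larger log posterior."""
    def log_posterior(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if t in doc else 1.0 - p) for t, p in probs.items())
    return max(model, key=log_posterior)

def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# Hypothetical toy data: each user is a set of tokens; 1 = suspended, 0 = normal.
train_docs = [
    {"free", "followers", "click"}, {"win", "free", "prize"},
    {"click", "link", "free"}, {"lunch", "with", "friends"},
    {"reading", "a", "book"}, {"friends", "movie", "night"},
]
train_labels = [1, 1, 1, 0, 0, 0]
test_docs = [{"free", "click", "now"}, {"friends", "dinner"}]
test_labels = [1, 0]

selected = select_top_k(train_docs, train_labels, k=3)
model = train_bnb(train_docs, train_labels, selected)
predictions = [predict(model, d) for d in test_docs]
print(selected, predictions, f1_score(test_labels, predictions))
```

Reducing the vocabulary before training keeps the Bernoulli model small, since every retained term contributes to the score whether or not it appears in a user's tweets. Mutual information or pointwise mutual information could be swapped in for `chi_square` without changing the rest of the sketch.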
Recommended Citation
Cui, Xiutian, "Identifying Suspended Accounts In Twitter" (2016). Electronic Theses and Dissertations. 5725.
https://scholar.uwindsor.ca/etd/5725