Keywords: bad words, feature selection, machine learning, spammer, Twitter
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
A large number of Twitter accounts are suspended. Over a five-year period, about 14% of accounts are terminated for reasons not specified explicitly by the service provider. We collected about 120,000 suspended users, along with their tweets and social relations. This thesis studies these suspended users and compares them with normal users in terms of their tweets. We train classifiers to automatically predict whether a user will be suspended. Three different kinds of features are used. We experimented with the Naive Bayes method, including Bernoulli (BNB) and multinomial (MNB) variants, combined with various feature selection mechanisms (mutual information, chi-square, and pointwise mutual information), and achieved F1 = 78%. To reduce the high dimensionality, our second approach uses word2vec and doc2vec to represent each user as a short, fixed-length vector, achieving F1 = 73% using an SVM with an RBF kernel. Random forest works best with this approach, with F1 = 74%.
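As an illustration of the first approach, the following is a minimal sketch of chi-square feature selection followed by a Bernoulli Naive Bayes classifier, written in plain Python. It is not the thesis implementation: the toy tweets, the labels, the cutoff of four features, and the function names `chi_square_scores`, `train_bnb`, and `predict_bnb` are all assumptions made for this example, and Laplace smoothing is assumed as the smoothing scheme.

```python
import math
from collections import Counter

def chi_square_scores(docs, labels):
    """Chi-square score of each word against a binary label (1 = suspended)."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for words, y in zip(docs, labels):
        for w in set(words):  # presence/absence, not term frequency
            (df_pos if y == 1 else df_neg)[w] += 1
    scores = {}
    for w in set(df_pos) | set(df_neg):
        a, b = df_pos[w], df_neg[w]   # users containing w: suspended / normal
        c, d = n_pos - a, n_neg - b   # users lacking w: suspended / normal
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[w] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def train_bnb(docs, labels, vocab):
    """Bernoulli Naive Bayes over a fixed vocabulary, with Laplace smoothing.

    Assumes both classes occur in the training data."""
    model = {}
    for c in (0, 1):
        class_docs = [set(d) for d, y in zip(docs, labels) if y == c]
        prior = len(class_docs) / len(docs)
        cond = {w: (sum(w in d for d in class_docs) + 1) / (len(class_docs) + 2)
                for w in vocab}
        model[c] = (prior, cond)
    return model

def predict_bnb(model, words):
    """Return the class with the highest log-posterior for a bag of words."""
    present = set(words)
    best, best_lp = None, -math.inf
    for c, (prior, cond) in model.items():
        lp = math.log(prior)
        for w, p in cond.items():
            # Bernoulli NB also scores the absence of a vocabulary word.
            lp += math.log(p if w in present else 1.0 - p)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Hypothetical toy data: one token list per user, 1 = suspended, 0 = normal.
docs = [
    ["free", "followers", "click"],
    ["click", "free", "link"],
    ["free", "win", "click"],
    ["lunch", "friends", "today"],
    ["reading", "paper", "today"],
    ["friends", "game", "tonight"],
]
labels = [1, 1, 1, 0, 0, 0]

scores = chi_square_scores(docs, labels)
# Keep the 4 highest-scoring words; ties are broken alphabetically.
selected = sorted(scores, key=lambda w: (-scores[w], w))[:4]
model = train_bnb(docs, labels, selected)

print(selected)
print(predict_bnb(model, ["free", "click", "now"]))
print(predict_bnb(model, ["friends", "today"]))
```

On real data the cutoff would be tuned rather than fixed, and mutual information or pointwise mutual information could be substituted for the chi-square score without changing the rest of the sketch.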
Cui, Xiutian, "Identifying Suspended Accounts In Twitter" (2016). Electronic Theses and Dissertations. 5725.