Date of Award


Publication Type

Master Thesis

Degree Name



Computer Science

First Advisor

Christie I. Ezeife


Applied sciences, Clustering, Data mining, Email management, Email mining, Email overload, Folder summarization



Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.


Email inboxes are now filled with huge varieties of voluminous messages and thus increasing the problem of "email overload" which places financial burden on companies and individuals. Email mining provides solution to email overload problem by automatically grouping emails into meaningful and similar groups based on email subjects and contents. Existing email mining systems such as Kernel-Selected clustering and BuzzTrack, do not consider the semantic similarity between email contents, also when large number of email messages are clustered to a single folder they retain the problem of email overload. This thesis proposes a system named AEMS for automatic folder and sub-folder creation, indexing of the created folders with link to each folder and sub-folder, also an Apriori-based folder summarization containing important keywords from the folder. Thesis aims at solving email overload problem through semantic re-structuring of emails. In AEMS model, a novel approach named Semantic Non-parametric K-Means++ clustering is proposed for folder creation, which avoids, (1) random seed selection by selecting the seed according to email weights, and (2) pre-defined number of clusters using the similarity between the email contents. Experiments show the effectiveness and efficiency of the proposed techniques using large volumes of email datasets. Keywords: Email Mining, Email Overload, Email Management, Data Mining, Clustering, Feature Selection, Folder Summarization.