Date of Award
3-18-2025
Publication Type
Doctoral Thesis
Degree Name
Ph.D.
Department
Computer Science
Keywords
Lottery Ticket Pruning, NLP, Pruning, Sentiment, Tabular, Transformer
Supervisor
Robin Gras
Rights
info:eu-repo/semantics/openAccess
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Abstract
In this dissertation, we explore two topics: pruning transformer models and sentiment classification. Our primary goal is to improve neural networks in terms of accuracy and inference speed for sentiment classification. We use the state-of-the-art transformer architecture and develop a pruning approach based on a genetic algorithm to shrink a pre-trained model (e.g., BERT) specifically for sentiment classification. Our first two research chapters explore pruning a smaller architecture, the tabular neural network (TNN), so that we can test our pruning methods without the cost of the significantly larger pre-trained transformers.

Chapter 3 establishes the effectiveness of two standard pruning approaches based on the lottery ticket hypothesis: iterative and direct (one-shot) pruning. We find that iterative pruning is consistently better than direct pruning for tabular neural networks on the datasets tested and improves over the original network on a majority of them. However, it fails to improve on two datasets, and it became clear that both approaches were limited by how the lottery weights were selected.

Chapter 4 adapts our methodology to resolve the key issues that arise when applying our pruning strategy to the transformer architecture. We continue to test with the TNN, adapting our method with the following improvements: flexibility in lottery-weight selection, removal of the initial full-sized model training, and a reduction in the training data used to find lottery weights. To prune a transformer, especially if this approach is to be applied to large language models, we need to remove the full-model training that impedes scalability, and we need to reduce the training data enough to rapidly test many pruning variants, a technique we call lottery sample selection. This level of efficiency allows us to use search algorithms, in particular a genetic algorithm, to explore a wide range of pruning options. We further improve on our prior TNN pruning results despite never training the original network (to measure weights) and using only a fraction of the training data for the search (as little as 5% on some datasets). While not the most accurate variant found, a particularly noteworthy result is pruning the TNN to at most one neuron per linear layer using only 13 training samples (5%) with a marginal loss of accuracy (-1.76%).

Chapter 5 begins our research into developing and improving sentiment classification with transformers. We work with the pre-trained model BERT and a large 1.6-million-sample sentiment dataset. Our approach takes advantage of BERT's next-sentence prediction training strategy to create a 0-shot transformer model, which we use to predict emotion labels for all samples in the dataset, with the intention of incorporating this information as an auxiliary input to the sentiment model. We try two approaches: feeding the labels directly into the linear layers of the model, and feeding them as textual input (using custom embeddings) alongside the sample text. The former shows signs of slight overfitting, while the latter improves accuracy.
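As a rough illustration of the 0-shot labeling idea, the sketch below scores candidate emotion labels with BERT's next-sentence prediction head via the Hugging Face transformers library. The hypothesis wording, emotion list, and function names are assumptions made for this example and are not taken from the dissertation.

```python
# Minimal sketch (assumed details, not the dissertation's code): use BERT's
# next-sentence prediction (NSP) head to pick the emotion label whose
# hypothesis sentence best "follows" the input text.
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

EMOTIONS = ["joy", "anger", "sadness", "fear", "surprise", "love"]  # assumed label set

def zero_shot_emotion(text: str) -> str:
    scores = []
    for emotion in EMOTIONS:
        # Pair the sample with a hypothesis sentence naming the emotion.
        inputs = tokenizer(text, f"this text expresses {emotion}.", return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits  # shape (1, 2)
        # Index 0 of the NSP head is the "sentence B is a continuation" score.
        scores.append(logits[0, 0].item())
    return EMOTIONS[scores.index(max(scores))]

print(zero_shot_emotion("I finally got the job I always wanted!"))
```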
Chapter 6 combines all of these efforts to build a lightweight yet strong sentiment transformer. Because we can now find lottery weights without training the original network and with only a fraction of the training data (through lottery sample selection), we are able to apply these strategies to a pre-trained transformer model. Using 3.5% of the dataset to search for lottery weights, we achieve accuracy improvements over the original network at up to 50% pruning, stable accuracy up to 70% pruning, less than a 1% drop in accuracy at 90% pruning, and a 1.5% drop at 95% pruning. The approach also incorporates our 0-shot auxiliary labels and data augmentation to further improve accuracy. Finally, we demonstrate inference-speed improvements for the pruned models to highlight the practical value of these results.

Our designs are largely task-agnostic. Future work aims to create strong, lightweight transformer models for any task, particularly tasks lacking large available datasets; other applications include pruning large language models down to expert-level, domain-specific models.
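To make the overall strategy concrete, here is a minimal sketch of a genetic algorithm searching over binary pruning masks, where each candidate mask is scored only on a small lottery-sample subset of the training data. The population size, operators, and fitness interface are illustrative assumptions and do not reproduce the dissertation's implementation.

```python
# Illustrative sketch (assumed details, not the dissertation's code): a genetic
# algorithm evolving binary pruning masks, with fitness measured cheaply on a
# small "lottery sample" subset of the training data.
import random

def random_mask(n_weights, keep_prob=0.5):
    # 1 = keep the weight, 0 = prune it.
    return [1 if random.random() < keep_prob else 0 for _ in range(n_weights)]

def crossover(a, b):
    # Single-point crossover between two parent masks.
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(mask, rate=0.01):
    # Flip a small fraction of bits.
    return [bit ^ 1 if random.random() < rate else bit for bit in mask]

def genetic_mask_search(n_weights, fitness, generations=20, pop_size=30):
    """fitness(mask) -> score of the pruned model evaluated on the small
    lottery-sample subset; higher is better."""
    population = [random_mask(n_weights) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]                 # truncation selection
        children = [
            mutate(crossover(random.choice(parents), random.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

# Hypothetical usage: score_on_lottery_samples would apply the mask to the
# pre-trained transformer and evaluate it on the selected subset.
# best_mask = genetic_mask_search(n_weights, fitness=score_on_lottery_samples)
```

In the dissertation's setting, the fitness function would evaluate the pruned pre-trained transformer on the selected lottery samples, so that many candidate masks can be screened at a fraction of the cost of full training.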
Recommended Citation
Bluteau, Ryan, "Genetic-Based Lottery Ticket Pruning for Transformers in Sentiment Classification: Realized Through Lottery Sample Selection" (2025). Electronic Theses and Dissertations. 9698.
https://scholar.uwindsor.ca/etd/9698