Date of Award
6-19-2024
Publication Type
Thesis
Degree Name
M.Sc.
Department
Computer Science
Keywords
Artificial Intelligence;Large Language Models (LLMs);Natural Language Processing;Optimization;Transformer Models
Supervisor
Robin Gras
Abstract
The continuous evolution of natural language processing (NLP) has been pivotal in advancing AI's capacity to comprehend and interpret human language. Large Language Models (LLMs) such as BERT, GPT, and RoBERTa exemplify this progress, setting new benchmarks across a spectrum of NLP tasks, including sentiment analysis. However, the practical deployment of such models encounters significant computational obstacles owing to their intricate architectures, which demand substantial processing resources and memory. Moreover, concerns about the environmental repercussions of training large-scale NLP models have gained prominence, underscoring the need for sustainable AI development that mitigates the carbon emissions associated with training. Recent advances in model optimization have explored techniques such as weight pruning and the identification of subnetworks within trained models to alleviate computational demands. Weight pruning selectively eliminates redundant model parameters, reducing power consumption and improving suitability for resource-limited settings, and subnetworks identified within trained models have been shown to achieve comparable or superior performance with fewer parameters. Motivated by these challenges and insights, this thesis addresses the computational and practical impediments to deploying large-scale NLP models, focusing on the optimization of RoBERTa and Electra. The research seeks to optimize these models through novel model pruning techniques and strategic fine-tuning, guided by the notion of a core sub-model embedded within trained LLMs. This core sub-model encapsulates generic language properties shared across NLP tasks and serves as a foundation for fine-tuning on new tasks. Leveraging the core sub-model enables a streamlined fine-tuning process that updates only a subset of parameters, enhancing efficiency while preserving or improving NLP task performance.
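The two ideas the abstract combines, pruning to expose a sparse core sub-model and then fine-tuning only the parameters that survive the pruning mask, can be illustrated with a minimal PyTorch sketch. This is not the thesis's actual algorithm: the magnitude-based pruning criterion, the sparsity level, and the helper names (`magnitude_prune_masks`, `apply_masks`, `masked_fine_tune_step`) are assumptions made here purely for illustration.

```python
# Illustrative sketch only: magnitude pruning to a sparse "core" sub-model,
# then fine-tuning steps that update only the unpruned weights.
import torch
import torch.nn as nn


def magnitude_prune_masks(model: nn.Module, sparsity: float = 0.5) -> dict:
    """Build a 0/1 mask per weight matrix, keeping the largest-magnitude weights."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                      # skip biases / LayerNorm parameters
            continue
        k = max(1, int(param.numel() * sparsity))  # number of weights to prune
        threshold = param.abs().flatten().kthvalue(k).values
        masks[name] = (param.abs() > threshold).float()
    return masks


def apply_masks(model: nn.Module, masks: dict) -> None:
    """Zero out pruned weights; the remaining nonzero weights form the core sub-model."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])


def masked_fine_tune_step(model, masks, batch, loss_fn, optimizer):
    """One fine-tuning step on a new task that updates only the core (unpruned) weights."""
    optimizer.zero_grad()
    loss = loss_fn(model(batch["x"]), batch["y"])  # batch keys are placeholders
    loss.backward()
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks and param.grad is not None:
                param.grad.mul_(masks[name])       # block gradients to pruned weights
    optimizer.step()
    return loss.item()
```

In practice the masks would be derived from a trained transformer such as RoBERTa or Electra and reused across downstream tasks, so that each new fine-tuning run touches only the retained subset of parameters; the sketch above shows only the masking mechanics, not the thesis's criterion for selecting the core sub-model.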
Recommended Citation
Muthineni, Prithvi Rao, "Optimizing LLMs: Harnessing Core Sub-Models in Transformers for Efficient Training on New Tasks" (2024). Electronic Theses and Dissertations. 9499.
https://scholar.uwindsor.ca/etd/9499