An Empirical Analysis of AutoML Tools and Techniques with Automated Feature Engineering

Computer Science


Automated machine learning, Genetic algorithm, Optimization







Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.


Automated machine learning (AutoML) is an approach to automating the creation of machine learning pipelines and models. The ability to create a machine learning pipeline automatically would allow users without machine learning knowledge to build and use machine learning systems. Machine learning practitioners can also use these automated approaches to simplify the creation of machine learning systems. As with any tool, effective evaluations of AutoML tools are necessary so that users can select the correct tool for their machine learning task.

Current evaluations of automated machine learning are performed on simple, general-purpose datasets, and these datasets may be unable to provide the necessary comparison information, depending on the machine learning task. There is also limited work on whether AutoML systems can generate models comparable to those built by domain experts on domain-specific data. Many current AutoML approaches automate only a small part of the machine learning pipeline. For AutoML to remove the need for machine learning knowledge among its users, complete automation of the machine learning pipeline will be necessary. Automating the feature engineering process is the next step for many current AutoML approaches.

In this thesis, we present an empirical analysis of current open-source AutoML tools for tasks within the cybersecurity domain, highlight current weaknesses of AutoML tools, and evaluate the performance of popular AutoML tools on cybersecurity datasets. In addition, we propose a method of augmenting existing AutoML tools with automated feature engineering, and we assess the impact of different feature generation approaches, including their effect on total pipeline creation time.