An Empirical Analysis of AutoML Tools and Techniques with Automated Feature Engineering

Date of Award

9-28-2022

Publication Type

Thesis

Degree Name

M.Sc.

Department

Computer Science

Keywords

Automated machine learning, Genetic algorithm, Optimization

Supervisor

S.Saad

Supervisor

D.Wu

Rights

info:eu-repo/semantics/embargoedAccess

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Automated machine learning (AutoML) is an approach to automating the creation of machine learning pipelines and models. The ability to automatically create a machine learning pipeline would allow users without machine learning knowledge to create and use machine learning systems. Existing machine learning practitioners can also use these automated approaches to simplify the creation of machine learning systems. As with any tool, effective evaluations of AutoML tools are necessary to ensure users can select the correct tool for their machine learning task.

Current evaluations of automated machine learning are performed on simple, general-purpose datasets, which may not provide the comparison information needed for a given machine learning task. There is also limited work on whether AutoML systems can generate models comparable to those built by domain experts on domain-specific data. Many current AutoML approaches automate only a small part of the machine learning pipeline. For AutoML to remove the need for machine learning knowledge among its users, complete automation of the pipeline will be necessary. Automating the feature engineering process is the next step for many current AutoML approaches.

In this thesis, we present an empirical analysis of current open-source AutoML tools for tasks within the cybersecurity domain, highlight the current weaknesses of these tools, and evaluate the performance of popular AutoML tools on cybersecurity datasets. In addition, we propose a method of augmenting existing AutoML tools with automated feature engineering and assess the impact of different generation approaches and their effect on total pipeline creation time.
