Date of Award

6-1-2023

Publication Type

Thesis

Degree Name

M.A.Sc.

Department

Electrical and Computer Engineering

Keywords

CNN;CPU;DL;GPU;ML;vgg16

Supervisor

Mohammed Khalid

Creative Commons License

Creative Commons Attribution 4.0 International License

Abstract

Deep learning (DL) has proven to be a powerful approach for analyzing complex data such as images, videos, text, and speech. The convolutional neural network (CNN) is one of the most popular and powerful deep neural networks for image classification. However, due to its high computational complexity and the high speed and accuracy required in many real-world applications, CNN implementation presents a computational challenge for computing devices. Recent advances in hardware have led to the emergence of the graphics processing unit (GPU) as a solution for speeding up the execution of complex deep learning algorithms. Although a central processing unit (CPU) is designed to handle a wide range of tasks quickly, it is limited in the number of tasks it can execute concurrently. This research presents a comparative analysis of CPU and GPU for image classification using the pre-trained Caffe vgg16 CNN model optimized with the model optimizer feature of Intel's OpenVINO toolkit. OpenVINO is an open-source toolkit for optimizing and deploying DL inference; it boosts deep learning performance in computer vision, speech recognition, and other common tasks. Performance characteristics of the optimized model for image classification were studied by running it on the Intel Core i5-1035G1 CPU and the Intel UHD Graphics G1 GPU. Accuracy was tested by running the optimized models on the first 50,000 images of the ImageNet 2012 validation dataset. The results indicate that the GPU implementation is on average 1.5x faster than the CPU implementation for the single-precision optimized model and on average 2x faster for the half-precision optimized model. On the CPU, the single-precision optimized model achieves 70.96% top-1 accuracy and 89.88% top-5 accuracy, and the half-precision optimized model achieves 64.64% top-1 accuracy and 86.21% top-5 accuracy.
On the GPU, the differences in top-1 and top-5 accuracy between the single-precision and half-precision optimized models are almost the same as on the CPU. The research also shows that a significant latency-throughput trade-off exists.
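The top-1 and top-5 figures reported above are the standard ImageNet metrics: a sample counts as correct at top-k if its ground-truth label appears among the model's k highest-scoring classes. A minimal sketch of this scoring in plain Python (the scores and labels below are illustrative toy values, not real vgg16 outputs; real ImageNet evaluation uses 1000 classes and k = 5):

```python
# Minimal sketch of how top-1 / top-k accuracy is typically scored on a
# validation set such as ImageNet 2012 (toy data here, not real model outputs).

def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for score_vec, label in zip(scores, labels):
        # Indices of the k largest scores = the model's top-k predictions.
        topk = sorted(range(len(score_vec)),
                      key=lambda i: score_vec[i], reverse=True)[:k]
        if label in topk:
            hits += 1
    return hits / len(labels)

# Toy example: 3 samples over 5 classes (real ImageNet uses 1000 classes, k=5).
scores = [
    [0.10, 0.60, 0.10, 0.10, 0.10],  # top-1 prediction: class 1
    [0.30, 0.20, 0.40, 0.05, 0.05],  # top-1 prediction: class 2
    [0.30, 0.25, 0.05, 0.20, 0.20],  # top-1 prediction: class 0
]
labels = [1, 2, 3]                    # ground-truth classes

top1_acc = topk_accuracy(scores, labels, k=1)  # 2 of 3 correct
top3_acc = topk_accuracy(scores, labels, k=3)  # all 3 true labels within top 3
```

The same routine, applied to the 50,000 validation images, yields the top-1 and top-5 percentages quoted in the abstract.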
