Date of Award
Electrical and Computer Engineering
Convolutional Neural Network, Deep Learning, FPGA, OpenCL, Optimization
In recent years, deep convolutional neural networks (ConvNet) have shown their popularity in various real world applications. To provide more accurate results, the state-of-the-art ConvNet requires millions of parameters and billions of operations to process a single image, which represents a computational challenge for general purpose processors. As a result, hardware accelerators such as Graphic Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), have been adopted to improve the performance of ConvNet. However, GPU-based solution consumes a considerable amount of power and a traditional RTL design on FPGA requires tedious development that is very time-consuming. In this work, we propose a scalable and parameterized end-to-end ConvNet design using Intel FPGA SDK for OpenCL. To validate the design, we implement VGG 16 model on two different FPGA boards. Consequently, our designs achieve 306.41 GOPS on Intel Stratix A7 and 318.94 GOPS on Intel Arria 10 GX 10AX115. To the best of our knowledge, this outperforms previous FPGA-based accelerators. Compared to the CPU (Intel Xeon E5-2620) and a mid-range GPU (Nvidia K40), our design is 24.3X and 1.7X more energy efficient respectively.
Li, Huyuan, "Acceleration of Deep Learning on FPGA" (2017). Electronic Theses and Dissertations. 5947.