FPGA-Based Hardware Acceleration of Canny Image Edge Detector Using SYCL

Department: Electrical and Computer Engineering


Keywords: Hardware acceleration, Canny image edge detector, FPGA-based







Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.


Edge detection is one of the most fundamental operations in image processing, used in fields ranging from space exploration imaging and radar to medical imaging, computer vision, security systems, and television broadcasting. Because it strongly affects the result of the overall image processing pipeline, considerable effort, time, and resources are dedicated to developing edge detection algorithms and implementing them quickly and efficiently for different applications. The Canny algorithm has been a very popular technique for image edge detection because it produces fewer false edges than other edge detection algorithms, is less sensitive to noise, and yields well-localized, clear, thin edges with high accuracy. However, the computational intensity and implementation complexity of Canny Edge Detection (CED) restrict its use in many fast and real-time applications.

By employing inherently parallel processors such as GPUs and FPGAs, a heterogeneous architecture can be a very effective way to accelerate this process. With the advent in recent years of high-level parallel programming models such as SYCL, which reduce development complexity, the FPGA in particular has become a very efficient accelerator, offering great parallelism and higher power efficiency.
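To make the parallelism argument concrete, here is a minimal sketch in plain C++ (not SYCL, and not the implementation from this work) of the gradient-magnitude stage that Canny detectors commonly use. The function name `sobel_magnitude`, the row-major float layout, and the choice to leave border pixels at zero are assumptions made for illustration. Each output pixel depends only on a 3x3 input neighborhood, so the loop iterations are independent of one another, which is the property that lets GPUs and FPGAs process many pixels at once.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: gradient magnitude of a row-major grayscale image
// using the standard 3x3 Sobel kernels. Border pixels are left at 0.
std::vector<float> sobel_magnitude(const std::vector<float>& img,
                                   std::size_t width, std::size_t height) {
    assert(img.size() == width * height);
    std::vector<float> out(img.size(), 0.0f);
    if (width < 3 || height < 3) return out;

    // Every (x, y) iteration only reads `img` and writes one element of `out`,
    // so the iterations could run in any order or all at once.
    for (std::size_t y = 1; y + 1 < height; ++y) {
        for (std::size_t x = 1; x + 1 < width; ++x) {
            // p(dx, dy) reads the neighbor at column offset dx-1, row offset dy-1.
            auto p = [&](std::size_t dx, std::size_t dy) {
                return img[(y + dy - 1) * width + (x + dx - 1)];
            };
            // Horizontal gradient: right column minus left column.
            float gx = (p(2, 0) + 2.0f * p(2, 1) + p(2, 2))
                     - (p(0, 0) + 2.0f * p(0, 1) + p(0, 2));
            // Vertical gradient: bottom row minus top row.
            float gy = (p(0, 2) + 2.0f * p(1, 2) + p(2, 2))
                     - (p(0, 0) + 2.0f * p(1, 0) + p(2, 0));
            out[y * width + x] = std::sqrt(gx * gx + gy * gy);
        }
    }
    return out;
}
```

In a SYCL port, one would expect the two loops to map onto a `parallel_for` over a two-dimensional range, although how this work structures its kernels is not stated in the abstract.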

In this project, the Canny edge detector is implemented in Data Parallel C++ (DPC++), Intel's implementation of the SYCL standard. After being optimized with SYCL-based acceleration techniques, the design is tested on two state-of-the-art Intel FPGAs, Arria 10 and Stratix 10, as well as on a CPU (Intel Xeon Gold-2168) and a GPU (Intel UHD P630).

The Arria 10 and Stratix 10 FPGAs achieve speedups of 9x-11.6x and 9.6x-11.7x, respectively, over the CPU implementation, and both FPGAs execute the algorithm more than 4x faster than the GPU, for input images ranging from 128x128 to 1024x1024 pixels. In terms of energy consumption, Arria 10 was the most energy-efficient device for executing the CED algorithm, consuming 60x less energy than the CPU. Compared to the GPU implementation, Arria 10 consumes 10.2x-31.6x less energy, depending on image size. Arria 10 is also 2x more energy efficient than the Stratix 10 FPGA.
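Speedup and energy ratios of this kind are related but not identical, because energy depends on power draw as well as run time. The following sketch shows one way such ratios could be computed, assuming energy is estimated as average power multiplied by execution time. The `Run` struct and the function names are hypothetical, and the abstract does not state how the reported energy figures were measured.

```cpp
#include <cassert>

// Hypothetical measurement record for one device and one input image.
struct Run {
    double seconds;  // execution time
    double watts;    // average power draw during execution (assumed known)
};

// Energy estimated as average power times execution time.
double energy_joules(const Run& r) {
    return r.seconds * r.watts;
}

// How many times faster the accelerator is than the baseline.
double speedup(const Run& baseline, const Run& accel) {
    assert(accel.seconds > 0.0);
    return baseline.seconds / accel.seconds;
}

// How many times less energy the accelerator uses than the baseline.
double energy_ratio(const Run& baseline, const Run& accel) {
    assert(energy_joules(accel) > 0.0);
    return energy_joules(baseline) / energy_joules(accel);
}
```

With purely illustrative placeholder inputs (not measurements from this work), a baseline of 10 s at 100 W against an accelerator at 1 s and 20 W gives a speedup of 10 and an energy ratio of 50. The energy ratio exceeds the speedup whenever the accelerator also draws less power than the baseline.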