Design Space Exploration of the Physical Design of 12 nm Low Power AI Processor

Date of Award


Publication Type


Degree Name



Electrical and Computer Engineering

First Advisor


Second Advisor


Third Advisor



Convolutional Neural Network, Design space exploration, Hardware acceleration, Low Power, Macro placement, Physical Design



Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.


Commercially available physical design Computer-Aided Design (CAD) tools for automatic macro placement often produce unoptimized placements with large area, power, and wirelength that do not meet timing. These placements do not follow the Register Transfer Level (RTL) data flow and leave a large amount of unoccupied space, which causes these inefficiencies. Systematic analysis and pruning of these placements as part of Design Space Exploration (DSE) helps remove the inefficiencies and improves metrics such as area, utilization, wirelength, timing, and power. The DSE guides the trade-off between area, speed, and power consumption for the physical placement of the memory macros in the Convolutional Neural Network (CNN) subprocessor/Programmable Functional Array (PFA) unit of the 12 nm low-power AI processor provided by the industry partner. The goal of this research is to obtain a memory macro placement for this AI processor that meets timing with the lowest area, wirelength, and power and the highest utilization. A relative memory macro placement methodology is developed to ease the DSE. It is based on the idea that a rectangular object can be placed in 16 different locations around another rectangular object based on its length and width. This methodology speeds up memory macro placement and therefore the design space exploration. The DSE ultimately led to a 25% improvement in area, a 40 percentage point increase in utilization, a 50% decrease in wirelength resource requirements, and a 29.22% decrease in total power requirements. This shows that DSE leads to an optimized macro placement (in this case, of memory macros) that gives much better results than the macro placement generated by a well-known commercial CAD tool.

Furthermore, the methodology reduced the hand-placement time for memory macros from approximately 2 weeks for part of the design to 3 to 4 days for the complete design. The AI processor is synthesized in a 12 nm technology using a low-power standard cell library and occupies a die area of around 4.5 mm × 3.5 mm.
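
The relative placement idea described in the abstract can be illustrated with a short Python sketch. The abstract does not list the 16 locations, so the decomposition used here is an assumption made for illustration only: three alignments (start, center, end) on each of the four sides of the reference rectangle, plus the four diagonal corners. The names `Rect`, `relative_positions`, `overlaps`, and the `gap` parameter are hypothetical and do not come from the thesis or from any CAD tool.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; (x, y) is the lower-left corner."""
    x: float
    y: float
    w: float
    h: float


def relative_positions(ref: Rect, w: float, h: float, gap: float = 0.0) -> Dict[str, Rect]:
    """Enumerate 16 candidate placements of a w-by-h macro around `ref`.

    Assumed decomposition (not taken from the thesis): 3 alignments on each
    of the 4 sides (12 positions) plus the 4 diagonal corners (4 positions).
    `gap` is a hypothetical spacing (e.g. a routing channel) between macros.
    """
    # Offsets along each side: align leading edges, centers, or trailing edges.
    xs = {"start": ref.x, "center": ref.x + (ref.w - w) / 2, "end": ref.x + ref.w - w}
    ys = {"start": ref.y, "center": ref.y + (ref.h - h) / 2, "end": ref.y + ref.h - h}

    # Coordinates that put the macro just outside each side of the reference.
    left_x = ref.x - gap - w
    right_x = ref.x + ref.w + gap
    bottom_y = ref.y - gap - h
    top_y = ref.y + ref.h + gap

    out: Dict[str, Rect] = {}
    for name, y in ys.items():
        out[f"left_{name}"] = Rect(left_x, y, w, h)
        out[f"right_{name}"] = Rect(right_x, y, w, h)
    for name, x in xs.items():
        out[f"bottom_{name}"] = Rect(x, bottom_y, w, h)
        out[f"top_{name}"] = Rect(x, top_y, w, h)

    # Diagonal corners: outside the reference in both x and y.
    out["corner_bottom_left"] = Rect(left_x, bottom_y, w, h)
    out["corner_bottom_right"] = Rect(right_x, bottom_y, w, h)
    out["corner_top_left"] = Rect(left_x, top_y, w, h)
    out["corner_top_right"] = Rect(right_x, top_y, w, h)
    return out


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the interiors of a and b intersect (abutting edges do not count)."""
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h


if __name__ == "__main__":
    reference = Rect(0.0, 0.0, 10.0, 6.0)
    for label, rect in relative_positions(reference, 4.0, 2.0).items():
        print(label, rect)
```

Under this assumption each candidate is derived only from the widths and heights of the two rectangles, so a placement can be expressed as "macro B at `left_center` of macro A" rather than as absolute coordinates, which is one way such a scheme could shorten hand placement.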