
From Sparsity to HBM: HPIPE's Advancements in FPGA CNN Acceleration


If you have a question about this talk, please contact George A Constantinides.

HPIPE is a state-of-the-art sparsity-aware CNN accelerator designed for FPGAs. Diverging from the conventional approach in which generic processing elements collectively handle one layer at a time, HPIPE's compiler statically allocates device resources, building custom hardware for each layer of the CNN. The compiler also exploits sparsity, allowing the accelerator to skip unnecessary multiplications with weights that are zero or near zero. Recent work has extended HPIPE to the AI-optimized Stratix 10 NX, harnessing its tensor block architecture of 30 INT8 multipliers per tensor block to achieve even higher performance. However, HPIPE requires all weights to reside on-chip, which introduces memory constraints for larger networks such as ResNets. To overcome this limitation, HPIPE has been augmented to partition CNNs across multiple FPGAs that communicate over Ethernet, multiplying the available on-chip memory and DSPs. Finally, the presentation will delve into the latest efforts to integrate High Bandwidth Memory (HBM) support into HPIPE. This enhancement will not only relax memory-related restrictions but also decouple memory capacity from compute, so that HPIPE can be deployed on smaller FPGAs while accommodating larger networks.
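For readers unfamiliar with weight-sparsity skipping, the short Python sketch below illustrates the general idea the abstract describes: because CNN weights are fixed at compile time, the positions of non-zero weights can be found once and only those multiplications issued. This is an illustrative sketch only, not HPIPE's hardware implementation; the function names are invented for this example.

    import numpy as np

    def dense_dot(weights, activations):
        # Baseline: every weight takes part in a multiply-accumulate,
        # including weights that are zero (wasted work).
        acc = 0.0
        for w, a in zip(weights, activations):
            acc += w * a
        return acc

    def sparse_dot(weights, activations):
        # Sparsity-aware: find the non-zero weight positions once
        # (offline, since weights are static) and only multiply those.
        nonzero_idx = np.flatnonzero(weights)
        acc = 0.0
        for i in nonzero_idx:
            acc += weights[i] * activations[i]
        return acc

    # A heavily pruned weight vector needs far fewer multiplies.
    w = np.array([0, 0, 0.5, 0, -1.25, 0, 0, 0.75])
    a = np.arange(8, dtype=float)
    assert dense_dot(w, a) == sparse_dot(w, a)
    print(f"multiplies: dense={len(w)}, sparse={len(np.flatnonzero(w))}")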

Online: https://utoronto.zoom.us/j/82613229697

Speaker’s Bio: Mario Doumet is an MASc student in Computer Engineering at the University of Toronto under the supervision of Prof. Vaughn Betz. His research focuses on AI acceleration and dataflow architectures. Over the course of his Master’s, Mario completed an internship at Intel Labs’ Parallel Computing Lab (PCL), working on distributed ML training using FPGAs as smart NICs. Prior to starting his Master’s, he completed his B.Eng. at the American University of Beirut, along with a year of research in collaboration with the MIT Media Lab, where he helped develop the world’s first battery-free wireless underwater camera.

This talk is part of the CAS Talks series.




