
ATHEENA Gets Buff(er)ed: Modelling and Constructing Early-Exit Network FPGA Accelerators


If you have a question about this talk, please contact George A Constantinides.

The continued need for improvements in the accuracy, throughput, and efficiency of Deep Neural Networks has resulted in a multitude of methods that make the most of custom architectures on FPGAs. These include the creation of hand-crafted networks and the use of quantization and pruning to reduce extraneous network parameters. However, with the potential of static solutions already well exploited, we propose shifting the focus to the varying difficulty of individual data samples in order to further improve efficiency and reduce the average compute per classification. Input-dependent computation allows the network to make runtime decisions and finish a task early once the result meets a confidence threshold. Early-Exit network architectures have become an increasingly popular way to implement such behaviour in software.
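As a minimal sketch of the runtime decision described above, the following shows one common early-exit policy: after each network stage, an exit head produces class logits, and inference stops as soon as the top softmax probability clears a confidence threshold. The stage/head structure and the threshold rule here are illustrative assumptions, not ATHEENA's actual exit policy.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a vector of class logits.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def early_exit_forward(x, stages, exit_heads, threshold=0.9):
    """Run stages in order; after each stage, its exit head produces
    class logits. If the top softmax probability meets the threshold,
    stop early and return (prediction, exit_index)."""
    for i, (stage, head) in enumerate(zip(stages, exit_heads)):
        x = stage(x)
        probs = softmax(head(x))
        if probs.max() >= threshold:
            return int(probs.argmax()), i
    # No exit was confident enough: fall through to the last result.
    return int(probs.argmax()), len(stages) - 1
```

Raising the threshold pushes more samples to later (more expensive) exits, trading average compute for accuracy; the exit-probability distribution this induces is exactly what a toolflow like ATHEENA can exploit when allocating resources per stage.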

We create A Toolflow for Hardware Early-Exit Network Automation (ATHEENA), an automated FPGA toolflow that leverages the probability of samples exiting early from such networks to scale the resources allocated to different sections of the network. We employ probabilistic modelling methods derived from queueing theory to accurately determine throughput under the finite buffer constraints dictated by limited on-chip Block-RAM resources. These analytical methods have been verified against custom event-driven simulations of an abstract accelerator model: they are substantially faster to run while remaining within 2% of the simulated throughput. The toolflow uses the dataflow model of fpgaConvNet, extended to support the control flow of Early-Exit networks, together with Design Space Exploration to optimize the generated streaming architecture, with the goal of increasing throughput and reducing area while maintaining accuracy.
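To illustrate the kind of finite-buffer effect the analytical model captures, the closed-form throughput of a single M/M/1/K queue is sketched below. This is a textbook stand-in chosen for brevity, not ATHEENA's actual model: it shows how a bounded buffer (here capacity K, including the item in service) blocks arrivals and caps effective throughput below the offered rate.

```python
def mm1k_throughput(lam, mu, K):
    """Effective throughput of an M/M/1/K queue.

    Arrivals at rate `lam` are dropped whenever the buffer (capacity
    K, including the item in service) is full; service rate is `mu`.
    Effective throughput = lam * (1 - P_block), where P_block is the
    steady-state probability the queue is full.
    """
    rho = lam / mu
    if abs(rho - 1.0) < 1e-12:
        # Degenerate case rho == 1: all K+1 states equally likely.
        p_block = 1.0 / (K + 1)
    else:
        p_block = (1 - rho) * rho**K / (1 - rho**(K + 1))
    return lam * (1 - p_block)
```

With a large buffer and utilisation below 1, throughput approaches the arrival rate; shrinking the buffer raises the blocking probability and lowers throughput, which is why the size of on-chip BRAM buffers matters when predicting accelerator performance.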

The original ATHEENA results were calculated under a naïve, infinite-buffer assumption; the experimental results presented here, on three different networks, demonstrate a throughput increase of 2.00× to 2.78× compared to an optimized baseline network implementation with no early exits. Additionally, the toolflow can match the throughput of the same baseline with as little as 46% of the resources the baseline requires. We plan to explore buffer placement and to continue expanding the benchmarks to obtain improved results.

This talk is part of the CAS Talks series.
