
Efficient Computing In-memory Architectures for FPGA-based Deep Learning Acceleration


If you have a question about this talk, please contact George A Constantinides.

Modern FPGA-based deep neural network (DNN) inference relies mainly on on-chip block random access memory (BRAM) for model storage. However, DNN inference is typically memory-bound, and its energy consumption is dominated by memory access. In this talk, I will discuss how computing-in-memory can be deployed within the existing BRAM architecture on FPGAs. The resulting BRAM architecture, called BRAMAC, can compute multiply-accumulate (MAC) operations inside the BRAM, which significantly reduces the cost of memory access and offers much higher computing throughput than existing FPGA architectures. Unlike many prior computing-in-memory designs based on application-specific integrated circuits (ASICs), BRAMAC incurs a small area overhead by performing computation only in the digital domain, eliminating expensive analog-to-digital conversion. Moreover, while BRAMAC is computing, other FPGA resources can seamlessly access its data without the need for a separate BRAM buffer. Hence, BRAMAC can perform computation while retaining full functionality as a memory unit, truly complementing the other compute resources on FPGAs. Finally, I will show how BRAMAC can be extended to support mixed-precision computation and dataflow reconfiguration for improved hardware utilization.
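For readers new to compute-in-memory, the following minimal Python sketch models the behaviour the abstract describes: a memory tile that can both serve ordinary reads and accumulate a dot product over its stored weights without moving them out of the block. The class and method names (BramacTile, read, mac) are hypothetical illustrations only and do not reflect BRAMAC's actual circuit design.

```python
# Behavioural sketch of an in-memory MAC tile. This models the concept
# described in the abstract, NOT the real BRAMAC microarchitecture; all
# names here are hypothetical.

class BramacTile:
    """A memory block that can also compute MACs on its stored data."""

    def __init__(self, words):
        self.mem = list(words)  # weight words stored in the memory array

    def read(self, addr):
        # Ordinary BRAM behaviour: the fabric can still read any word,
        # so no separate buffer copy of the weights is needed.
        return self.mem[addr]

    def mac(self, base, activations):
        # Digital in-memory MAC: multiply each stored weight by the
        # matching activation and accumulate, without transferring the
        # weights out of the memory block.
        acc = 0
        for i, a in enumerate(activations):
            acc += self.mem[base + i] * a
        return acc


# Usage: store one weight row, then compute a dot product "inside" the tile.
tile = BramacTile([2, -1, 3, 0])
assert tile.mac(0, [1, 2, 3, 4]) == 9   # 2*1 + (-1)*2 + 3*3 + 0*4
assert tile.read(2) == 3                # still usable as a normal memory
```

The key point the sketch illustrates is the dual role: `read` and `mac` operate on the same stored words, which is why no separate BRAM buffer is required while computation is in flight.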

This seminar will be held online on Zoom: https://utoronto.zoom.us/j/82613229697

This talk is part of the CAS Talks series.


