
Memory Optimized Convolution Hardware Accelerator for Training (MOCHA-T)


If you have a question about this talk, please contact George A Constantinides.

Over recent years the popularity of deep learning (DL) models has risen notably, and their complexity and computational demands have grown accordingly. Currently most of this computation is still performed on GPUs in industrial data servers, which comes with a large cost and power consumption. For convolutional neural networks (CNNs), training is computationally more demanding than inference because it involves roughly three times as many instances of the most computationally demanding operation: convolution. Most convolutions are performed as a general matrix multiply (GEMM), which requires the input data to be transformed into a layout suitable for matrix multiplication through a process known as im2col. The im2col operation drastically increases the memory required due to a large amount of data duplication. FPGAs offer the benefit of reconfigurability, but this comes at the cost of worse performance because FPGAs are notably memory-bound, especially when performing matrix multiplications. The proposed novel FPGA design alleviates this bottleneck by performing the im2col operation on the FPGA itself, drastically reducing the volume of data sent to the device and creating a compute-bound system. Making the FPGA compute-bound allows performance comparable to some of the most modern GPUs at far lower power consumption.
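As a rough illustration of why im2col inflates memory, the sketch below (plain NumPy, with input and kernel sizes chosen purely for illustration and not taken from the talk) builds the im2col matrix for a small single-channel input, performs the convolution as a GEMM, and compares the unfolded data volume with the original tensor:

```python
import numpy as np

def im2col(x, k, stride=1):
    """Unfold a single-channel 2D input into an im2col matrix:
    one column per kernel position, one row per element of the
    k x k receptive field. Neighbouring windows overlap, so most
    input elements are copied into several columns."""
    h, w = x.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    cols = np.empty((k * k, out_h * out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k]
            cols[:, i * out_w + j] = patch.ravel()
    return cols

# Illustrative sizes only: a 32x32 input with a 3x3 kernel, stride 1.
x = np.arange(32 * 32, dtype=np.float32).reshape(32, 32)
cols = im2col(x, k=3)

# Convolution expressed as a GEMM: a (1, 9) weight row times the (9, 900) im2col matrix.
w = np.random.rand(1, 3 * 3).astype(np.float32)
y = w @ cols                      # one output value per kernel position

print(x.size, cols.size)          # 1024 vs 8100: roughly 8x data duplication
```

For stride-1 convolutions the duplication factor approaches k², which is the extra traffic the proposed design avoids by sending the compact input and unfolding it on the FPGA rather than transferring the im2col matrix from the host.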

This talk is part of the CAS Talks series.

