CascadeCNN: Pushing the Performance Limits of Quantisation in Convolutional Neural Networks

If you have a question about this talk, please contact George A Constantinides.

In this talk I will present CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model to achieve high-throughput inference by exploiting the computation time-accuracy trade-off. The toolflow generates a two-stage architecture tailored to any given FPGA device, consisting of a low-precision and a high-precision unit arranged in a cascade. A confidence evaluation unit between them identifies samples likely to have been misclassified by the excessively low-precision unit and forwards them to the high-precision unit for re-processing. Experiments demonstrate that the proposed toolflow achieves a performance boost of up to 55% for VGG-16 and 48% for AlexNet over the baseline design for the same resource budget and accuracy, without retraining the model or accessing the training data.
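To make the cascade mechanism concrete, here is a minimal Python sketch of two-stage inference with confidence-based forwarding. The function and model names are hypothetical, and the confidence metric used (the margin between the top-1 and top-2 softmax scores) is one plausible choice, not necessarily the one employed by CascadeCNN itself:

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def cascade_infer(x, low_precision_model, high_precision_model, threshold=0.5):
    """Classify x with the cheap low-precision unit; re-process with the
    high-precision unit only when the prediction looks unreliable.

    `low_precision_model` and `high_precision_model` are assumed to be
    callables returning raw class scores (logits) for input x.
    """
    probs = softmax(low_precision_model(x))
    top2 = np.sort(probs)[-2:]          # [second-highest, highest] probability
    confidence = top2[1] - top2[0]      # top-1 vs top-2 margin as confidence
    if confidence >= threshold:
        return int(np.argmax(probs))    # accept the cheap prediction
    # Low confidence: forward to the high-precision unit for re-processing.
    return int(np.argmax(high_precision_model(x)))
```

In this sketch, the threshold controls the trade-off at the heart of the approach: a lower threshold accepts more of the low-precision unit's predictions (higher throughput), while a higher threshold forwards more samples for re-processing (accuracy closer to the high-precision baseline).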

This talk is part of the CAS Talks series.
