
Exploiting Algorithm-based Fault Tolerance for Reliability in Parallel Accelerators


If you have a question about this talk, please contact Grigorios Mingas.

This talk summarises my recent work on using low-overhead, algorithm-level error detection to detect, locate, and avoid or correct faults. I will present the current benchmark application, a parameterisable matrix multiplication accelerator running on the Xilinx Zynq platform, and then explain the adopted error detection scheme. I will then describe the hardware modifications that support both error detection and runtime algorithm modification, and explore the overheads and potential gains of several candidate correction strategies.
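
For readers unfamiliar with algorithm-based fault tolerance, the sketch below shows the classic checksum approach to matrix multiplication from the ABFT literature (usually attributed to Huang and Abraham). The abstract does not say which detection scheme the talk adopts, so this is an assumption made for illustration only; it is a software model, not the accelerator's hardware, and all function names are hypothetical. It uses integer matrices so that checksum comparisons are exact, and it only handles a single faulty data element (not a fault in a checksum entry).

```python
# Illustrative checksum-based ABFT for C = A x B (assumed scheme, not
# necessarily the one presented in the talk). Integer inputs only.

def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def with_column_checksum(a):
    """Append one extra row holding the sum of each column of a."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]

def with_row_checksum(b):
    """Append one extra column holding the sum of each row of b."""
    return [row[:] + [sum(row)] for row in b]

def locate_errors(c_full):
    """Return (bad_rows, bad_cols): data rows/columns whose sum
    disagrees with the checksum stored in the last column/row."""
    n_rows, n_cols = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n_rows)
                if sum(c_full[i][:n_cols]) != c_full[i][n_cols]]
    bad_cols = [j for j in range(n_cols)
                if sum(c_full[i][j] for i in range(n_rows)) != c_full[n_rows][j]]
    return bad_rows, bad_cols

def correct_single_error(c_full):
    """Repair one faulty data element in place using its row checksum.
    Returns True if exactly one element was located and corrected."""
    bad_rows, bad_cols = locate_errors(c_full)
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        return False
    i, j = bad_rows[0], bad_cols[0]
    n_cols = len(c_full[0]) - 1
    others = sum(c_full[i][:n_cols]) - c_full[i][j]
    c_full[i][j] = c_full[i][n_cols] - others
    return True

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
# The product of the checksum-augmented inputs carries its own checksums.
c_full = matmul(with_column_checksum(a), with_row_checksum(b))
c_full[1][0] = 99                      # inject a single fault
located = locate_errors(c_full)        # one bad row, one bad column
fixed = correct_single_error(c_full)
```

The intersection of the inconsistent row and column locates the faulty element, and the row checksum supplies the value needed to correct it. The extra row and column are the only storage cost, which is why the approach is described as low overhead.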

This talk is part of the CAS Talks series.
