Enhancing Performance of Tall-Skinny QR Factorization using FPGAs
If you have a question about this talk, please contact Grigorios Mingas.

Communication-avoiding linear algebra algorithms with low communication latency and high memory bandwidth requirements like Tall-Skinny QR factorization (TSQR) are highly appropriate for acceleration using FPGAs. TSQR parallelizes QR factorization of tall-skinny matrices in a divide-and-conquer fashion by decomposing them into sub-matrices, performing local QR factorizations and then merging the intermediate results. As TSQR is a dense linear algebra problem, one would therefore imagine GPU to show better performance. However, the performance of GPU is limited by the memory bandwidth in local QR factorizations and global communication latency in the merge stage. We exploit the shape of the matrix and propose an FPGA-based custom architecture which avoids these bottlenecks by using high-bandwidth on-chip memories for local QR factorizations and by performing the merge stage entirely on-chip to reduce communication latency. We achieve a peak double-precision floating-point performance of 129 GFLOPs on Virtex-6 SX475T. A quantitative comparison of our proposed design with recent QR factorization on FPGAs and GPU shows up to 7.7× and 12.7× speed up respectively. Additionally, we show even higher performance over optimized linear algebra libraries like Intel MKL for multi-cores, CULA for GPUs and MAGMA for hybrid systems.

This talk is part of the CAS Talks series.
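
The divide-and-conquer scheme described in the abstract (split the tall-skinny matrix into row blocks, factor each block locally, then merge the intermediate R factors) can be illustrated with a small software sketch. This is not the FPGA architecture from the talk, only a NumPy model of the algorithm's data flow. The function name `tsqr_r`, the `num_blocks` parameter, and the pairwise (binary tree) merge order are assumptions made for illustration; the abstract does not specify the merge order used in the design.

```python
import numpy as np


def tsqr_r(a, num_blocks=4):
    """Return the R factor of a tall-skinny matrix `a` (hypothetical helper).

    Assumes each row block has at least as many rows as `a` has columns.
    """
    # Divide: split the matrix into row blocks (sub-matrices).
    blocks = np.array_split(a, num_blocks, axis=0)

    # Local stage: QR-factorize each block independently, keeping only R.
    rs = [np.linalg.qr(block, mode="r") for block in blocks]

    # Merge stage (assumed binary tree): stack neighbouring R factors
    # and factorize the stack again until a single R remains.
    while len(rs) > 1:
        merged = []
        for i in range(0, len(rs), 2):
            stacked = np.vstack(rs[i:i + 2])
            merged.append(np.linalg.qr(stacked, mode="r"))
        rs = merged
    return rs[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((1000, 8))  # tall-skinny: many rows, few columns
    r = tsqr_r(a, num_blocks=4)
    # R is determined only up to the sign of each row, so check R^T R = A^T A.
    print(np.allclose(r.T @ r, a.T @ a))
```

In this sketch the local factorizations are independent of one another, which is the part that parallelizes; only the small R factors move between stages, which is why the merge stage is sensitive to communication latency rather than bandwidth.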