Efficient FPGA Implementation of QR Decomposition Using a Systolic Array Architecture
Proceedings of the 16th International ACM/SIGDA Symposium on Field Programmable Gate Arrays (FPGA '08), 2008
QR decomposition is used in many signal processing applications. We have implemented a systolic array QR decomposition on a Xilinx Virtex5 FPGA using the Givens rotation algorithm. It uses a truly two-dimensional systolic array architecture, so latency scales well for large matrices. To accommodate the dynamic range of the input data, we chose floating-point arithmetic, using the Northeastern University Variable Precision Floating-Point (VFloat) library. We support any general floating-point format, including IEEE single precision. Our design uses straightforward floating-point divide and square root implementations, whereas prior work uses special operations or formats such as CORDIC or the logarithmic number system (LNS). This makes our design more standard and more portable across systems, and thus easier to fit into a larger design. We support square, tall, and short matrices. The input matrix size can be configured at compile time to virtually any size, so the design can easily be scaled to future, larger FPGA devices or spread over multiple FPGAs. The QR module is fully pipelined with a throughput of over 130 MHz for the IEEE single-precision floating-point format. A peak throughput of 35 GFlops is achieved for a 12 by 12 matrix with this implementation.
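As a software illustration of the Givens rotation algorithm named in the abstract, the following is a minimal sequential sketch in Python. It is not the paper's systolic hardware design: the function name `givens_qr`, the list-of-lists matrix representation, and the order in which entries are eliminated are all assumptions made for this example. It does show why only ordinary divide and square root operations are needed: each rotation computes one square root and two divisions, then applies multiply-add updates to a pair of rows.

```python
import math


def givens_qr(a):
    """Return (q, r) such that q is orthogonal, r is upper triangular, q*r = a.

    `a` is an m-by-n matrix given as a list of rows (square, tall, or short).
    Hypothetical helper for illustration only; not taken from the paper.
    """
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

    for j in range(min(m - 1, n)):          # column being cleared
        for i in range(m - 1, j, -1):       # work upward from the bottom row
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue                    # entry is already zero
            h = math.sqrt(x * x + y * y)    # the one square root per rotation
            c, s = x / h, y / h             # the two divisions per rotation

            # Rotate rows i-1 and i of r so that r[i][j] becomes zero.
            for k in range(n):
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v
            r[i][j] = 0.0                   # remove rounding residue

            # Accumulate the transposed rotation into columns i-1 and i of q.
            for k in range(m):
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * u + s * v
                q[k][i] = -s * u + c * v
    return q, r


if __name__ == "__main__":
    q, r = givens_qr([[12, -51, 4], [6, 167, -68], [-4, 24, -41]])
    for row in r:
        print(["%.3f" % v for v in row])
```

In a systolic array, rotations that touch disjoint row pairs can proceed concurrently in different cells; the nested loops above simply serialize that work.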
Papers by Miriam Leeser