High-Speed Fully-Adaptable CRC Accelerators

Amila AKAGIC  Hideharu AMANO  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E96-D   No.6   pp.1299-1308
Publication Date: 2013/06/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.E96.D.1299
Print ISSN: 0916-8532
Type of Manuscript: PAPER
Category: Computer System
Keyword:
reconfigurable computing, FPGAs, cyclic redundancy checks, adaptability, accelerators

Summary: 
Cyclic Redundancy Check (CRC) is a well-known error detection scheme used to detect corruption of digital content in networks and storage devices. Because CRC computation is compute-intensive and adversely affects performance, hardware acceleration using FPGAs has been tried and has achieved satisfactory performance. However, the recent expansion in the use of networks and storage systems requires various correction capabilities for various CRC standards. Traditional hardware designs based on the Linear Feedback Shift Register (LFSR) tend to have a fixed structure without such flexibility. Here, a fully-adaptable CRC accelerator based on a table-based algorithm is proposed. The table-based algorithm is a flexible method commonly used in software implementations. It has rarely been implemented in hardware, since its operational speed is believed to be insufficient. However, by using a pipelined structure and making efficient use of the memory modules in FPGAs, the table-based fixed CRC accelerators were found to achieve better performance than traditional implementations. Based on this implementation, a fully-adaptable CRC accelerator that eliminates the need for many non-adaptable CRC implementations is proposed. The accelerator can process an arbitrary amount of input data and generate the CRC for any known CRC standard, with a generator polynomial of up to 65 bits, at run-time. Further, we modify the table generation algorithm to decrease its space complexity from O(nm) to O(n). On a Xilinx Virtex 6 LX550T board, the fully-adaptable accelerators occupy between 1% and 2% of the area to produce a maximum of 289.8 Gbps at 283.1 MHz if BRAM is deployed, or between 1.6% and 14% of the area for 418 Gbps at 408.9 MHz if the tables are implemented in logic. The proposed architecture enables further expansion of throughput by increasing the number of input bits M processed at a time.
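
For readers unfamiliar with the table-based algorithm mentioned in the summary, the following is a minimal software sketch of the general technique, not the hardware architecture proposed in the paper. It assumes an MSB-first (non-reflected) CRC that consumes one byte per step, with the lookup table generated at run time from the polynomial so that the same routine serves different standards. The function names (make_table, crc_table, crc_bitwise) and the parameter names are hypothetical and chosen only for illustration; reflected CRC variants and the paper's reduced-space table generation are not covered.

```python
def make_table(width, poly):
    """Build a 256-entry table for an MSB-first CRC of `width` bits (width >= 8).

    `poly` is the generator polynomial without its implicit top bit.
    """
    top = 1 << (width - 1)
    mask = (1 << width) - 1
    table = []
    for byte in range(256):
        reg = byte << (width - 8)
        for _ in range(8):
            reg = ((reg << 1) ^ poly) if reg & top else (reg << 1)
            reg &= mask
        table.append(reg)
    return table


def crc_table(data, width, poly, init=0, xor_out=0):
    """Table-based CRC: one table lookup per input byte."""
    table = make_table(width, poly)
    mask = (1 << width) - 1
    reg = init
    for b in data:
        idx = ((reg >> (width - 8)) ^ b) & 0xFF
        reg = ((reg << 8) & mask) ^ table[idx]
    return reg ^ xor_out


def crc_bitwise(data, width, poly, init=0, xor_out=0):
    """Bit-serial reference (LFSR-style), used only to cross-check the table version."""
    top = 1 << (width - 1)
    mask = (1 << width) - 1
    reg = init
    for b in data:
        reg ^= b << (width - 8)
        for _ in range(8):
            reg = ((reg << 1) ^ poly) if reg & top else (reg << 1)
            reg &= mask
    return reg ^ xor_out


if __name__ == "__main__":
    msg = b"123456789"
    # CRC-16/XMODEM parameters: width 16, polynomial 0x1021, init 0.
    print(hex(crc_table(msg, 16, 0x1021)))
    # CRC-32/MPEG-2 parameters: width 32, polynomial 0x04C11DB7, init 0xFFFFFFFF.
    print(hex(crc_table(msg, 32, 0x04C11DB7, init=0xFFFFFFFF)))
```

Because the table depends only on the width and the polynomial, switching standards in this sketch means regenerating 256 entries rather than changing the update loop, which is the kind of flexibility the summary contrasts with fixed LFSR structures.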