Network-Level FPGA Acceleration of Low Latency Market Data Feed Arbitration

Stewart DENHOLM  Hiroaki INOUE  Takashi TAKENAKA  Tobias BECKER  Wayne LUK  

Publication
IEICE TRANSACTIONS on Information and Systems   Vol.E98-D   No.2   pp.288-297
Publication Date: 2015/02/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2014RCP0011
Type of Manuscript: Special Section PAPER (Special Section on Reconfigurable Systems)
Category: Application
Keyword: 
data feed arbitration,  acceleration,  FPGA,  low latency,  finance,  

Full Text: PDF(1.1MB)>>
Buy this Article




Summary: 
Financial exchanges provide market data feeds to update their members about changes in the market. Feed messages are often used in time-critical automated trading applications, and two identical feeds (A and B feeds) are provided in order to reduce message loss. A key challenge is to support A/B line arbitration efficiently to compensate for missing packets, while offering flexibility for various operational modes such as prioritising for low latency or for high data reliability. This paper presents a reconfigurable acceleration approach for A/B arbitration operating at the network level, capable of supporting any messaging protocol. Two modes of operation are provided simultaneously: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods. We also present a model for message feed processing latencies that is useful for evaluating scalability in future applications. We outline a new low latency, high throughput architecture and demonstrate a cycle-accurate testing framework to measure the actual latency of packets within the FPGA. We implement and compare the performance of the NASDAQ TotalView-ITCH, OPRA and ARCA market data feed protocols using a Xilinx Virtex-6 FPGA. For high reliability messages we achieve latencies of 42ns for TotalView-ITCH and 36.75ns for OPRA and ARCA. 6ns and 5.25ns are obtained for low latency messages. The most resource intensive protocol, TotalView-ITCH, is also implemented in a Xilinx Virtex-5 FPGA within a network interface card; it is used to validate our approach with real market data. We offer latencies 10 times lower than an FPGA-based commercial design and 4.1 times lower than the hardware-accelerated IBM PowerEN processor, with throughputs more than double the required 10Gbps line rate.