FPGA numerical computing for high frequency trading
Keywords: Decimal Floating-Point Arithmetic (DFPA), high-frequency trading (HFT), FPGA
The ongoing struggle to minimise "slippage" (broadly defined as the latency between the instant a trader issues an order and the instant the transaction is actually executed at the exchange, together with the unplanned monetary variation that results) pushes HFT technology towards a theoretical 'zero-latency' objective. Conventional approaches such as exchange proximity hosting, colocation, hardware ticker plants, and lossless LAN switches have been superseded by FPGA acceleration, which offloads network and application protocols and runs trader processes such as portfolio order and execution management. The need for intense, real-time numerical computation of hundreds of financial indicators and/or indices at sub-millisecond tick rates is often coupled with decimal accuracy compliance requirements.
Decimal Floating-Point Arithmetic (DFPA) in a nutshell
Binary Floating-Point Arithmetic (BFPA) runs efficiently in native processor hardware, but cannot accurately represent or maintain decimal real numbers. This is because a finite sum of negative powers of two (1/2, 1/4, 1/8, ...) cannot express every decimal fraction exactly; in fact, even the decimal quantity 0.1 cannot be represented accurately. The choice of whether, and how, to implement decimal real number accuracy is left up to the software developer. The statistical nature of most trading computations inherently affords a higher tolerance to binary real number inaccuracies than deterministic financial applications (e.g., banking) do. However, financial regulations and some algorithmic and operational considerations may require Decimal Floating-Point (DFP) encoding and arithmetic.
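The representation problem can be observed directly in software. The following is a minimal sketch using only the Python standard library (the variable names are illustrative): it inspects the exact value stored for the binary literal 0.1 and compares a repeated binary sum with the same sum carried out in decimal arithmetic.

```python
from decimal import Decimal
from fractions import Fraction

# Exact rational value actually stored when 0.1 is held as a binary double.
stored = Fraction(0.1)
exact_tenth = Fraction(1, 10)

# False: the stored double is only the nearest binary neighbour of 1/10.
binary_is_exact = (stored == exact_tenth)

# Ten binary "0.1" values accumulate a rounding error ...
binary_sum = sum([0.1] * 10)

# ... whereas the same sum in decimal arithmetic stays exact.
decimal_sum = sum([Decimal("0.1")] * 10, Decimal("0"))

print(binary_is_exact)            # whether 0.1 is stored exactly
print(binary_sum == 1.0)          # whether the binary sum equals 1
print(decimal_sum == Decimal(1))  # whether the decimal sum equals 1
```

The error per operation is tiny, but it is not zero, which is what matters when results must agree digit for digit with a decimal reference.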
Approaches to maintain DFP accuracy include the following:
1. Deploy server platforms with DFPA processor support, such as the IBM Power and Oracle SPARC, which support basic operations like addition and multiplication in hardware and build the more demanding operations algorithmically in software.
2. Scale up all real numbers to integers, perform all-integer computations, and then scale the results back down before they are passed to other processes. This approach makes code hard to maintain, especially when values carry non-uniform precisions, which can reach 14 places beyond the decimal point (Ref: S&P Dow Jones Indices, "S&P Global 1200 Methodology," Apr. 2011).
3. Use software DFPA libraries, such as the Intel Math Decimal Floating Point Library, and accept the consequent computational latency.
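To illustrate why the scaled-integer approach (item 2) becomes awkward with non-uniform precisions, here is a minimal sketch in Python. The helper names `to_scaled` and `from_scaled` and the sample values are hypothetical, not taken from any trading system. Note how every operation forces the programmer to track the number of decimal places by hand.

```python
def to_scaled(text: str, places: int) -> int:
    """Parse a decimal string into an integer scaled by 10**places."""
    sign = -1 if text.startswith("-") else 1
    whole, _, frac = text.lstrip("+-").partition(".")
    if len(frac) > places:
        raise ValueError(f"{text!r} needs more than {places} decimal places")
    return sign * int((whole or "0") + frac.ljust(places, "0"))


def from_scaled(value: int, places: int) -> str:
    """Format a scaled integer back into a decimal string."""
    sign = "-" if value < 0 else ""
    digits = str(abs(value)).rjust(places + 1, "0")
    if places == 0:
        return sign + digits
    return f"{sign}{digits[:-places]}.{digits[-places:]}"


# Hypothetical inputs with different precisions.
price = to_scaled("101.25", 2)      # 2 decimal places
weight = to_scaled("0.000123", 6)   # 6 decimal places
fee = to_scaled("0.0005", 4)        # 4 decimal places

# Multiplication: the scales add up (2 + 6 = 8 places).
weighted = from_scaled(price * weight, 2 + 6)

# Addition: operands must first be brought to a common scale (4 places).
total = from_scaled(price * 10 ** (4 - 2) + fee, 4)

print(weighted)  # 0.01245375
print(total)     # 101.2505
```

The arithmetic itself is exact, but the scale bookkeeping is spread across every expression, which is the maintenance burden the list item refers to.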
To address this issue, SilMinds offers a patented, extensively verified library of 64/128-bit DFPA IP units that comply with the IEEE 754-2008 standard and cover operations such as division, power, square root, and indexed summation. Internally, the units employ the hardware-efficient, BCD-like DPD (Densely Packed Decimal) encoding, but their I/O interfaces also support the more compact, software-oriented BID (Binary Integer Decimal) encoding and ASCII "string"-based encodings.
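For readers unfamiliar with BID, the following minimal sketch packs a decimal value into the 64-bit BID layout defined by IEEE 754-2008: one sign bit, a 10-bit exponent biased by 398, and a 53-bit binary integer coefficient. It is an illustration only and makes simplifying assumptions: it handles finite values whose coefficient is below 2^53 (larger coefficients use a second layout that is not covered here), and the function names `encode_bid64` and `decode_bid64` are hypothetical rather than part of any vendor API.

```python
from decimal import Decimal

BIAS = 398                 # decimal64 exponent bias
SMALL_LIMIT = 1 << 53      # coefficients below this use the simple layout


def encode_bid64(text: str) -> int:
    """Pack a finite decimal into the simple 64-bit BID layout."""
    value = Decimal(text)
    if not value.is_finite():
        raise ValueError("only finite values are handled in this sketch")
    sign, digits, exponent = value.as_tuple()
    coefficient = int("".join(map(str, digits)))
    biased = exponent + BIAS
    # biased < 768 keeps the two leading exponent bits away from '11',
    # which would select the alternative (large-coefficient) layout.
    if coefficient >= SMALL_LIMIT or not 0 <= biased < 768:
        raise ValueError("value is outside the range of this sketch")
    return (sign << 63) | (biased << 53) | coefficient


def decode_bid64(word: int) -> Decimal:
    """Unpack a word produced by encode_bid64."""
    sign = word >> 63
    biased = (word >> 53) & 0x3FF
    coefficient = word & (SMALL_LIMIT - 1)
    digits = tuple(int(d) for d in str(coefficient))
    return Decimal((sign, digits, biased - BIAS))


print(hex(encode_bid64("1")))                 # 0x31c0000000000001
print(decode_bid64(encode_bid64("101.25")))   # 101.25
```

Because the coefficient is stored as an ordinary binary integer, BID suits software running on binary integer units, whereas hardware decimal datapaths need direct access to individual decimal digits.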
Real-time out-of-band numerical computing model
Many algorithmic traders choose to conduct real number arithmetic in a software DFP form or workaround, and then accept the undesirable increase in slippage. By comparison, at currently available FPGA clock rates, the SilMinds library offers DFPA operations with latencies on the order of nanoseconds.
A major added value is the integration with the DFPA units of the "performance-costly" conversions to and from the computation-agnostic "string" number representations used by the standard Financial Information eXchange (FIX) protocol and by other legacy proprietary protocols. Since BFPA units are available at low cost and small area, thanks to their ubiquitous use as DSP blocks, it is easy to optimise FPGA utilisation with a combination of integer, BFP, and DFP arithmetic units.
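As a software analogy for the string conversions involved, the sketch below parses numeric fields from a FIX-style tag=value message and computes a notional value in decimal arithmetic. It uses Python's standard `decimal` module rather than any FPGA IP, the message content is made up for illustration, and `parse_fix` is a hypothetical helper. Tags 38 (OrderQty) and 44 (Price) and the SOH field delimiter follow the FIX convention.

```python
from decimal import Decimal

SOH = "\x01"  # FIX field delimiter


def parse_fix(message: str) -> dict:
    """Split a FIX tag=value message into a {tag: value-string} map."""
    fields = {}
    for field in message.split(SOH):
        if field:
            tag, _, value = field.partition("=")
            fields[int(tag)] = value
    return fields


# Illustrative order: 35=D (New Order Single), 55=Symbol, 38=OrderQty, 44=Price.
raw = SOH.join(["35=D", "55=ABC", "38=100", "44=101.25"]) + SOH
fields = parse_fix(raw)

# String -> decimal conversion happens once per numeric field ...
quantity = Decimal(fields[38])
price = Decimal(fields[44])

# ... the arithmetic is exact in decimal ...
notional = price * quantity

# ... and the result is converted back to a string for the outgoing message.
print(str(notional))  # 10125.00
```

In software, each of these conversions costs processor cycles on every message; performing them next to the arithmetic units is what removes them from the critical path.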