Low-latency programming in HFT isn't about Big-O - it's about cache lines
In high-frequency trading, conventional software wisdom often breaks down. We're not optimizing for scalability or code elegance - we're optimizing for nanoseconds.
Most systems in HFT are memory-bound, not CPU-bound. Modern CPUs can execute billions of instructions per second, but if your data isn't in the CPU cache, it doesn't matter. Accessing L1 cache takes about 1 nanosecond. Accessing main RAM? Over 100 nanoseconds. That's a 100x penalty for missing the cache.
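To make that gap concrete, here is a minimal micro-benchmark sketch (illustrative, not from a production system): it sums the same array twice, once in sequential order and once through a shuffled index, so the instructions are identical and only the memory access pattern differs. On a typical machine the random pass runs several times slower, purely from cache misses.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    // ~64 MiB of ints: far larger than any CPU cache, so the random pass
    // below misses cache on almost every access.
    constexpr std::size_t N = 1u << 24;
    std::vector<int> data(N, 1);

    // Identity index for the sequential pass, shuffled for the random pass.
    std::vector<std::size_t> idx(N);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937_64 rng(42);
    std::shuffle(idx.begin(), idx.end(), rng);

    auto run = [&](const char* name, bool sequential) {
        auto t0 = std::chrono::steady_clock::now();
        long long sum = 0;
        for (std::size_t i = 0; i < N; ++i)
            sum += data[sequential ? i : idx[i]];  // same work, different order
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - t0).count();
        std::printf("%s sum=%lld in %lld ms\n", name, sum,
                    static_cast<long long>(ms));
    };

    run("sequential:", true);   // prefetcher-friendly, mostly cache hits
    run("random:    ", false);  // same instructions, dominated by cache misses
    return 0;
}
```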
This has major implications:
- It's often better to spend more CPU cycles on data that's already in cache than fewer cycles on data scattered across memory.
- Flat arrays in contiguous memory almost always outperform node-based structures like trees or general-purpose hash maps, which scatter their elements across the heap.
- Simple, predictable memory layouts beat "clever" abstractions every time (a layout sketch follows this list).
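As one illustration of a predictable layout, here is a hedged sketch; the Order struct and its fields are hypothetical, not from this article. The idea is that each record's hot fields fit in a single 64-byte cache line, and alignas(64) guarantees no record straddles two lines or shares one with its neighbor.

```cpp
#include <cstdint>

// Hypothetical order record: four hot fields packed into one 64-byte cache
// line. alignas(64) pads and aligns each Order so touching one order loads
// exactly one cache line, and neighboring orders never share a line.
struct alignas(64) Order {
    std::uint64_t id;      // 8 bytes
    std::int64_t  price;   // 8 bytes
    std::int64_t  qty;     // 8 bytes
    std::uint64_t ts_ns;   // 8 bytes -- 32 bytes hot, 32 bytes padding
};

static_assert(sizeof(Order) == 64, "one Order per cache line");

int main() {
    // A flat, contiguous array: order i lives at a fixed, computable offset,
    // with no pointers to chase and no allocator involved.
    static Order book[1024] = {};
    book[0] = {1, 10050, 200, 0};
    return static_cast<int>(book[0].qty != 200);
}
```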
Even if a data structure has better theoretical complexity, it doesn't help if every access results in a cache miss. In HFT, cache locality is king.
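Here is an illustrative comparison, assuming a toy price-level lookup (the Level struct and function names are mine, not the author's). Both containers offer O(log n) lookups on paper, but the sorted vector probes contiguous memory, while the red-black tree behind std::map dereferences a heap-allocated node at every step of its descent.

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <vector>

// One price level in a toy order book.
struct Level {
    long price;
    long qty;
};

// Binary search over a contiguous, price-sorted vector: every probe lands
// in memory the hardware prefetcher has likely already pulled into cache.
long qty_at_flat(const std::vector<Level>& book, long price) {
    auto it = std::lower_bound(book.begin(), book.end(), price,
                               [](const Level& l, long p) { return l.price < p; });
    return (it != book.end() && it->price == price) ? it->qty : 0;
}

// Red-black tree: each step of the descent chases a pointer to a node
// allocated somewhere on the heap -- a potential cache miss per tree level.
long qty_at_tree(const std::map<long, long>& book, long price) {
    auto it = book.find(price);
    return it != book.end() ? it->second : 0;
}

int main() {
    std::vector<Level> flat;
    std::map<long, long> tree;
    for (long p = 100; p < 200; ++p) {  // 100 illustrative price levels
        flat.push_back({p, p * 10});
        tree[p] = p * 10;
    }
    std::printf("flat: %ld, tree: %ld\n",
                qty_at_flat(flat, 150), qty_at_tree(tree, 150));
    return 0;
}
```

Both lookups do comparable comparison work, but the vector's probes quickly settle into cached memory, while each tree node is a fresh potential miss; the gap widens as the book grows.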
Conclusion
Big-O tells you how an algorithm scales in theory. Cache-friendly design tells you how fast it runs in practice.