Why is “Low Latency” Important at an Architectural Level?
Computing performance has four major dimensions: processor (CPU) speed, memory (storage), bandwidth, and latency. Of these, latency has seen only modest improvement over the past several decades of modern computing, while the other three have grown roughly along the curve described by Moore’s Law.
That is, the speed, storage, and bandwidth offered by computing (information) systems have improved substantially year over year for many years. Meanwhile, the limiting factor for latency, the speed of light, has not improved at all, and no such improvement (a faster speed of light, or a means of communication not limited by it) is anticipated, short of a stunning breakthrough in physics. Most latency gains have been modest percentages “on the edges” of electronic or light-based transmission, as the relevant circuitry has benefited from advances in the other three dimensions.
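To make the speed-of-light floor concrete, here is a minimal Python sketch of the best-case round-trip time over a given distance. The function name is hypothetical, and the example distance (about 5,570 km, roughly a New York to London great-circle path) and the fiber refractive index of 1.47 are illustrative assumptions rather than figures from this article.

```python
# Physical lower bound on round-trip time (RTT) over a straight path.

C_VACUUM_KM_PER_S = 299_792.458  # speed of light in vacuum, km/s


def min_rtt_ms(distance_km: float, refractive_index: float = 1.0) -> float:
    """Return the best-case RTT in milliseconds for a path of distance_km.

    refractive_index=1.0 models a vacuum; about 1.47 is an assumed,
    typical value for optical fiber, where light travels more slowly.
    """
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    if refractive_index < 1.0:
        raise ValueError("refractive_index must be at least 1.0")
    speed_km_per_s = C_VACUUM_KM_PER_S / refractive_index
    one_way_s = distance_km / speed_km_per_s
    return 2 * one_way_s * 1000.0  # there and back, in milliseconds


if __name__ == "__main__":
    distance_km = 5_570.0  # assumed example distance, not from the article
    print(f"vacuum floor: {min_rtt_ms(distance_km):.1f} ms")
    print(f"fiber floor:  {min_rtt_ms(distance_km, 1.47):.1f} ms")
```

Faster processors, more memory, or fatter pipes do not move these numbers; only a shorter path or fewer round trips does.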
As a “green-field” starting point for information-systems architecture, today’s state of the art would suggest that systems be optimized to minimize latency, first and foremost, to the extent those systems encompass large-scale networks, even if at the expense of increased demands for processing speed, storage, and bandwidth.
For example, Verizon Enterprise’s IP Latency Statistics suggest, as of January 2018, latencies ranging from 8ms to nearly 300ms, depending on the endpoints.
A system that requires N round trips to perform an action can therefore expect performance no better than N*300ms in the worst case documented on that page, N*8ms in the best case.
Such “round trips” include not only overt actions performed by the application, but also actions performed at lower levels in the stack in order to carry out those high-level actions. E.g. setting up an outbound TCP connection costs a full round trip (the three-way handshake) before any application data can be sent, and a TLS handshake on top of it typically adds at least one more.
Optimizing individual endpoints, or individual levels of the stack, cannot improve performance beyond this floor. Only optimizing (essentially, re-architecting) the system itself, for example so that it needs fewer round trips, can.
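The N-round-trip floor can be sketched in a few lines of Python. The RTT figures are the ones quoted above (with “nearly 300ms” rounded to 300). The `Step` type, the `latency_floor_ms` function, and the breakdown of an HTTPS request are hypothetical illustrations; the breakdown assumes a full TLS 1.3 handshake, an already-resolved DNS name, and no packet loss.

```python
from dataclasses import dataclass
from typing import Iterable

# RTT range quoted in the text, in milliseconds ("nearly 300" rounded up).
BEST_RTT_MS = 8.0
WORST_RTT_MS = 300.0


@dataclass(frozen=True)
class Step:
    """One stage of an action and the network round trips it needs."""
    name: str
    round_trips: int


def latency_floor_ms(steps: Iterable[Step], rtt_ms: float) -> float:
    """Lower bound on total latency: (sum of round trips) * RTT."""
    return sum(step.round_trips for step in steps) * rtt_ms


# Assumed breakdown of one HTTPS request on a brand-new connection.
FRESH_CONNECTION = [
    Step("TCP three-way handshake", 1),
    Step("TLS 1.3 handshake (full, no resumption)", 1),
    Step("HTTP request and response", 1),
]

# The same request when an established connection is reused.
REUSED_CONNECTION = [Step("HTTP request and response", 1)]

if __name__ == "__main__":
    for label, steps in (("fresh", FRESH_CONNECTION), ("reused", REUSED_CONNECTION)):
        best = latency_floor_ms(steps, BEST_RTT_MS)
        worst = latency_floor_ms(steps, WORST_RTT_MS)
        print(f"{label}: at least {best:.0f} ms (best case), {worst:.0f} ms (worst case)")
```

Under these assumptions, two of the three round trips on a fresh connection are spent below the application layer. No amount of tuning at either endpoint removes them, whereas a structural change such as reusing the connection does.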
While keeping latencies low is well understood at various levels, or slices, of the computing stack (see Understanding Bandwidth and Latency for just one example), these efforts can fall well short of what is possible when latency is optimized throughout the stack, including at the level of Architecture.
Here, viewing Architecture’s goal as “making fundamental structural choices which are costly to change once implemented”, the protocols used to implement the communications must necessarily be optimized, via their selection and/or design, for latency, as must any other aspects of the system where performance is a consideration. (Note that optimizing the design of a protocol is distinct from optimizing its implementation, similarly to how choosing an algorithm, based on its performance characteristics, is distinct from optimizing the implementation of that algorithm.)
Thinking back in time prior to today’s theoretical “green-field” starting point, it becomes easier to understand why and how network protocols (including TCP, SMTP, HTTP, and so on) were designed to meet performance requirements that did not prioritize latency above all. In particular, architects and designers had to cope with systems and networks of comparatively limited processing power, storage, and bandwidth — but seemingly “instantaneous” communication (at near the speed of light). Naturally, they chose to ensure their protocols would be realizable by computing machinery with power that was severely limited compared to today’s.
Scaling up systems built on top of those legacy protocols, to support an increasingly widespread online population of both man and machine, is proving increasingly challenging for reasons of latency alone, never mind the ongoing challenges presented by sheer complexity, security issues, and so on.
That’s why it’s important for new designs to start out architected to avoid high-latency pitfalls, perhaps even to the extent of avoiding those legacy protocols in favor of lower-latency alternatives (such as UDP and datagrams rather than TCP).
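As a rough illustration of the datagram approach, the sketch below sends one UDP request to a local echo server and waits for the reply: the very first packet carries application data, with no connection handshake. The function names are hypothetical, the code uses only Python’s standard `socket` module, and it runs over the loopback interface, so the time it measures says nothing about real network latency.

```python
import socket
import threading
import time
from typing import Tuple


def serve_one_echo(server: socket.socket) -> None:
    """Receive a single datagram and send it straight back to its sender."""
    data, addr = server.recvfrom(2048)
    server.sendto(data, addr)


def udp_round_trip(payload: bytes, timeout_s: float = 2.0) -> Tuple[bytes, float]:
    """Send payload to a local echo server; return (reply, elapsed seconds)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.settimeout(timeout_s)
        worker = threading.Thread(target=serve_one_echo, args=(server,))
        worker.start()
        try:
            client.settimeout(timeout_s)
            start = time.perf_counter()
            # No handshake: this first packet already carries the request.
            client.sendto(payload, server.getsockname())
            reply, _ = client.recvfrom(2048)
            elapsed = time.perf_counter() - start
        finally:
            worker.join()  # wait for the echo thread before closing sockets
        return reply, elapsed


if __name__ == "__main__":
    reply, elapsed = udp_round_trip(b"ping")
    print(f"reply={reply!r} after {elapsed * 1000:.3f} ms on loopback")
```

The trade-off is that UDP itself offers no delivery guarantee, ordering, or congestion control, so a protocol built this way has to supply whichever of those it needs.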
It’s also worth considering whether a “21st-Century” Internet, architected for low latency, is worth pursuing as a “Beyond IPv6” offering.
What might a protocol that optimizes for latency look like? I’m working on a Proof of Concept, and hope to have something to show for it in the near future.
Please let me know what experiences and thoughts you have on optimizing for latency!