William Chan's blag

Some Quick Thoughts on TCP Fast Open and Chromium

Since this came up yesterday, I figured I ought to write up a quick & dirty post about Chromium and TCP Fast Open. The current status is that it’s experimentally supported on recent Linux kernel versions (and requires both client and server kernel support). On the Chromium side, there’s a flag to enable it for testing. All that flag does is this: if the kernel supports TCP Fast Open and the system-wide setting hasn’t disabled it, then we’ll try to use it. We’ll optimistically return success for Connect() and try to take advantage of TCP Fast Open cookies on the first Write(). That’s really all the code does for now. But this implementation is broken in a number of critical edge cases. Let’s consider the ways it’s broken:

  1. TCP Fast Open requires application level idempotency. As noted in the text, it’s quite possible for the server to receive a duplicate SYN (with data). It must be resilient to this. In the web browsing scenario, POSTs are not idempotent. GETs are supposed to be idempotent, and indeed things like HTTP pipelining depend heavily on this requirement. Of course, it’s not strictly true in practice, but that’s a server-side bug. In the web browser case, proper use of TCP Fast Open would probably require only attempting to use it with idempotent requests. We’re not doing that yet. So there’s definitely a small risk of multiple POSTs, and if the server application doesn’t detect that, then it could be a serious problem. We need to fix our implementation to only try TCP Fast Open when the initial data packet doesn’t have any unsafe side effects (HTTP GET, or higher layer handshakes like SSL’s CLIENT_HELLO).
  2. TCP Fast Open violates a number of internal code assumptions in the Chromium HTTP stack. We naturally assume that all connection errors will be returned upon completion of Connect(). But as I explained earlier, we just optimistically return success for Connect() and return the connection error, if any, in the response to the Write(). This mostly works, but it’s obviously pretty dicey and we need to go through all our code to iron out the edge cases where this fails. Or we need to change the API to detect Fast Open support prior to the Connect() call, and if it’s likely to succeed, we can call a new method like ConnectWithData() or something.
  3. TCP Fast Open defeats late binding when the zero roundtrip connect falls back to the 3-way handshake. When our code optimistically returns connection success, the socket will be handed up to the HTTP stack for use and it’ll try to use it. But if the TCP Fast Open cookie fails to work and we fall back to the 3-way handshake, then our HTTP request is tightly bound to this socket that is still in the process of being connected. This prevents late binding, which would result in better prioritized allocation of requests to available connected sockets. Unfortunately, there’s no good API to the kernel to detect whether or not the kernel has a TCP Fast Open cookie for the destination server (and who knows how likely it is that servers will keep cookies around for long enough for them to be useful?). Until a better API exists, the only alternative is to try to implement an application level cache to predict whether or not we think it’s worth it to try TCP Fast Open for a host.
  4. TCP Fast Open doesn’t play well with our TCP preconnect code. Our TCP preconnect code tries to start the SYNs early so we can mitigate the round-trip cost of the TCP 3-way handshake. That puts us in an awkward spot: maybe we shouldn’t preconnect, because there’s a chance that Blink’s resource loader will request a resource on that host in the near future, before a TCP SYN-ACK would come in, and we might want to attempt a TCP Fast Open “connect” instead. But the preconnect code is way more dependable, since it works on all servers and doesn’t rely on TCP extensions that middleboxes may barf on. Note however that TCP Fast Open should combine nicely with SSL (and other higher layer) preconnect handshakes.
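
To make (1) concrete, here is a minimal sketch of the kind of gate that would be needed on the first write. It is illustrative Python, not Chromium code: `may_use_fast_open`, `is_tls_client_hello`, and the set of methods treated as replay-safe are all assumptions made for the example, and it only inspects the first bytes of the first write.

```python
# Hypothetical helper, not part of Chromium: decide whether the first write
# on a connection is safe to carry as data in a TCP Fast Open SYN.

# Assumption: these HTTP methods are treated as safe to replay.
REPLAY_SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

TLS_HANDSHAKE_CONTENT_TYPE = 0x16  # TLS record type "handshake"
TLS_CLIENT_HELLO = 0x01            # handshake message type "client_hello"


def is_tls_client_hello(first_write: bytes) -> bool:
    """True if the bytes start with a TLS record carrying a ClientHello."""
    # The record header is 5 bytes (type, 2-byte version, 2-byte length);
    # the handshake message type is the first byte of the record body.
    return (
        len(first_write) >= 6
        and first_write[0] == TLS_HANDSHAKE_CONTENT_TYPE
        and first_write[5] == TLS_CLIENT_HELLO
    )


def may_use_fast_open(first_write: bytes) -> bool:
    """Allow Fast Open only when a duplicated SYN+data would be harmless."""
    if is_tls_client_hello(first_write):
        return True
    # Otherwise treat the bytes as an HTTP/1.x request line: "METHOD target ..."
    method, sep, _ = first_write.partition(b" ")
    if not sep:
        return False  # no request line to inspect, so stay conservative
    try:
        return method.decode("ascii") in REPLAY_SAFE_METHODS
    except UnicodeDecodeError:
        return False
```

Anything the gate rejects would go through the ordinary 3-way handshake, so a POST never rides in a SYN that the network might duplicate.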

These problems are all addressable to some degree. Indeed, problems (1) and (2) are simple correctness issues that we just need to fix, but we haven’t begun addressing them yet. (3) is also relatively easy to address with some extra application layer logic to improve the TCP Fast Open success rate. (4) is a little bit tricky and we’ll have to experiment to see what works best in practice. And it’s hard to conceive of a case where TCP Fast Open is not a straight up win for HTTPS.
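
As a rough illustration of the application layer logic for (3), the sketch below keeps a per-host record of whether the last Fast Open attempt went through without falling back to the 3-way handshake. It is hypothetical Python, not Chromium’s implementation: the class name, the one-hour expiry, and the policy of treating unknown hosts as unlikely to succeed are all assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class FastOpenPredictor:
    """Hypothetical per-host cache of the most recent Fast Open outcome."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Assumption: entries older than the TTL are not trusted, since the
        # server may have stopped honoring the cookie in the meantime.
        self._ttl = ttl_seconds
        self._clock = clock
        # (host, port) -> (succeeded, time the outcome was recorded)
        self._last: Dict[Tuple[str, int], Tuple[bool, float]] = {}

    def record(self, host: str, port: int, succeeded: bool) -> None:
        """Remember whether the last attempt avoided the 3-way handshake."""
        self._last[(host, port)] = (succeeded, self._clock())

    def predicts_zero_rtt(self, host: str, port: int) -> bool:
        """True only if a recent attempt to this host succeeded."""
        entry = self._last.get((host, port))
        if entry is None:
            return False  # unknown host: no basis for predicting success
        succeeded, recorded_at = entry
        if self._clock() - recorded_at > self._ttl:
            del self._last[(host, port)]  # stale entry, forget it
            return False
        return succeeded
```

A caller would consult `predicts_zero_rtt()` before binding a request to an unconnected socket, and fall back to the normal connect path (keeping late binding) whenever it returns False. How the first attempt for an unknown host gets made is a separate policy choice that this sketch leaves out.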

In short, there’s a lot of potential here, but the implementation is totally naïve now and needs more work to fix correctness and performance problems before it’s ready for actual end user usage.