TLS Symmetric Crypto (27 Feb 2014)
At this time last year, the TLS world was mostly running on RC4-SHA and AES-CBC. The Lucky 13 attack against CBC in TLS had just been published and I had spent most of January writing patches for OpenSSL and NSS to implement constant-time CBC decoding. The RC4 biases paper is still a couple of week away, but it's already clear that both these major TLS cipher suite families are finished and need replacing. (The question of which is worse is complicated.)
Here's Chrome's view of TLS by number of TLS connections from around this time (very minor cipher suite families were omitted):
|Cipher suite family||Percentage of TLS connections made by Chrome|
A whole bunch of people, myself included, have been working to address that since.
AES-GCM was already standardised, implemented in OpenSSL and has hardware support in Intel chips, so Wan-Teh Chang and I started to implement it in NSS, although first we had to implement TLS 1.2, on which it depends. That went out in Chrome 30 and 31. Brian Smith (and probably others!) at Mozilla shipped that code in Firefox also, with Firefox 27.
AES-GCM is great if you have hardware support for it: Haswell chips can do it in just about 1 cycle/byte. However, it's very much a hardware orientated algorithm and it's very difficult to implement securely in software with reasonable speed. Also, since TLS has all the infrastructure for cipher suite negotiation already, it's nice to have a backup in the wings should it be needed.
So we implemented ChaCha20-Poly1305 for TLS in NSS and OpenSSL. (Thanks to Elie Bursztein for doing a first pass in OpenSSL.) This is an AEAD made up of primitives from djb and we're using implementations from djb, Andrew M, Ted Krovetz and Peter Schwabe. Although it can't beat AES-GCM when AES-GCM has dedicated hardware, I suspect there will be lots of chips for a long time that don't have such hardware. Certainly most mobile phones at the moment don't (although AArch64 is coming).
Here's an example of the difference in speeds:
|Chip||AES-128-GCM speed||ChaCha20-Poly1305 speed|
|OMAP 4460||24.1 MB/s||75.3 MB/s|
|Snapdragon S4 Pro||41.5 MB/s||130.9 MB/s|
|Sandy Bridge Xeon (AESNI)||900 MB/s||500 MB/s|
(The AES-128-GCM implementation is from OpenSSL 1.0.1f. Note, ChaCha20 is a 256-bit cipher and AES-128 obviously isn't.)
There's also an annoying niggle with AES-GCM in TLS because the spec says that records have an eight byte, explicit nonce. Being an AEAD, the nonce is required to be unique for a given key. Since an eight-byte value is too small to pick at random with a sufficiently low collision probability, the only safe implementation is a counter. But TLS already has a suitable, implicit record sequence counter that has always been used to number records. So the explicit nonce is at best a waste of eight bytes per record, and possibly dangerous should anyone attempt to use random values with it. Thankfully, all the major implementations use a counter and I did a scan of the Alexa, top 200K sites to check that none are using random values - and none are. (Thanks to Dr Henson for pointing out that OpenSSL is using a counter, but with a random starting value.)
For ChaCha20-Poly1305 we can save 8 bytes of overhead per record by using the implicit sequence number for the nonce.
Cipher suite selection
Given the relative merits of AES-GCM and ChaCha20-Poly1305, we wish to use the former when hardware support is available and the latter otherwise. But TLS clients typically advertise a fixed preference list of cipher suites and TLS servers either respect it, or override the client's preferences with their own, fixed list. (E.g. Apache's SSLHonorCipherOrder directive.)
So, first, the client needs to alter its cipher suite preference list depending on whether it has AES-GCM hardware support - which Chrome now does. Second, servers that enforce their own preferences (which most large sites do) need a new concept of an equal-preference group: a set of cipher suites in the server's preference order which are all “equally good”. When choosing a cipher suite using the server preferences, the server finds its most preferable cipher suite that the client also supports and, if that is in an equal preference group, picks whichever member of the group is the client's most preferable. For example, Google servers have a cipher suite preference that includes AES-GCM and ChaCha20-Poly1305 cipher suites in an equal preference group at the top of the preference list. So if the client supports any cipher suite in that group, then the server will pick whichever was most preferable for the client.
The end result is that Chrome talking to Google uses AES-GCM if there's hardware support at the client and ChaCha20-Poly1305 otherwise.
After a year of work, let's see what Chrome's view of TLS is now:
|Cipher suite family||Percentage of TLS connections made by Chrome|
|AES128-GCM & ChaCha20-Poly1305||39.9%|
What remains is standardisation work and getting the code out to the world for others. The ChaCha20-Poly1305 cipher suites are unlikely to survive the IETF without changes so we'll need to respin them at some point. As for getting the code out, I hope to have something to say about that in the coming months. Before then I don't recommend that anyone try to repurpose code from the Chromium repos - it is maintained with only Chromium in mind.
Now that TLS's symmetric crypto is working better there's another, long-standing problem that needs to be addressed: TLS fallbacks. The new, AEAD cipher suites only work with TLS 1.2, which shouldn't be a problem because TLS has a secure negotiation mechanism. Sadly, browsers have a long history of doing TLS fallbacks: reconnecting with a lesser TLS version when a handshake fails. This is because there are lots of buggy HTTPS servers on the Internet that don't implement version negotiation correctly and fallback allows them to continue to work. (It also promotes buggy implementations by making them appear to function; but fallback was common before Chrome even existed so we didn't really have a choice.)
Chrome now has a four stage fallback: TLS 1.2, 1.1, 1.0 then SSLv3. The problem is that fallback can be triggered by an attacker injecting TCP packets and it reduces the security level of the connection. In the TLS 1.2 to 1.1 fallback, AES-GCM and ChaCha20-Poly1305 are lost. In the TLS 1.0 to SSLv3 fallback, all elliptic curve support, and thus ECDHE, disappears. If we're going to depend on these new cipher suites then we cannot allow attackers to do this, but we also cannot break all the buggy servers.
So, with Chrome 33, we implemented a fallback SCSV. This involves adding a pseudo-cipher suite in the client's handshake that indicates when the client is trying a fallback connection. If an updated server sees this pseudo-cipher suite, and the connection is not using the highest TLS version supported by the server, then it replies with an error. This is essentially duplicating the version negotiation in the cipher suite list, which isn't pretty, but the reality is that the cipher suite list still works but the primary version negotiation is rusted over with bugs.
With this in place (and also implemented on Google servers) we can actually depend on having TLS 1.2, at least between Chrome and Google (for now).
Few rollouts are trouble free and Chrome 33 certainly wasn't, although most of the problems were because of a completely different change.
Firstly, ChaCha20-Poly1305 is running at reduced speed on ARM because early-revision, S3 phones have a problem that causes the NEON, Poly1305 code to sometimes calculate the wrong result. (It's not clear if it's the MSM8960 chip or the power supply in the phone.) Of course, it works well enough to run the unit tests successfully because otherwise I wouldn't have needed two solid days to pin it down. Future releases will need to run a self-test to gather data about which revisions have the problem and then we can selectively disable the fast-path code on those devices. Until then, it's disabled everywhere. (Even so, ChaCha20-Poly1305 is still much faster than AES-GCM on ARM.)
But it was the F5 workaround that caused most of the headaches. Older (but common) firmware revisions of F5 devices hang the connection when the ClientHello is longer than 255 bytes. This appeared to be a major issue because it meant that we couldn't add things to the ClientHello without breaking lots of big sites that use F5 devices. After a plea and a discussion on the TLS list, an engineer from F5 explained that it was just handshake messages between 256 and 511 bytes in length that caused the issue. That suggested an obvious solution: pad the handshake message so that it doesn't fall into the troublesome size range.
Chrome 33 padded all ClientHellos to 512 bytes, which broke at least two “anti-virus” products that perform local MITM of TLS connections on Windows and at least one “filtering” device that doesn't do MITM, but killed the TCP connections anyway. All made the assumption that the ClientHello will never be longer than some tiny amount. In the firewall case, the fallback SCSV stopped a fallback to SSLv3 that would otherwise have hidden the problem by removing the padding.
Thankfully, two of the three vendors have acted quickly to address the issue. The last, Kaspersky, has known about these sorts of problems for 18 months now (since Chrome tried to deploy TLS 1.1 and hit a similar issue) but, thankfully, doesn't enable MITM mode by default.
Overall, it looks like the changes are viable. Hopefully the path will now be easier for any other clients that wish to do the same.