Why not DANE in browsers

Thomas Ptacek laid out a number of arguments against DNSSEC recently (and in a follow up). We don't fully agree on everything, but it did prompt me to write why, even if you assume DNSSEC, DANE (the standard for speaking about the intersection of TLS and DNSSEC) is not a foregone conclusion in web browsers.

There are two ways that you might wish to use DANE in a web browser: either to block a certificate that would normally be considered valid, or to bless a certificate that would normally be rejected. The first, obviously, requires that DANE information always be obtained—if a lookup failure were ignored, a network attacker with a bad certificate would just simulate a lookup failure. But requiring that browsers always obtain DANE information (or a proof of absence) is not currently plausible:

Some years ago now, Chrome did an experiment where we would look up a TXT record that we knew existed when we knew the Internet connection was working. At the time, some 4-5% of users couldn't look up that record; we assumed that was because the network wasn't transparent to non-standard DNS resource types. DANE records are going to be even more non-standard, are going to be larger, and browsers would have to fetch lots of them because they'll need the DNSKEY/RRSIG chain up from the root. Even if DNSSEC record lookup worked flawlessly for everyone, we probably still wouldn't implement this aspect of DANE because each extra lookup is more latency and another chance for packet loss to cause an expensive timeout and retransmit.
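For a sense of what fetching DANE records would involve, here's a minimal sketch using the system resolver (TLSA is RRtype 52 per RFC 6698; error handling, response parsing and the -lresolv link flag are elided):

#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>
#include <stdio.h>

#define TYPE_TLSA 52  /* not present in older nameser.h headers */

int main(void) {
  unsigned char answer[4096];
  res_init();
  /* TLSA records live at _port._protocol.hostname. */
  int len = res_query("_443._tcp.example.com", ns_c_in, TYPE_TLSA,
                      answer, sizeof(answer));
  if (len < 0) {
    /* This is the problem case: on a network that mangles unusual
     * RRtypes the client learns nothing, and treating the failure as
     * "no DANE" lets an attacker simulate exactly that. */
    printf("TLSA lookup failed\n");
    return 1;
  }
  printf("%d bytes of TLSA answer\n", len);
  return 0;
}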

Instead, for this we have HPKP, which is a memory-based pinning solution using HTTP headers. We also have pre-loaded pinning in Chrome for larger or more obvious targets. Thomas Ptacek seems bullish on pinning but I'm much more lukewarm. HPKP is quite complex and no doubt someone will write a “supercookies” story about it at some point, as they have for HSTS. Additionally, pinning is quite dangerous. Clients deciding that “pinning is good” have caused headaches at Google. It's also worth noting that CryptoCat has committed pinning-suicide in Chrome at the moment due to their CA having switched intermediates between renewals. They're waiting for the release of Chrome 41 to recover.

But what about the other side of DANE: blessing certificates that would otherwise be considered untrusted? In this case, DNSSEC can be seen as something like another CA. The same problems with looking up DNSSEC records apply, but are much less painful when you only need to depend on the lookup for sites that are using DANE certificates. Indeed, Chrome even supported something very like DANE for a while. In that case the DNSSEC records were contained in the certificate to avoid the latency and complexity of looking them up in the client. (DNSSEC records contain signatures so need not be transported over the DNS protocol.)

But support for that was removed because it was a bunch of parsing code outside of the sandbox, wasn't really being used and it conflicted with two long-term plans for the health of the HTTPS ecosystem: eliminating 1024-bit RSA and Certificate Transparency. The conflicts are probably the most important reasons for not wanting to renew the experiment.

The effort to remove 1024-bit RSA from HTTPS has been going for years and is, perhaps, nearing completion now. (That noise that you can hear is my colleague, Ryan Sleevi, crying softly.) There are still some 1024-bit root certificates, but they are nearly gone from the Mozilla set. The amount of work involved is an order of magnitude greater than you expect because of the interactions of different X.509 validation libraries, intermediate chains and varying root stores on different platforms and versions.

DNSSEC, however, is littered with 1024-bit RSA. You literally can't avoid it because the root zone transits through a 1024-bit key. DNSSEC has them because of (I think) concerns about the size of responses and they are usually rotated every two or three months. The RFC suggests that 1024-bit RSA is good for “most zones” until 2022. Dan Bernstein's paper on Batch NFS deals well with the question of whether that's wise.

Next, Certificate Transparency is our effort to add strong, technical audits to the CA system by creating a trustworthy log of all valid certificates. CT logs only accept certificates from CAs as an anti-spam measure but people can create DANE certificates for domains at will. This is far from an insurmountable problem, but it is a problem that would need to be solved and the CT team already have their hands full with the staged rollout of CT in Chrome.

The 1024-bit RSA problem isn't insurmountable either (although it's baked in much deeper), so it's possible that browsers might accept a DNSSEC signature chain in a certificate in the future, but it's a long way out.

The POODLE bites again

October's POODLE attack affected CBC-mode cipher suites in SSLv3 due to SSLv3's under-specification of the contents of the CBC padding bytes. Since SSLv3 didn't say what the padding bytes should be, implementations couldn't check them and that opened SSLv3 up to an oracle attack.

We've done pretty well at killing off SSLv3 in response to that. Chrome 39 (released Nov 18th) removed fallback to SSLv3 and Chrome 40 is scheduled to remove SSLv3 completely. Firefox 34 (released Dec 1st) has already removed SSLv3 support.

We're removing SSLv3 in favour of TLS because TLS fully specifies the contents of the padding bytes and thus stops the attack. However, TLS's padding is a subset of SSLv3's padding so, technically, you could use an SSLv3 decoding function with TLS and it would still work fine. It wouldn't check the padding bytes but that wouldn't cause any problems in normal operation. However, if an SSLv3 decoding function was used with TLS, then the POODLE attack would work, even against TLS connections.
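To make the difference concrete, here's a simplified sketch of the two padding checks (illustrative only: it ignores the MAC and the constant-time care that a real implementation needs):

#include <stdint.h>
#include <stddef.h>

/* SSLv3: only the final length byte means anything; the padding bytes
 * themselves are unspecified, so there's nothing to check. */
int sslv3_padding_ok(const uint8_t *rec, size_t len, size_t block_size) {
  size_t pad = rec[len - 1];
  return pad + 1 <= block_size && pad + 1 <= len;
}

/* TLS: every padding byte must equal the length byte, so a duplicated
 * ciphertext block only slips through with probability about 2^-64
 * (3DES) or 2^-128 (AES). */
int tls_padding_ok(const uint8_t *rec, size_t len) {
  size_t pad = rec[len - 1];
  if (pad + 1 > len)
    return 0;
  for (size_t i = len - 1 - pad; i < len - 1; i++) {
    if (rec[i] != pad)
      return 0;
  }
  return 1;
}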

This was noted by, at least, Brian Smith on the TLS list ([1][2]) and I was sufficiently cynical to assume that there were probably more instances of this than the old versions of NSS that Brian cited, and so wrote a scanner for the issue.

Unfortunately, I found a number of major sites that had this problem. I had good enough contacts at at least one of them to quickly establish that they used an F5 device to terminate connections. I contacted F5 on October 21st and they started working on a fix. Yngve Pettersen also independently found this issue and contacted me about it around this time.

F5 reported that some of the affected sites weren't customers of theirs, which meant that there was (at least) a second vendor with the same issue. After more digging, I found that some A10 devices also have this problem. I emailed a number of contacts at A10 on October 30th but sadly didn't get a reply from any of them. It wasn't until November 13th that I found the right person at A10 to deal with this.

F5 and A10 have posted patches for their products (F5's are here and A10's are here and they have an advisory here). I'm not completely sure that I've found every affected vendor but, now that this issue is public, any other affected products should quickly come to light. (Citrix devices have an odd behaviour in this area in that they'll accept padding bytes that are all zeros, but not random padding. That's unexpected but I can't make an attack out of it.)

(Update: since posting this, it appears that products from Fortinet, Cisco, IBM (WebSphere, Domino, Tivoli) and Juniper may also be affected.)

Ivan Ristić has added a test for this issue to his excellent scanner at SSLLabs. Affected sites will have their grade set to F and will report “This server is vulnerable to the POODLE attack against TLS servers”.

This seems like a good moment to reiterate that everything less than TLS 1.2 with an AEAD cipher suite is cryptographically broken. An IETF draft to prohibit RC4 is in Last Call at the moment but it would be wrong to believe that RC4 is uniquely bad. While RC4 is fundamentally broken and no implementation can save it, attacks against MtE-CBC ciphers have repeatedly been shown to be far more practical. Thankfully, TLS 1.2 support is about to hit 50% at the time of writing.

POODLE attacks on SSLv3

My colleague, Bodo Möller, in collaboration with Thai Duong and Krzysztof Kotowicz (also Googlers), just posted details about a padding oracle attack against CBC-mode ciphers in SSLv3. This attack, called POODLE, is similar to the BEAST attack and also allows a network attacker to extract the plaintext of targeted parts of an SSL connection, usually cookie data. Unlike the BEAST attack, it doesn't require such extensive control of the format of the plaintext and thus is more practical.

Fundamentally, the design flaw in SSL/TLS that allows this is the same as with Lucky13 and Vaudenay's two attacks: SSL got encryption and authentication the wrong way around – it authenticates before encrypting.

Consider the following plaintext HTTP request, which I've broken into 8-byte blocks (as in 3DES), but the same idea works for 16-byte blocks (as in AES) just as well:

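With an illustrative cookie value, and with ␍␊ standing for CRLF, the blocks might look something like this:

GET / HT
TP/1.1␍␊
Cookie: 
session=
abcdefgh
MMMMMMMM
•••••••7

(The MMMMMMMM block is the fictional 8-byte MAC; the final block is seven padding bytes and then the padding-length byte.)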

The last block contains seven bytes of padding (represented as •) and the final byte is the length of the padding. (And I've used a fictional, 8-byte MAC, but that doesn't matter.) Before transmission, those blocks would be encrypted with 3DES or AES in CBC mode to provide confidentiality.

Now consider how CBC decryption works at the receiver, thanks to this public domain diagram from Wikipedia:

[Diagram: CBC-mode decryption. Each ciphertext block is run through the block cipher's decryption function and the result is XORed with the previous ciphertext block (or with the IV, for the first block) to produce the plaintext block.]

An attacker can't see the plaintext contents like we can in the diagram, above. They only see the CBC-encrypted ciphertext blocks. But what happens if the attacker duplicates the block containing the cookie data and overwrites the last block with it? When the receiver decrypts the last block it XORs in the contents of the previous ciphertext (which the attacker knows) and checks the authenticity of the data. Critically, since SSLv3 doesn't specify the contents of the padding (•) bytes, the receiver cannot check them. Thus the record will be accepted if, and only if, the last byte ends up as a seven.

An attacker can run Javascript in any origin in a browser and cause the browser to make requests (with cookies) to any other origin. If the attacker does this block duplication trick they have a 1-in-256 chance that the receiver won't reject the record and close the connection. If the receiver accepts the record then the attacker knows that the decryption of the cookie block that they duplicated, XORed with the ciphertext of the previous block, equals seven. Thus they've found the last byte of the cookie using (on average) 256 requests.
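In symbols: CBC decryption computes P_i = D(C_i) ⊕ C_(i-1), so an accepted record tells the attacker that the last byte of D(C_cookie) equals 7 ⊕ (the last byte of the block now preceding it). Recovering the plaintext byte is then a couple of XORs of known ciphertext bytes; a tiny sketch:

#include <stdint.h>

/* Once a tampered record is accepted: c_prev is the ciphertext block
 * that preceded the (overwritten) final block, and c_before_cookie is
 * the block that precedes the cookie block in the original stream. */
uint8_t recover_last_byte(const uint8_t c_prev[8],
                          const uint8_t c_before_cookie[8]) {
  /* D(C_cookie)[7] == 7 ^ c_prev[7], and the cookie plaintext byte is
   * D(C_cookie)[7] ^ c_before_cookie[7]. */
  return 7 ^ c_prev[7] ^ c_before_cookie[7];
}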

Now the attacker can increase the length of the requested URL and decrease the length of something after the cookies and make the request look like this:

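Illustratively again, lengthening the path by one byte (and shortening something after the cookie so that the final block stays all padding) shifts everything along by one:

GET /a H
TTP/1.1␍
␊Cookie:
 session
=abcdefg
…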

Note that the Cookie data has been shifted so that the second to last byte of the data is now at the end of the block. So, with another 256 requests the attacker can expect to have decrypted that byte and so on.

Thus, with an average of 256×n requests and a little control of the layout of those requests, an attacker can decrypt n bytes of plaintext from SSLv3. The critical part of this attack is that SSLv3 doesn't specify the contents of padding bytes (the •s). TLS does and so this attack doesn't work because the attacker only has a 2^-64 or 2^-128 chance of a duplicated block being a valid padding block.

This should be an academic curiosity because SSLv3 was deprecated very nearly 15 years ago. However, the Internet is vast and full of bugs. The vastness means that a non-trivial number of SSLv3 servers still exist and workarounds for the bugs mean that an attacker can convince a browser to use SSLv3 even when both the browser and server support a more recent version. Thus, this attack is widely applicable.

SSL/TLS has a perfectly good version negotiation mechanism that should prevent a browser and server that support a modern TLS version from using anything less. However, because some servers are buggy and don't implement version negotiation correctly, browsers break this mechanism by retrying connections with lesser SSL/TLS versions when TLS handshaking fails. These days we're more aware of the fact that fallback behaviour like this is a landmine for the future (as demonstrated today) but this TLS fallback behaviour was enshrined long ago and we're stuck with it. It means that, by injecting some trivial errors on the network, an attacker can cause a browser to speak SSLv3 to any server and then run the above attack.

What's to be done?

It's no revelation that this fallback behaviour is bad news. In fact, Bodo and I have a draft out for a mechanism to add a second, less bug-rusted mechanism to prevent it called TLS_FALLBACK_SCSV. Chrome and Google have implemented it since February this year and so connections from Chrome to Google are already protected. We are urging server operators and other browsers to implement it too. It doesn't just protect against this specific attack, it solves the fallback problem in general. For example, it stops attackers from downgrading TLS 1.2 to 1.1 and 1.0 and thus removing modern, AEAD ciphers from a connection. (Remember, everything less than TLS 1.2 with an AEAD mode is cryptographically broken.) There should soon be an updated OpenSSL version that supports it.
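In outline, and with hypothetical helper names standing in for a real TLS stack (the SCSV cipher suite value 0x5600 and the inappropriate_fallback alert come from the draft), the server-side check looks something like this:

#include <stddef.h>
#include <stdint.h>

#define TLS_FALLBACK_SCSV 0x5600

/* Hypothetical helper: does the ClientHello's cipher suite list
 * contain the given value? */
int cipher_suite_present(const uint16_t *suites, size_t n, uint16_t suite);

/* If the ClientHello carries the fallback SCSV but offers a version
 * lower than the best we support, the earlier "failure" that caused
 * the client to retry was an attacker's doing. */
int fallback_scsv_ok(const uint16_t *suites, size_t n,
                     uint16_t client_version, uint16_t max_version) {
  if (cipher_suite_present(suites, n, TLS_FALLBACK_SCSV) &&
      client_version < max_version) {
    return 0;  /* send inappropriate_fallback alert and abort */
  }
  return 1;
}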

Even with TLS_FALLBACK_SCSV, there will be a long tail of servers that don't update. Because of that, I've just landed a patch on Chrome trunk that disables fallback to SSLv3 for all servers. This change will break things and so we don't feel that we can jump it straight to Chrome's stable channel. But we do hope to get it there within weeks and so buggy servers that currently function only because of SSLv3 fallback will need to be updated.

Chrome users that just want to get rid of SSLv3 can use the command line flag --ssl-version-min=tls1 to do so. (We used to have an entry in the preferences for that but people thought that “SSL 3.0” was a higher version than “TLS 1.0” and would mistakenly disable the latter.)

In Firefox you can go into about:config and set security.tls.version.min to 1. I expect that other browser vendors will publish similar instructions over the coming days.

As a server operator, it is possible to stop this attack by disabling SSLv3, or by disabling CBC-mode ciphers in SSLv3. However, the compatibility impact of this is unclear. Certainly, disabling SSLv3 completely is likely to break IE6. Some sites will be happy doing that, some will not.
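For example, a server that configures OpenSSL directly can refuse SSLv3 with a standard context option; a minimal sketch:

#include <openssl/ssl.h>

SSL_CTX *make_server_ctx(void) {
  SSL_CTX *ctx = SSL_CTX_new(SSLv23_server_method());
  if (ctx == NULL)
    return NULL;
  /* Never negotiate SSLv2 or SSLv3; TLS 1.0 and up only. */
  SSL_CTX_set_options(ctx, SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3);
  return ctx;
}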

A little further down the line, perhaps in about three months, we hope to disable SSLv3 completely. The changes that I've just landed in Chrome only disable fallback to SSLv3 – a server that correctly negotiates SSLv3 can still use it. Disabling SSLv3 completely will break even more than just disabling the fallback but SSLv3 is now completely broken with CBC-mode ciphers and the only other option is RC4, which is hardly that attractive. Any servers depending on SSLv3 are thus on notice that they need to address that now.

We hardened SSLv3 and TLS 1.0 against the BEAST attack with 1/n-1 record splitting and, based on an idea by Håvard Molland, it is possible to do something called anti-POODLE record splitting this time. I'll omit the details, but one can ensure that the last block in a CBC record contains at least six fixed bytes using only one split for AES and two for 3DES. With this, CBC is probably less bad than RC4. However, while anti-POODLE record splitting should be easy to deploy because it's valid according to the spec, so was 1/n-1 and deploying that was very painful. Thus there's a high risk that this would also cause compatibility problems. Therefore I'm not proceeding with anti-POODLE record splitting and concentrating on removing SSLv3 fallback instead. (There's also the possibility that an attacker could run the attack on the server to client direction. If we assume that both the client and the server get patched then we might as well assume that they are patched for TLS_FALLBACK_SCSV, which makes record splitting moot.)

PKCS#1 signature validation

On Wednesday, Chrome and Mozilla did coordinated updates to fix an RSA signature verification bug in NSS — the crypto library that handles SSL in Firefox and (currently) Chrome on most platforms. The updates should be well spread now and the bug has been detailed on Reddit, so I think it's safe to talk about.

(Hilariously, on the same day, bash turned out to have a little security issue and so hardly anyone noticed the NSS one!)

The NSS bug is another variant of Bleichenbacher's 2006 attack on RSA signature validation. To recap: an RSA signature is roughly the cube root of the message hash modulo the RSA modulus. Verification involves cubing the signature and checking that the result is the hash that you expected. Cubing modulo an RSA modulus is easy but finding a cube root is infeasible.

There's a little more to it because one needs to eliminate some properties of the RSA operation by formatting the hash that you're going to sign — called padding.

The standard for RSA signing and encryption is PKCS#1 version 1.5. The PKCS series of standards are amazing. Although I've no doubt that the people writing them were doing their best, it was a long time ago and mistakes were made. In a modern light, they are all completely terrible. If you wanted something that was plausible enough to be widely implemented but complex enough to ensure that cryptography would forever be hamstrung by implementation bugs, you would be hard pressed to do better. If you can find some implementers, just say "PKCS#11!" or "PKCS#12!" as punchlines. You'll get a good laugh, although possibly also a few panic attacks.

(PKCS#1 version 2 is much better, but only because they took it from Bellare and Rogaway and slapped the PKCS#1 title on it.)

PKCS#1 version 1.5 wanted to include an identifier for the hash function that's being used, inside the signature. This is a fine idea, but they did it by encoding the algorithm and hash value with ASN.1. This caused many implementations to include the complexity of an ASN.1 parser inside signature validation and that let the bugs in.

Bleichenbacher's original attack was based on the observation that the ASN.1 parsers, in many cases, didn't reject extra trailing data. This is reasonable behaviour for a generic ASN.1 parser, but a disaster in signature verification. Because the parser could ignore so much of the signature, Bleichenbacher could arrange for a perfect cube to have a suitable prefix and thus calculate the cube root over the integers — ignoring the RSA modulus completely!

That was fixed, but there was another common bug: the ASN.1 structure used to identify the hash was a standard structure called AlgorithmIdentifier, which includes an optional parameter. Implementations were ignoring the parameter and that also introduced a vulnerability: arbitrary bytes could be inserted as a parameter and that was sufficient to forge signatures. (See section five of this paper for details.)

This week's vulnerability was similar. Antoine Delignat-Lavaud (who, by the way, has been doing stellar work along with the Prosecco team at INRIA: see Triple Handshake, Cookie Cutter, SPDY and virtual host attacks and miTLS) noticed that the check on the contents of the parameters in NSS wasn't very strict — it was only checking that the length was at most two bytes. This is because, due to complexity, there wasn't universal agreement on what the parameter should be. The ASN.1 for the parameter is an ANY type (with the concrete type depending on a preceding object id) but also optional. So, when not needed, should it be an ASN.1 NULL (which is two bytes long), or should it be omitted completely? The answer is the former, but it was underspecified for a long time.

Once Antoine had put a spotlight on the code, Brian Smith noticed something worse: an integer overflow in the ASN.1 parser. ASN.1 (in DER form at least, because, of course, ASN.1 has multiple defined encodings) has variable-length lengths, and NSS was using a generic ASN.1 parser which didn't check for overflow. So you could specify that a length was arbitrarily long and the parser would do something similar to:

unsigned length = 0;
for (i = 0; i < length_of_length; i++) {
  length <<= 8;
  length |= data[i];
}

Thus, as long as the last 4 or 8 bytes (depending on whether the system was 32 or 64 bit) encoded the correct length, the bytes before that would be ignored. That allows arbitrary bytes in the signature again and the attack from section 5 of the previous paper can still be used to make forged signatures.
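The missing check is conceptually tiny. A sketch of a length parser that rejects overflow (data and length_of_length as in the fragment above):

#include <limits.h>
#include <stddef.h>

/* Returns 1 and sets *out on success; returns 0 if the encoded length
 * wouldn't fit in an unsigned int. */
int parse_der_length(const unsigned char *data, size_t length_of_length,
                     unsigned *out) {
  unsigned length = 0;
  for (size_t i = 0; i < length_of_length; i++) {
    if (length > UINT_MAX >> 8)
      return 0;  /* the next shift would overflow: reject */
    length = (length << 8) | data[i];
  }
  *out = length;
  return 1;
}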

(Intel's ATR group also reported the same issue to Mozilla a little bit later. Bugs often seem to work like scientific discovery.)

The moral of the story

The moral here is that an ASN.1 parser should never have been put somewhere so sensitive; parsing is dangerous. It was a mistake to have defined PKCS#1 that way, but it can be ameliorated: rather than parsing, generate the expected signature contents and compare it against the plaintext. Go has always done this. I believe that SSH does this. Bouncy Castle reportedly does this. BoringSSL does this because I changed it from OpenSSL (prior to learning about the NSS issue — it's just a good idea). NSS does it now, although they're still worried about the omitted parameter vs NULL parameter confusion so they serialise two ASN.1 outputs and compare against both.
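To make that concrete, here's a minimal sketch of the generate-and-compare approach for SHA-256 (an illustration, not any of those libraries' actual code; the prefix bytes are the standard DigestInfo encoding with the NULL parameter):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* DigestInfo prefix for SHA-256, including the explicit NULL parameter. */
static const uint8_t kSHA256Prefix[] = {
    0x30, 0x31, 0x30, 0x0d, 0x06, 0x09, 0x60, 0x86, 0x48, 0x01, 0x65,
    0x03, 0x04, 0x02, 0x01, 0x05, 0x00, 0x04, 0x20};

/* em is the result of the RSA public operation on the signature and
 * em_len the modulus size in bytes; hash is the expected 32-byte
 * SHA-256 digest. Build the single valid encoding and compare it;
 * no parsing anywhere. */
int pkcs1_sha256_check(const uint8_t *em, size_t em_len,
                       const uint8_t hash[32]) {
  uint8_t expected[1024];
  const size_t t_len = sizeof(kSHA256Prefix) + 32;
  if (em_len > sizeof(expected) || em_len < t_len + 11)
    return 0;
  size_t i = 0;
  expected[i++] = 0x00;
  expected[i++] = 0x01;
  memset(&expected[i], 0xff, em_len - t_len - 3);  /* the PS bytes */
  i += em_len - t_len - 3;
  expected[i++] = 0x00;
  memcpy(&expected[i], kSHA256Prefix, sizeof(kSHA256Prefix));
  i += sizeof(kSHA256Prefix);
  memcpy(&expected[i], hash, 32);
  return memcmp(em, expected, em_len) == 0;
}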

I generated a number of tests of different possible points of flexibility in signature parsing so that other libraries can be tested. For example, you can test OpenSSL (which still uses an ASN.1 parser) against them like this:

for cert in $(ls *.crt | grep -v root2048.crt); do
  openssl verify -CAfile root2048.crt $cert 2>> /dev/null | grep -q "$cert: OK"
  if [ $? -eq 0 ] ; then
    echo "$cert accepted"
  fi
done

The file control.crt should always be accepted. The file missingnull.crt will be accepted if you allow the parameter to be omitted. (I recommend against that. BoringSSL doesn't allow it and we'll see whether we can make that stick.) Nothing else should be accepted.

Sadly, OpenSSL 1.0.1 also accepts highvaltag.crt (ordinary ASN.1 tags in high-value form) and indeflen.crt (BER indefinite lengths rather than DER). The OpenSSL development branch also accepts bernull.crt (superfluous zero bytes in a length). To be clear, there is no known vulnerability here, but it's unwelcome flexibility in something that should be rigid. (I let the OpenSSL team know on Wednesday and they're welcome to use the code from BoringSSL.)

Remember, we use PKCS in browsers because we have to deal with the world as we find it, not as we want it. If you are not so encumbered, consider something simpler.

A couple more formal systems

After my last post on formal analysis of C code, people pointed out several other options to me and I wanted to do a follow up because the conclusion is a little more positive now.

Firstly, I noted last time that Z3 was the best, Why3-compatible SMT solver for my problem but that it came with a non-commercial license. Craig Stuntz pointed out that Microsoft sell a commercial license for $10K. Sure, that's quite a lot for a hobbyist project, but it's easy for a corporation and a lot easier than having to negotiate specially.

The first additional thing that I've looked into is AutoCorres. I mentioned last time that SeL4's refinement proof to C code involved a C parser in Isabelle that output structures in the Simpl framework and that Simpl was pretty intimidating.

AutoCorres is a tool for verifiably simplifying the Simpl structures and thus making proofs easier. Let's repeat the example from last time and show how AutoCorres helps. Here's the test C function:

int add(int a, int b) {
  return a+b;
}

And here's the raw result from the C parser, exactly as I showed last time:

test_global_addresses.add_body ≡
  Guard SignedArithmetic ⦃-2147483648 ≤ sint ´a + sint ´b ∧ sint ´a + sint ´b ≤ 2147483647⦄
   (creturn global_exn_var_'_update ret__int_'_update (λs. a_' s + b_' s));;
  Guard DontReach {} SKIP

And here's what AutoCorres makes of it:

add' ?a ?b ≡
  DO oguard (λs. INT_MIN ≤ ?a + ?b);
     oguard (λs. ?a + ?b ≤ INT_MAX);
     oreturn (?a + ?b)

That's very impressive! That's using a Maybe monad (using the Haskell name) in Isabelle to handle the possible overflow. Obviously that's a lot easier to deal with than the raw Simpl. Here's the field addition function that I was using for the other examples in the previous post:

void add(int *out, const int *x, const int *y) {
  unsigned int i;
  unsigned int size = 10;
  for (i = 0; i < size; i++) {
    out[i] = x[i] + y[i];
  }
}

And here it is after AutoCorres:

add' ?out ?x ?y ≡
do i ← whileLoop (λi s. i < 0xA)
          (λi. do guard (λs. is_valid_w32 s (ptr_coerce (?out +⇩p uint i)));
                  guard (λs. is_valid_w32 s (ptr_coerce (?x +⇩p uint i)));
                  guard (λs. is_valid_w32 s (ptr_coerce (?y +⇩p uint i)));
                  guard (λs. INT_MIN ≤ sint (ucast s[ptr_coerce (?x +⇩p uint i)]) + sint (ucast s[ptr_coerce (?y +⇩p uint i)]));
                  guard (λs. sint (ucast s[ptr_coerce (?x +⇩p uint i)]) + sint (ucast s[ptr_coerce (?y +⇩p uint i)]) ≤ INT_MAX);
                  modify (λs. s[ptr_coerce (?out +⇩p uint i) := s[ptr_coerce (?x +⇩p uint i)] + s[ptr_coerce (?y +⇩p uint i)]]);
                  return (i + 1)

While the AutoCorres output looks very clean, there isn't any documentation at the moment which means that it's a little hard to actually use. (There are a couple of papers on the home page that are well worth reading - but I didn't find anything like a user guide.) There are some examples from which it's possible to make some guesses, but I wasn't able to get very far in a proof. Also, since AutoCorres builds on Simpl and the C parser, it has the same limitations - most notably that the address of local variables can't be taken.

None the less, AutoCorres holds a lot of promise.

Next, is VST from Andrew Appel and others (yes, that Appel). VST is a framework for proving the behaviour of C code with Coq by using one of the intermediate languages from CompCert. Coq (which I'm pronouncing as “coke”) is a proof system like Isabelle and CompCert is a formally verified compiler. That means that it's possible to have a chain of formal verification from C code all the way to the semantics of the machine language! Additionally, the C parser isn't correctness critical because VST proves properties using the same parse tree as the verified compiler. That's an extremely strong verification standard.

CompCert is a mixture of GPL and “non-commercial” licensed code. It appears that everything needed for VST is licensed as GPL or another free license. So, one can also use VST to verify C code and then use a different compiler to build it if the non-commercial restriction is a problem. That moves the CompCert C parser and the other compiler into the trusted set, but so it goes.

The VST logic also appears to be very capable. It doesn't have a problem with pointers to local variables and should even have the capability to support reasoning about concurrent programs. The restrictions that it does have are local: at most one memory load or side effect per statement (at the moment), and void functions need an explicit return at the end. (Unfortunately, one doesn't get an error from violating these requirements - rather the proof cannot be completed.)

VST, thankfully, has brief documentation online which is expanded upon in the form of a book ($15 as a DRMed download). The book contains a very abstract articulation of the underlying logic which, to be honest, defeated me initially. I'd recommend skipping chapters 2 through 21 on a first reading and coming back to them later. They are useful, but they're also a lot clearer after having played with the software for a while.

Here's the same example function, translated into VST style. I switched back from having a special output array because that was something that I changed to make the SPARK verification easier but it was never part of the original code and VST doesn't have a problem with overlapping the input and output. I've also switched from a for loop to a while loop because the documentation and examples for the latter are better. Otherwise, the changes are just to limit each statement to a single memory access and to include the final return.

void add(int *a, int *b) {
	int t1, t2;
	int i = 0;
	while (i < 10) {
		t1 = a[i];
		t2 = b[i];
		a[i] = t1 + t2;
		i++;
	}
}

We can use clightgen from CompCert to parse that into the intermediate language that CompCert and VST use. (A word to the wise - if you want to install VST, use exactly Coq 8.4p2, not the latest version which is 8.4p4.) Then we can use Coq to prove its behaviour.

Compared to Isabelle, Coq's proofs aren't as clear as Isar-style proofs (although the terseness is more useful when proofs are this complex) and neither Proof General nor CoqIDE are as nice as Isabelle's jEdit mode. But it's not otherwise dramatically different.

Here's a specification of add in VST/Coq's language:

Definition bound_int (v : val) (b : Z) :=
  match v with
  | Vint i => -b < (Int.signed i) < b
  | _ => False
  end.

Definition partial_add (i: Z) (l: Z -> val) (r: Z -> val) (j: Z) :=
  if zlt j i then Val.add (l j) (r j) else (l j).

Definition add_spec :=
  DECLARE _add
  WITH a0 : val, b0 : val, sh : share, orig_a : Z -> val, orig_b : Z -> val
  PRE [_a OF (tptr tint), _b OF (tptr tint)]
    PROP (writable_share sh;
          forall i, 0 <= i < 10 -> is_int (orig_a i);
          forall i, 0 <= i < 10 -> is_int (orig_b i);
          forall i, 0 <= i < 10 -> bound_int (orig_a i) 134217728;
          forall i, 0 <= i < 10 -> bound_int (orig_b i) 134217728)
    LOCAL (`(eq a0) (eval_id _a);
           `(eq b0) (eval_id _b);
           `isptr (eval_id _a);
           `isptr (eval_id _b))
    SEP (`(array_at tint sh orig_a 0 10 a0);
         `(array_at tint sh orig_b 0 10 b0))
  POST [ tvoid ]
    PROP ()
    LOCAL ()
    SEP (`(array_at tint sh (partial_add 10 orig_a orig_b) 0 10 a0);
         `(array_at tint sh orig_b 0 10 b0)).

The partial_add function implements a map that reflects the state of the a array at step i of the loop. I've written it like that so that it can also be used in the loop invariant.

And here's the proof. It is quite long, but at least half of that is because I've not used Coq before and I don't know what I'm doing. I wouldn't call it readable, however.

Definition inv orig_a a0 orig_b b0 sh :=
  EX i:Z,
    (PROP (0 <= i < 11;
           isptr a0;
           isptr b0)
     LOCAL (`(eq a0) (eval_id _a);
            `(eq b0) (eval_id _b);
            `(eq (Vint (Int.repr i))) (eval_id _i))
   SEP (`(array_at tint sh (partial_add i orig_a orig_b) 0 10 a0);
        `(array_at tint sh orig_b 0 10 b0))).

Lemma mod_nop : forall (i : Z) (m : Z), 0 <= i < m -> m > 0 -> i mod m = i.
rewrite Zdiv.Zmod_eq.
apply Zdiv_small.
exact H.
rewrite H1.
exact H0.

Lemma body_sumarray: semax_body Vprog Gprog f_add add_spec.
  (inv orig_a a0 orig_b b0 sh)
     PROP ()
     LOCAL ()
     SEP (`(array_at tint sh (partial_add 10 orig_a orig_b) 0 10 a0);
          `(array_at tint sh orig_b 0 10 b0))).
apply exp_right with 0.
rewrite H7.
unfold partial_add.
assert(is_int (orig_a i)).
assert(is_int (orig_b i)).
exact H7.
apply prop_right.
assert(Vint (Int.add _id1 _id) = partial_add (i+1) orig_a orig_b (Int.signed (Int.repr i))).
Focus 2.
symmetry in H9.
apply H9.
unfold partial_add.
assert(Int.signed (Int.repr i) = i).
rewrite Int.signed_repr.
unfold Int.max_signed, Int.min_signed.
rewrite H9.
unfold Val.add.
rewrite Int.signed_repr_eq in H7.
rewrite mod_nop in H7.
if_tac in H7.
unfold partial_add in H7.
if_tac in H7.
rewrite Int.signed_repr_eq in H6.
rewrite mod_nop in H6.
if_tac in H6.
unfold Int.add, Int.unsigned, Int.intval, Int.repr.
assert(bound_int (orig_a i) 134217728).
unfold bound_int in H12.
symmetry in H7, H6.
rewrite H7.
rewrite H6.
assert(Int.modulus = 4294967296).
rewrite H13.
assert(Int.modulus = 4294967296).
assert(i < Int.half_modulus).
assert(Int.half_modulus = 2147483648).
rewrite H12.
rewrite H12 in H11.
assert(Int.modulus = 4294967296).
assert(Int.modulus = 4294967296).
rewrite H11.
unfold inv.
apply exp_right with (Zsucc i).
apply derives_refl'.
apply equal_f.
apply array_at_ext.
unfold upd.
rewrite H10.
unfold Z.succ.
unfold partial_add.

That's certainly quite a chunk! Practically you have to step through the proof in Proof General and see the state evolve to understand anything. Additionally, when in the proof, it would be very useful if subgoals had some sort of description like “show the loop invariant plus not the loop condition results in the loop post-condition” - it's very easy to get lost in the proof. But I do feel that, while it would be a lot of work, I could make progress with VST, while I don't (at the moment) feel that way about the other systems. Prof. Appel recently did a formal verification of the SHA-256 implementation from OpenSSL.

However, there is no community around VST that I can find - no mailing list, wiki, etc. The Subversion repo is only accessible via HTTP - you can't clone it from what I can tell. I think I found a (completeness) bug in VST, but the only option would be to try emailing Prof. Appel. (I didn't; I'm probably wrong.)

Still, AutoCorres and especially VST leave me more optimistic than I was at the end of the last post!


I didn't have time to look at everything that deserved attention. One such is Cryptol, a tool written in Haskell designed for the specification and implementation of cryptographic code; it can call out to SMT solvers to show certain properties and, from its internal language, it can export implementations to (at least) C.

Next, Frama-C, which I mentioned last time in relation to the Jessie plugin, has other plugins. One that's useful is the value analysis, which can be used to prove the absence of NULL derefs, array overruns etc. In fact, PolarSSL has been fully validated in certain configurations using that. That's certainly very valuable, and you can do it without that multi-page Coq proof! Such analysis wouldn't have caught the carry bug in Ed25519 nor the reduction bug in Donna, though, and not all code is going to be suitable for formal analysis.

Microsoft also have lots of things. There's Dafny, VCC (Verifier for Concurrent C), Boogie, Havoc, Z3 and others! I didn't look at them because I don't have a Windows box to hand this week (although I see that some are exposed via a web interface too). I was thinking that maybe I'd have time when a Windows box was to hand but, realistically, probably not. So I've updated this post to mention them here. If you are Windows based, you will probably do well to pay attention to them first.

A shallow survey of formal methods for C code

Two interesting things in formally verified software happened recently. The big one was the release of SeL4 - a formally verified L4 microkernel. The second was much smaller, but closer to my usual scope: a paper which showed the correctness of sections of a couple of the assembly implementations of Curve25519.

Overflow and carry bugs are subtle and, although with 32-bit words you might hope to be able to do enough random testing to eliminate them, that's much less plausible with 64-bit implementations. The TweetNaCl paper mentions a bug that lived in one of the assembly implementations of ed25519 for years and I've sinned too:

When I wrote curve25519-donna (back when it was the only 64-bit implementation of Curve25519), I first wrote a 32-bit version which mirrored the implementation of the athlon assembly version. This was just to provide a reference for the 64-bit code, but it was included in the repository for education. It was never really supposed to be used, and wasn't constant-time, but I was aware that it was being used by some groups.

Many years later, someone noticed that it was missing a reduction in the final contraction. Values between 2^255-19 and 2^255 would be output when they should have been reduced. This turned out to be harmless (as best I can tell), but embarrassing none the less.

And Curve25519 is a very nice curve to implement! Take a look at the overflow considerations for this version of P-256. I hope that I got everything right there, but it's very complex. More formal verification of the software would be welcome!

SeL4 has a multi-stage verification using refinement. Refinement is the process of showing that two pieces of code behave identically where one version is more abstract. SeL4's proof refines from an abstract specification to a Haskell implementation to the C implementation. It also has a SAT-solver based proof that the ARMv6 assembly matches the C implementation.

The refinement proofs are done in Isabelle/HOL which, along with Coq, are the major proof systems. (There are several other systems including HOL Light, HOL4 and Mizar.) Proof systems assist in the construction of automated proofs using simple rules of inference and axioms. Although not used for software verification, Metamath is a good introduction to the idea. It clearly explains its axioms (which Isabelle and Coq are very bad at) and gives an idea for the scale of formal proofs with an example of 2+2 = 4.

The best introduction to Isabelle that I know of is the book Concrete Semantics, although I do wish that it would start with Isar style proofs much sooner. If you're doing work in Isabelle I think you need page 57 printed and stuck to the wall and if you're going through that book and exercises I'd recommend peeking at the Isar chapter sooner.

But Isabelle's traditional workflow is to write in its functional language and export to OCaml or Haskell. That's no help if we're seeking to prove things about C code.

Isabelle's theorems are objects in its underlying ML language and so you can write a program in ML that parses C and use the result in Isabelle. That's what SeL4 does. The underlying framework for expressing imperative languages in Isabelle is Schirmer's Simpl and this imposes some odd limitations on the C that can be processed. For one, all the local variables with the same name in a given .c file have to have the same type because the local state of all functions in a .c file is represented in a single struct. For the same reason, it's not possible to take the address of a local variable.

But we can try parsing a very simple C file with SeL4's parser:

int add(int a, int b) {
  return a+b;
}

That fits SeL4's C subset (called StrictC - see l4v/tools/c-parser/doc in the SeL4 source) and results in this Simpl construct (you'll need to read the Simpl paper to really understand it):

test_global_addresses.add_body ≡
  Guard SignedArithmetic ⦃-2147483648 ≤ sint ´a + sint ´b ∧ sint ´a + sint ´b ≤ 2147483647⦄
   (creturn global_exn_var_'_update ret__int_'_update (λs. a_' s + b_' s));;
  Guard DontReach {} SKIP

There's clearly an overflow check in there, which is good, but to call the process of starting to prove things about it intimidating to the beginner would be an understatement. That and the oddities of the C subset motivated me to look further.

Dependent types are closely related to this problem so I checked Wikipedia for the dependently typed languages which support imperative programming and that are still active. That's just ATS and F*, according to that table. ATS doesn't deal with overflows and, while F*/F7 is worthy of a look because of miTLS, it's a CLR language and so not applicable here.

Next up, based on searches, is SPARK. SPARK is a subset of Ada designed for verification. There's a commercial company behind it, but versions are published under the GPL. It even comes with an IDE.

SPARK uses an SMT solver to prove correctness, which is very different from a proof in Isabelle. While an Isabelle proof is, by construction, just an application of axioms and rules of inference, an SMT solver is essentially a magic box into which you feed a proposition and out of which (maybe) comes a true/false answer. Trusting in an SMT solver is much, much stronger than not doing so, but it is a step down from a formal proof system. SPARK uses a version of Why3 to abstract over SMT solvers and we'll discuss Why3 more later.

Anyway, here's the same, trivial function in Ada:

function Add(X, Y : in Integer_32) return Integer_32 is
begin
   return X + Y;
end Add;

If we ask SPARK to process that then it generates verification conditions (VCs): one for each possible thing that could go wrong - overflow, array indexing etc. For that code it throws an error saying that X + Y might overflow, which is promising! In order to eliminate that, one needs to prove the verification condition that says that the addition is safe by adding preconditions to the function (which then become VCs at each callsite):

function Add(X, Y : in Integer_32) return Integer_32 with
  Pre => X < 2**30 and X > -2**30 and Y < 2**30 and Y > -2**30;

With that precondition, the function is accepted. I should note that this is SPARK 2014; there are many older versions (SPARK has been going for a long time) and they kept preconditions and the like in comments. Lots of pages about SPARK still talk about the older versions and major projects in SPARK (like Ironsides - a DNS server) still use the older versions.

With that initial success, let's try something a tiny bit more involved. Curve25519 deals with 255-bit integers but CPUs don't. So, in the same way that a 64-bit integer could be represented with a pair of 32-bit integers, 255-bit integers are represented in Donna with ten, 32-bit integers. This representation is a little odd because the values aren't spaced at 32-bit multiples, but rather at alternating 26 and 25 bit positions. That also means that the representation is non-unique (setting bit 26 of the first word is equal to setting bit zero of the second) and that's done for performance. See this for more details, but it's not important right now.
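Concretely, limb i sits at bit position ⌈25.5·i⌉, which matches the constants in the logic function later in this post; a sketch:

/* Bit position of limb i is ceil(25.5 * i): */
static const int kLimbPos[10] = {0, 26, 51, 77, 102, 128, 153, 179, 204, 230};

/* The represented value is the sum of x[i] * 2^kLimbPos[i], so adding
 * 2^26 to x[0] and adding 1 to x[1] change the value by the same
 * amount: the representation is non-unique. */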

Adding these 255-bit integers is done simply by adding the underlying words without carry, with suitable bounds on the inputs so that overflow doesn't happen:

type Ints is array (Integer range 0 .. 9) of Integer_32;

function Add (X, Y : in Ints) return Ints with
   Pre => (for all i in Ints'Range => X(i) < 2**30 and X(i) > -2**30 and Y(i) < 2**30 and Y(i) > -2**30),
   Post => (for all i in Ints'Range => Add'Result(i) = X(i) + Y(i));

function Add (X, Y : in Ints) return Ints is
   Sum : Ints;
begin
   for i in Ints'Range loop
      Sum(i) := X(i) + Y(i);
   end loop;

   return Sum;
end Add;

That mostly works, but SPARK can't prove that the result is X(i) + Y(i) for all i despite it being exactly what the function says. One really needs to spoon feed the prover: in this case with a loop invariant in the for loop:

for i in Ints'Range loop
   pragma Loop_Invariant (for all j in Ints'First .. i-1 => Sum(j) = X(j) + Y(j));
   Sum(i) := X(i) + Y(i);
end loop;

Sadly, that doesn't work either, despite being correct, because SPARK doesn't seem to like i-1 when i might be zero. So, trying again:

for i in Ints'Range loop
   Sum(i) := X(i) + Y(i);
   pragma Loop_Invariant (for all j in Ints'First .. i => Sum(j) = X(j) + Y(j));
end loop;

That one works, but now SPARK isn't sure whether uninitialised elements of Sum are being referenced. The property of being initialised isn't part of SPARK's logic and seems that the only way to proceed here is to disable that warning!

But, there's some promise! We would now like to show a higher level property of Add: that the 255-bit integer that the returned array represents is the sum of the integers that the input arrays represent. This means that the logic needs to deal with arbitrary integers and that we need to write a function in the logic that turns an array into an abstract integer.

Enabling arbitrary integers in the logic is possible (although good luck finding how to do it in the documentation: the answer is to put a file called gnat.adc in the gnatprove subdirectory of the project containing pragma Overflow_Mode (General => Strict, Assertions => Eliminated);). However, I failed at writing functions in the logic without dropping down into code and triggering overflow checks. It appears that ghost functions should be able to do this but, after getting the answer to a documentation bug on StackOverflow, actually trying to use any logic functions, even the identity function, caused the proof not to terminate. SPARK claims to be able to use Isabelle for proofs but I couldn't get that to work at all: strace says that the .thy file isn't ever written.

Stuck, I moved on from SPARK and tried Frama-C and its Jessie plugin. Even if SPARK was working for me, using C (even a subset) has advantages: it's convenient to use in other projects, there exist verified compilers for it and there's an extension to CompCert that allows for static verification of constant-time behaviour (although I've lost the paper!)

So here's the same function in Frama-C:

/*@ requires \valid_range(out, 0, 9);
  @ requires \valid_range(x, 0, 9);
  @ requires \valid_range(y, 0, 9);
  @ requires \forall integer i; 0 <= i < 10 ==>
      (x[i] > -1000 && x[i] < 1000);
  @ requires \forall integer i; 0 <= i < 10 ==>
      (y[i] > -1000 && y[i] < 1000);
  @ ensures \forall integer i; 0 <= i < 10 ==> (out[i] == x[i] + y[i]);
  @*/
void add(int *out, const int *x, const int *y) {
  /*@ loop invariant i >= 0 && i <= 10 &&
        (\forall integer j; 0 <= j < i ==> out[j] == x[j] + y[j]);
    @ loop variant 10-i;
    @*/
  for (int i = 0; i < 10; i++) {
    out[i] = x[i] + y[i];
  }
}

Note that this time we not only need a loop invariant to show the postcondition, but we also need a loop variant to show that the for loop terminates: you really need to spoon feed the verification! But Frama-C/Jessie has no problem with functions in the logic:

/*@ logic integer felemvalue (int *x) =
      x[0] + x[1] * (1 << 26) + x[2] * (1 << 51) + x[3] * (1 << 77) +
      x[4] * (1 << 102) + x[5] * 340282366920938463463374607431768211456 +
      x[6] * 11417981541647679048466287755595961091061972992 +
      x[7] * 766247770432944429179173513575154591809369561091801088 +
      x[8] * 25711008708143844408671393477458601640355247900524685364822016 +
      x[9] * 1725436586697640946858688965569256363112777243042596638790631055949824; */

That function maps from an array to the abstract integer that it represents. Note that the terms switched from using shifts to represent the constants to literal numbers. That's because the proof process suddenly became non-terminating (in a reasonable time) once the values reached about 104 bits, but the literals still worked.

With that logic function in hand, we can easily prove the higher-level property that we wanted as a postcondition of the add function:

@ ensures felemvalue(out) == felemvalue(x) + felemvalue(y);

Flushed with success, I moved onto the next most basic function: multiplication of 255-bit integers. The code that Donna uses is the textbook, polynomial multiplication code. Here's how the function starts:

/* Multiply two numbers: output = in2 * in
 * output must be distinct to both inputs. The inputs are reduced coefficient
 * form, the output is not.
 * output[x] <= 14 * the largest product of the input limbs. */
static void fproduct(limb *output, const limb *in2, const limb *in) {
  output[0] =       ((limb) ((s32) in2[0])) * ((s32) in[0]);
  output[1] =       ((limb) ((s32) in2[0])) * ((s32) in[1]) +
                    ((limb) ((s32) in2[1])) * ((s32) in[0]);
  output[2] =  2 *  ((limb) ((s32) in2[1])) * ((s32) in[1]) +
                    ((limb) ((s32) in2[0])) * ((s32) in[2]) +
                    ((limb) ((s32) in2[2])) * ((s32) in[0]);

Each of the lines is casting a 64-bit number down to 32 bits and then doing a 32×32⇒64 multiplication. (The casts down to 32 bits are just to tell the compiler not to waste time on a 64×64⇒64 multiplication.) In Frama-C/Jessie we need to show that the casts are safe, multiplications don't overflow and nor do the sums and multiplications by two etc. This should be easy. Here are the preconditions that set the bounds on the input.

/*@ requires \valid_range(output, 0, 18);
  @ requires \valid_range(in, 0, 9);
  @ requires \valid_range(in2, 0, 9);
  @ requires \forall integer i; 0 <= i < 10 ==> -134217728 < in[i] < 134217728;
  @ requires \forall integer i; 0 <= i < 10 ==> -134217728 < in2[i] < 134217728;
  @*/

However, Why3, which is the prover backend that Jessie (and SPARK) uses, quickly gets bogged down. In an attempt to help it out, I moved the casts to the beginning of the function and put the results in arrays so that the code was less complex.

As you can see in the Jessie documentation, the Why3 UI allows one to do transforms on the verification conditions to try and make life easier for the prover. Quickly, splitting a requirement becomes necessary. But this throws away a lot of information. In this case, each of the 32×32⇒64 multiplications is assumed to fit only within its bounds and so the intermediate bounds need to be reestablished every time. This means that the equations have to be split like this:

limb t, t2;
t = ((limb) in2_32[0]) * in_32[1];
//@ assert -18014398509481984 < t < 18014398509481984;
t2 = ((limb) in2_32[1]) * in_32[0];
//@ assert -18014398509481984 < t2 < 18014398509481984;
output[1] = t + t2;

That's only a minor problem really, but the further one goes into the function, the harder the prover needs to work for some reason. It's unclear why because every multiplication has the same bounds - they are all instances of the same proof. But it's very clear that the later multiplications are causing more work.

Why3 is an abstraction layer for SMT solvers and a large number are supported (see “External provers” on its homepage for a list). I tried quite a few and, for this task, Z3 is clearly the best with CVC4 coming in second. However, Z3 has a non-commercial license which is very worrying - does that mean that a verified version of Curve25519 that used Z3 has to also have a non-commercial license? So I stuck to using CVC4.

However, about a quarter of the way into the function, both CVC4 and Z3 are stuck. Despite the bounds being a trivial problem, and just instances of the same problem that they can solve at the beginning of the function, somehow either Why3 is doing something bad or the SMT solvers are failing to discard irrelevant facts. I left it running overnight and Z3 solved one more instance after six hours but achieved nothing after 12 hours on the next one. Splitting and inlining the problem further in the Why3 GUI didn't help either.

Like SPARK, Jessie can use Isabelle for proofs too (and for the same reason: they are both using Why3, which supports Isabelle as a backend). It even worked this time, once I added Real to the Isabelle imports. However, in the same way that the C parser in Isabelle was an ML function that created Isabelle objects internally, the Why3 connection to Isabelle is a function (why3_open) that parses an XML file. This means that the proof has large numbers of generated variable names (o33, o34, etc) and you have no idea which intermediate values they are. Additionally, the bounds problem is something that Isabelle could have solved automatically, but the assumptions that you can use are similarly autonamed as things like local.H165. In short, the Isabelle integration appears to be unworkable.

Perhaps I could split up each statement in the original code into a separate function and then write the statement again in the logic in order to show the higher level properties, but at some point you have to accept that you're not going to be able to dig this hole with a plastic spoon.


The conclusion is a bit disappointing really: Curve25519 has no side effects and performs no allocation, it's a pure function that should be highly amenable to verification and yet I've been unable to find anything that can get even 20 lines into it. Some of this might be my own stupidity, but I put a fair amount of work into trying to find something that worked.

There seems to be a lot of promise in the area and some pieces work well (SMT solvers are often quite impressive, the Frama-C framework appears to be solid, Isabelle is quite pleasant) but nothing I found worked well together, at least for verifying C code. That makes efforts like SeL4 and Ironsides even more impressive. However, if you're happy to work at a higher level I'm guessing that verifying functional programs is a lot easier going.

HSTS for new TLDs

Whatever you might think of them, the new TLDs are rapidly arriving. New TLDs approved so far this month include alsace, sarl, iinet, poker, gifts, restaurant, fashion, tui and allfinanz. The full list for last month is over twice as long.

That means that there's lots of people currently trying to figure out how to differentiate themselves from other TLDs. Here's an idea: why not ask me to set HSTS for the entire TLD? That way, every single site runs over HTTPS, always. It strikes me that this could be useful if you're trying to build trust with users unfamiliar with the zoo of new domains.

(I can't speak for Firefox and Safari but I think it's safe to assume that Firefox would be on board with this. It's still unclear whether IE's support for HSTS will include preloading.)

I'm guessing that, with such a large number of new TLDs, I should be able to reach at least some operators of them via this blog post.

Encrypting streams

When sending data over the network, chunking is pretty much a given. TLS has a maximum record size of 16KB and this fits neatly with authenticated encryption APIs which all operate on an entire message at once.

But file encryption frequently gets this wrong. Take OpenPGP: it bulk encrypts the data and sticks a single MAC on the end. Ideally everyone is decrypting to a temporary file and waiting until the decryption and verification is complete before touching the plaintext, but it takes a few seconds of searching to find people suggesting commands like this:

gpg -d your_archive.tgz.gpg | tar xz

With that construction, tar receives unauthenticated input and will happily extract it to the filesystem. An attacker doesn't (we assume) know the secret key, but they can guess at the structure of the plaintext and flip bits in the ciphertext. Bit flips in the ciphertext will produce a corresponding bit flip in the plaintext, followed by randomising the next block. I bet some smart attacker can do useful things with that ability. Sure the gpg command will exit with an error code, but do you think that the shell script writer carefully handled that case and undid the changes to the filesystem?

The flaw here isn't in CFB mode's malleability, but in OpenPGP forcing the use of unauthenticated plaintext in practical situations. (Indeed, if you are ever thinking about the malleability of ciphertext, you have probably already lost.)

I will even claim that the existence of an API that can operate in a streaming fashion over large records (i.e. will encrypt and defer the authenticator and will decrypt and return unauthenticated plaintext) is a mistake. Not only is it too easy to misunderstand and misuse (like the gpg example above) but, even if correctly buffered in a particular implementation, the existence of large records may force other implementations to do dangerous things because of a lack of buffer space.

If large messages are chunked at 16KB then the overhead of sixteen bytes of authenticator for every chunk is only 0.1%. Additionally, you can safely stream the decryption (as long as you can cope with truncation of the plaintext).

Although safer in general, when chunking one has to worry that an attacker hasn't reordered chunks, hasn't dropped chunks from the start and hasn't dropped chunks from the end. But sadly there's not a standard construction for taking an AEAD and making a scheme suitable for encrypting large files (AERO might be close, but it's not quite what I have in mind). Ideally such a scheme would take an AEAD and produce something very like an AEAD in that it takes a key, nonce and additional data, but can safely work in a streaming fashion. I don't think it need be very complex: take 64 bits of the nonce from the underlying AEAD as the chunk number, always start with chunk number zero and feed the additional data into chunk zero with a zero-byte prefix. Prefix each chunk's ciphertext with a 16-bit length, set the MSB of that length to indicate the last chunk and authenticate that indication by setting the additional data to a single byte with value one. The major worry might be that, for many underlying AEADs, taking 64 bits of the nonce for the chunk counter leaves one with very little (or none!) left.
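A rough sketch of that construction, with a hypothetical aead_seal() standing in for the underlying AEAD (how a chunk that is both first and last signals its lastness is one of the details needing more thought):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-shot AEAD with a 12-byte nonce; returns the number
 * of ciphertext-plus-tag bytes written to out. */
size_t aead_seal(uint8_t *out, const uint8_t key[32],
                 const uint8_t nonce[12], const uint8_t *ad, size_t ad_len,
                 const uint8_t *in, size_t in_len);

/* Seal one chunk. The low 64 bits of the nonce are the chunk number,
 * which always starts at zero. Returns bytes written, or 0 on error. */
size_t seal_chunk(uint8_t *out, const uint8_t key[32],
                  const uint8_t nonce_prefix[4], uint64_t chunk_num,
                  int is_last, const uint8_t *ad, size_t ad_len,
                  const uint8_t *in, uint16_t in_len) {
  if (in_len > 0x7fff)
    return 0;  /* the top bit of the length prefix is the last-chunk flag */

  uint8_t nonce[12];
  memcpy(nonce, nonce_prefix, 4);
  for (int i = 0; i < 8; i++)
    nonce[4 + i] = (uint8_t)(chunk_num >> (56 - 8 * i));

  /* 16-bit length prefix; the MSB marks the final chunk. */
  uint16_t hdr = (uint16_t)(in_len | (is_last ? 0x8000 : 0));
  out[0] = (uint8_t)(hdr >> 8);
  out[1] = (uint8_t)(hdr & 0xff);

  /* The caller's additional data is fed into chunk zero only, behind a
   * zero-byte prefix; later chunks authenticate their last-chunk flag
   * as a single byte of additional data. */
  uint8_t chunk_ad[256];
  size_t chunk_ad_len;
  if (chunk_num == 0) {
    if (ad_len > sizeof(chunk_ad) - 1)
      return 0;
    chunk_ad[0] = 0;
    if (ad_len > 0)
      memcpy(chunk_ad + 1, ad, ad_len);
    chunk_ad_len = 1 + ad_len;
  } else {
    chunk_ad[0] = is_last ? 1 : 0;
    chunk_ad_len = 1;
  }

  return 2 + aead_seal(out + 2, key, nonce, chunk_ad, chunk_ad_len,
                       in, in_len);
}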

That requires more thought before using it for real but, if you are ever building encryption-at-rest, please don't mess it up like we did 20 years ago. (Although, even with that better design, piping the output into tar may still be unwise because an attacker can truncate the stream at a chunk boundary: perhaps eliminating important files in the process.)

Update: On Twitter, zooko points to Tahoe-LAFS as an example of getting it right. Additionally, taking the MAC of the current state of a digest operation and continuing the operation has been proposed for sponge functions (like SHA-3) under the name MAC-and-continue. The exact provenance of this isn't clear, but I think it might have been from the Keccak team in this paper. Note, though, that MAC-and-continue doesn't allow random access, which might be important for some situations.


BoringSSL

Earlier this year, before Apple had too many goto fails and GnuTLS had too few, before everyone learnt that TLS heart-beat messages were a thing and that some bugs are really old, I started a tidy up of the OpenSSL code that we use at Google.

We have used a number of patches on top of OpenSSL for many years. Some of them have been accepted into the main OpenSSL repository, but many of them don’t mesh with OpenSSL’s guarantee of API and ABI stability and many of them are a little too experimental.

But as Android, Chrome and other products have started to need some subset of these patches, things have grown very complex. The effort involved in keeping all these patches (and there are more than 70 at the moment) straight across multiple code bases is getting to be too much.

So we’re switching models to one where we import changes from OpenSSL rather than rebasing on top of them. The result of that will start to appear in the Chromium repository soon and, over time, we hope to use it in Android and internally too.

There are no guarantees of API or ABI stability with this code: we are not aiming to replace OpenSSL as an open-source project. We will still be sending them bug fixes when we find them and we will be importing changes from upstream. Also, we will still be funding the Core Infrastructure Initiative and the OpenBSD Foundation.

But we’ll also be more able to import changes from LibreSSL and they are welcome to take changes from us. We have already relicensed some of our prior contributions to OpenSSL under an ISC license at their request and completely new code that we write will also be so licensed.

(Note: the name is aspirational and not yet a promise.)

Early ChangeCipherSpec Attack

OpenSSL 1.0.1h (and others) were released today with a scary looking security advisory and that's always an event worth looking into. (Hopefully people are practiced at updating OpenSSL now!)

Update: the original reporter has a blog post up. Also, I won't, personally, be answering questions about specific Google services. (I cut this blog post together from notes that I'm writing for internal groups to evaluate and this is still very fresh.)

Update: my initial thoughts from looking at the diff still seem to be holding up. Someone is welcome to write a more detailed analysis than below. HP/ZDI have a write up of one of the DTLS issues.

There are some critical bug fixes to DTLS (TLS over datagram transports, i.e. UDP), but most people will be more concerned about the MITM attack against TLS (CVE-2014-0224).

The code changes are around the rejection of ChangeCipherSpec messages, which are messages sent during the TLS handshake that mark the change from unencrypted to encrypted traffic. These messages aren't part of the handshake protocol itself and aren't linked into the handshake state machine in OpenSSL. Rather there's a check in the code that they are only received when a new cipher is ready to be used. That check (of s->s3->tmp.new_cipher in s3_pkt.c) seems reasonable, but new_cipher is actually set as soon as the cipher for the connection has been decided (i.e. once the ServerHello message has been sent/received), not when the cipher is actually ready! It looks like this is the problem that's getting fixed in this release.

Here's the code in question that handles a ChangeCipherSpec message:

int ssl3_do_change_cipher_spec(SSL *s)
	{
	int i;
	const char *sender;
	int slen;

	if (s->state & SSL_ST_ACCEPT)
		i=SSL3_CHANGE_CIPHER_SERVER_READ;
	else
		i=SSL3_CHANGE_CIPHER_CLIENT_READ;

	if (s->s3->tmp.key_block == NULL)	/* (1) */
		{
		if (s->session == NULL)
			/* might happen if dtls1_read_bytes() calls this */
			return (0);

		s->session->cipher=s->s3->tmp.new_cipher;
		if (!s->method->ssl3_enc->setup_key_block(s)) return(0);	/* (2) */
		}

	if (!s->method->ssl3_enc->change_cipher_state(s,i))
		return(0);

	/* we have to record the message digest at
	 * this point so we can get it before we read
	 * the finished message */
	if (s->state & SSL_ST_CONNECT)
		{
		sender=s->method->ssl3_enc->server_finished_label;
		slen=s->method->ssl3_enc->server_finished_label_len;
		}
	else
		{
		sender=s->method->ssl3_enc->client_finished_label;
		slen=s->method->ssl3_enc->client_finished_label_len;
		}

	i = s->method->ssl3_enc->final_finish_mac(s,
		sender,slen,s->s3->tmp.peer_finish_md);	/* (3) */
	if (i == 0)
		return 0;
	s->s3->tmp.peer_finish_md_len = i;

	return(1);
	}

If a ChangeCipherSpec message is injected into the connection after the ServerHello, but before the master secret has been generated, then ssl3_do_change_cipher_spec will generate the keys (2) and the expected Finished hash (3) for the handshake with an empty master secret. This means that both are based only on public information. Additionally, the keys will be latched because of the check at (1) - further ChangeCipherSpec messages will regenerate the expected Finished hash, but not the keys.

The oldest source code on the OpenSSL site is for 0.9.1c (Dec 28, 1998) and the affected code appears almost unchanged. So it looks like this bug has existed for 15+ years.

The implications of this are pretty complex.

For a client there's an additional check in the code that requires that a CCS message appear before the Finished and after the master secret has been generated. An attacker can still inject an early CCS too and the keys will be calculated with an empty master secret. Those keys will be latched - another CCS won't cause them to be recalculated. However, when sending the second CCS that the client code requires, the Finished hash is recalculated with the correct master secret. This means that the attacker can't fabricate an acceptable Finished hash. This stops the obvious, generic impersonation attack against the client.

For a server, there's no such check and it appears to be possible to send an early CCS message and then fabricate the Finished hash because it's based on an empty master secret. However, that doesn't obviously gain an attacker anything. It would be interesting if a connection with a client certificate could be hijacked, but there is a check in ssl3_get_cert_verify that a CCS hasn't been already processed so that isn't possible.

Someone may be able to do something more creative with this bug; I'm not ruling out that there might be other implications.

Things change with an OpenSSL 1.0.1 server however. In 1.0.1, a patch of mine was included that moves the point where Finished values are calculated. The server will now use the correct Finished hash even if it erroneously processed a CCS, but this interacts badly with this bug. The server code (unlike the client) won't accept two CCS messages in a handshake. So, if an attacker injects an early CCS at the server to fixate the bad keys, then it's not possible for them to send a second in order to get it to calculate the correct Finished hash. But with 1.0.1, the server will use the correct Finished hash and will reply with the correct hash to the client. This explains the 1.0.1 and 1.0.2 mention in the advisory. With any OpenSSL client talking to an OpenSSL 1.0.1 server, an attacker can inject CCS messages to fixate the bad keys at both ends but the Finished hashes will still line up. So it's possible for the attacker to decrypt and/or hijack the connection completely.

The good news is that these attacks need man-in-the-middle position against the victim and that non-OpenSSL clients (IE, Firefox, Chrome on Desktop and iOS, Safari etc) aren't affected. None the less, all OpenSSL users should be updating.

Update: I hacked up the Go TLS library to perform the broken handshake in both 0.9.8/1.0.0 and 1.0.1 modes. There's a patch for crypto/tls and a tool that will attempt to do those broken handshakes and tell you if either worked. For example (at the time of writing):

$ ./earlyccs_check google.com
Handshake failed with error: remote error: unexpected message
Looks ok.
$ ./earlyccs_check github.com
Server is affected (1.0.1).
$ ./earlyccs_check amazon.com
Server is affected (0.9.8 or 1.0.0).

Matching primitive strengths

It's a common, and traditional, habit to match the strengths of cryptographic primitives. For example, one might design at the 128-bit security level and pick AES-128, SHA-256 and P-256. Or if someone wants “better” security, one might target a 192-bit security level with AES-192, SHA-384 and P-384. NIST does this sort of thing, for example in 800-57/3, section 5.2.

The logic underlying this is that an attacker will attack at a system's weakest point, so anything that is stronger than the weakest part is a waste. But, while matching up the security level might have made sense in the past, I think it warrants reconsideration now.

Consider the energy needed to do any operation 2^128 times: Landauer tells us that erasing a bit takes a minimum amount of energy and any actual crypto function will involve erasing many bits. But, for the sake of argument, let's say that we're doing a single bit XOR. If we take the minimum energy at room temperature (to save energy on refrigeration) we get 2.85×10^-21 J × 2^128 = 0.97×10^18 J. That's about the “yearly electricity consumption of South Korea as of 2009” according to Wikipedia. If we consider that actually doing anything useful in each of those 2^128 steps is going to take orders of magnitude more bit operations, and that we can't build a computer anywhere near the theoretic limit, we can easily reach, say, 5.5×10^24 J, which is the “total energy from the Sun that strikes the face of the Earth each year”.
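
Spelled out (using k ≈ 1.38×10^-23 J/K and T = 300 K), that back-of-the-envelope Landauer calculation is:

E \geq kT\ln 2 \cdot 2^{128} \approx 2.85\times10^{-21}\,\text{J} \times 3.40\times10^{38} \approx 0.97\times10^{18}\,\text{J}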

(The original version of this post discussed incrementing a 128-bit counter and said that it took two erasures per increment. Samuel Neves on Twitter pointed out that incrementing the counter doesn't involve destroying the information of the old value and several people pointed out that one can hit all the bit-patterns with a Gray code and thus only need to flip a single bit each time. So I changed the above to use a one-bit XOR as the basic argument still holds.)

So any primitive at or above the 128-bit security level is equally matched today, because they are all effectively infinitely strong.

One way to take this is to say that anything above 128 bits is a waste of time. Another is to say that one might want to use primitives above that level, based on the hope that a discontiguous advance leaves them standing but fells the lesser primitives. The latter involves a lot more guesswork than the old practice of making the security levels match up. Additionally, I don't think most people would think that an analytic result is equally likely for all classes of primitives, so one probably doesn't want to spend equal amounts on all of them.

Consider the difference between P-256 and P-384 (or curve25519 and Curve41417 or Hamburg's Goldilocks-448). If they are all infinitely strong today then the reason to want to use a larger curve is because you expect some breakthrough, but not too much of a breakthrough. Discrete log in an elliptic-curve group is a square-root work function today. For those larger curves to make sense you have to worry about a breakthrough that can do it in cube-root time, but not really fourth-root and certainly not fifth-root. How much would you pay to insure yourself against that precise possibility? And how does that compare against the risk of a similar breakthrough in AES?

It is also worth considering whether the security level is defined in a way that matches your needs. AES-128 is generally agreed to have a 128-bit security level, but if an attacker is happy breaking any one of 2^n keys then they can realise a significant speedup. An “unbalanced” choice of cipher key size might be reasonable if that attack is a concern.
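
To see where that speedup comes from: if all 2^n targets have encrypted some common plaintext block, the attacker can encrypt that block under a guessed key once and compare against every target with a table lookup, so the expected work to break any one of them is only about:

W \approx \frac{2^{128}}{2^{n}} = 2^{128-n}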

So maybe a 256-bit cipher (ciphers do pretty well now, just worry about Grover's algorithm), SHA-384 (SHA-256 should be fine, but hash functions have, historically, had a bad time) and 128512(? see below)-bit McEliece is actually “balanced” these days.

(Thanks to Mike Hamburg and Trevor Perrin, with whom I was discussing this after Mike presented his Goldilocks curve. Thanks to JP Aumasson on Twitter and pbsd on HN for suggesting multi-target attacks as a motivation for large cipher key sizes. Also to pbsd for pointing out a quantum attack against McEliece that I wasn't aware of.)

SHA-256 certificates are coming

It's a neat result in cryptography that you can build a secure hash function given a secure signature scheme, and you can build a secure signature scheme given a secure hash function. However, far from the theory, in the real world, lots of signatures today depend on SHA-1, which is looking increasingly less like a secure hash function.

There are lots of properties by which one can evaluate a hash function, but the most important are preimage resistance (if I give you a hash, can you find an input with that hash value?), second preimage resistance (if I give you a message, can you find another that hashes to the same value?) and collision resistance (can you find two messages with the same hash?). Of those, the third appears to be much harder to meet than the first two, based on historical results against hash functions.

Back when certificates were signed with MD5, a chosen-prefix collision attack (i.e. given two messages, can you append different data to each so that the results have the same hash?) against MD5 was used at least twice to break the security of the PKI. First the MD5 Collisions Inc demo against RapidSSL and then the Flame malware.

Today, SHA-1 is at the point where a collision attack is estimated to take 2^61 work and a chosen-prefix collision to take 2^77. Both are below the design strength of 2^80 and even that design strength isn't great by today's standards.

We hope that we have a second line of defense for SHA-1: after the MD5 Collisions Inc demo, CAs were required to use random serial numbers. A chosen-prefix attack requires being able to predict the certificate contents that will be signed and the random serial number should thwart that. With random serials we should be resting on a stronger hash function property called target-collision resistance. (Although I'm not aware of any proofs that random serials actually put us on TCR.)

Still, it would be better not to depend on those band-aids and to move to a hash function with a better design strength while we're at it. So certificates are starting to switch to using SHA-256. A large part of that shift came from Microsoft forbidding certificates using SHA-1 starting in 2016.

For most people, this will have no effect. Twitter ended up with a SHA-256 certificate after they replaced their old one because of the OpenSSL heartbeat bug. So, if you can still load Twitter's site, then you're fine.

But there are a lot of people using Windows XP prior to Service Pack 3, and they will have problems. We've already seen lots of user reports of issues with Twitter (and other sites) from these users. Wherever possible, installing SP3 is the answer. (Or, better yet, updating from Windows XP.)

There are also likely to be problems with embedded clients, old phones etc. Some of these may not come to light for a while.

We've not yet decided what Google's timeline for switching is, but we will be switching prior to 2016. If you're involved with something embedded that speaks to Google over HTTPS (and there's a lot of it), now is the time to test https://cert-test.sandbox.google.com, which is using a SHA-256 certificate. I'll be using other channels to contact the groups that we know about but, on the web, we don't always know what's talking to us.

Revocation still doesn't work

I was hoping to be done with revocation for a bit, but sadly not.

GRC have published a breathless piece attacking a straw man argument: “Google tells us ... that Chrome's unique CRLSet solution provides all the protection we need”. I call it a straw man because I need only quote my own posts that GRC have read to show it.

The original CRLSets announcement contained two points. Firstly that online revocation checking doesn't work and that we were switching it off in Chrome in most cases. Secondly, that we would be using CRLSets to avoid having to do full binary updates in order to respond to emergency incidents.

In the last two paragraphs I mentioned something else. Quoting myself:

“Since we're pushing a list of revoked certificates anyway, we would like to invite CAs to contribute their revoked certificates (CRLs) to the list. We have to be mindful of size, but the vast majority of revocations happen for purely administrative reasons and can be excluded. So, if we can get the details of the more important revocations, we can improve user security. Our criteria for including revocations are:

  1. The CRL must be crawlable: we must be able to fetch it over HTTP and robots.txt must not exclude GoogleBot.
  2. The CRL must be valid by RFC 5280 and none of the serial numbers may be negative.
  3. CRLs that cover EV certificates are taken in preference, while still considering point (4).
  4. CRLs that include revocation reasons can be filtered to take less space and are preferred.”

In short, since we were pushing revocations anyway, maybe we could get some extra benefit from it. It would be better than nothing, which is what browsers otherwise have with soft-fail revocation checking.

I mentioned it again, last week (emphasis added):

“We compile daily lists of some high-value revocations and use Chrome's auto-update mechanism to push them to Chrome installations. It's called the CRLSet and it's not complete, nor big enough to cope with large numbers of revocations, but it allows us to react quickly to situations like Diginotar and ANSSI. It's certainly not perfect, but it's more than many other browsers do.”

“The original hope with CRLSets was that we could get revocations categorised into important and administrative and push only the important ones. (Administrative revocations occur when a certificate is changed with no reason to suspect that any key compromise occurred.) Sadly, that mostly hasn't happened.”

And yet, GRC managed to write pages (including cartoons!) exposing the fact that it doesn't cover many revocations and attacking Chrome for it.

They also claim that soft-fail revocation checking is effective:

The claim is that a user will download and cache a CRL while not being attacked and then be protected from a local attacker using a certificate that was included on that CRL. (I'm paraphrasing; you can search for “A typical Internet user” in their article for the original.)

There are two protocols used in online revocation checking: OCSP and CRL. The problem is that OCSP only covers a single certificate and OCSP is used in preference because it's so much smaller and thus removes the need to download CRLs. So the user isn't expected to download and cache the CRL anyway. So that doesn't work.

However, it's clear that someone is downloading CRLs because Cloudflare are spending half a million dollars a month to serve CRLs. Possibly it's non-browser clients but the bulk is probably CAPI (the library that IE, Chrome and other software on Windows typically uses - although not Firefox). A very obscure fact about CAPI is that it will download a CRL when it has 50 or more OCSP responses cached for a given CA certificate. But CAs know this and split their CRLs so that they don't get hit with huge bandwidth bills. But a split CRL renders the claimed protection from caching ineffective, because the old certificate for a given site is probably in a different part.

So I think the claim is that doing blocking OCSP lookups is a good idea because, if you use CAPI on Windows, then you might cache 50 OCSP responses for a given CA certificate. Then you'll download and cache a CRL for a while and then, depending on whether the CA splits their CRLs, you might have some revocations cached for a site that you visit.

It seems that the argument is actually for something like CRLSets (download and cache revocations) over online checking; it's just a Rube Goldberg machine that, perhaps, implements it!

So, once again, revocation doesn't work. It doesn't work in other browsers and CRLSets aren't going to cover much in Chrome, as I've always said. GRC's conclusions follow from those errors and end up predictably far from the mark.

In order to make this post a little less depressing, let's consider whether it's reasonable to aim to cover everything with CRLSets. (Which I mentioned before as well, but I'll omit the quote.) GRC quote numbers from Netcraft claiming 2.85 million revocations, although some are from certificate authorities not trusted by mainstream web browsers. I spent a few minutes harvesting CRLs from the CT log. This only includes certificates that are trusted by a reasonable number of people and I only waited for half the log to download. From that half I threw in the CRLs included by CRLSets and got 2356 URLs after discarding LDAP ones.

I tried downloading them and got 2164 files. I parsed them and eliminated duplicates and my quick-and-dirty parser skipped quite a few. None the less, this very casual search found 1,062 issuers and 4.2 million revocations. If they were all dropped into the CRLSet, it would take 39MB.

So that's a ballpark figure, but we need to design for something that will last a few years. I didn't find figures on the growth of HTTPS sites, but Netcraft say that the number of web sites overall is growing at 37% a year. It seems reasonable that the number of HTTPS sites, thus certificates, thus revocations will double a couple of times in the next five years.

There's also the fact that if we subsidise revocation with the bandwidth and computers of users, then demand for revocation will likely increase. Symantec, I believe, have experimented with certificates that are issued for the maximum time (five years at the moment) but sold in single year increments. When it comes to the end of the year, they'll remind you to renew. If you do, you don't need to do anything, otherwise the certificate gets revoked. This seems like a good deal for the CA so, if we make revocation really effective, I'd expect to see a lot more of it. Perhaps factor in another doubling for that.

So we would need to scale to ~35M revocations in five years, which is ~300MB of raw data. That would need to be pushed to all users and delta updated. (I don't have numbers for the daily delta size.)
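
That estimate is just the measured per-revocation cost, scaled up:

\frac{39\,\text{MB}}{4.2\times10^{6}\,\text{revocations}} \approx 9.3\,\text{bytes/revocation}, \qquad 35\times10^{6} \times 9.3\,\text{bytes} \approx 300\,\text{MB}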

That would obviously take a lot of bandwidth and a lot of resources on the client. We would really need to aim at mobile as well, because that is likely to be the dominant HTTPS client in a few years, making storage and bandwidth use even more expensive.

Some probabilistic data structure might help. I've written about that before. Then you have to worry about false positives and the number of reads needed per validation. Phones might use Flash for storage, but it's remarkably slow.

Also, would you rather spend those resources on revocation, or on distributing the Safe Browsing set to mobile devices? Or would you spend the memory on improving ASLR on mobile devices? Security is a big picture full of trade-offs. It's not clear that revocation warrants all that spending.

So, if you believe that downloading and caching revocation is the way forward, I think those are the parameters that you have to deal with.

No, don't enable revocation checking

Revocation checking is in the news again because of a large number of revocations resulting from precautionary rotations for servers affected by the OpenSSL heartbeat bug. However, revocation checking is a complex topic and there's a fair amount of misinformation around. In short, it doesn't work and you are no more secure by switching it on. But let's quickly catch up on the background:

Certificates bind a public key and an identity (commonly a DNS name) together. Because of the way the incentives work out, they are typically issued for a period of several years. But events occur and sometimes the binding between public key and name that the certificate asserts becomes invalid during that time. In the recent cases, people who ran a server that was affected by the heartbeat bug are worried that their private key might have been obtained by someone else and so they want to invalidate the old binding, and bind to a new public key. However, the old certificates are still valid and so someone who had obtained that private key could still use them.

Revocation is the process of invalidating a certificate before its expiry date. All certificates include a statement that essentially says “please phone the following number to check that this certificate has not been revoked”. The term online revocation checking refers to the process of making that phone call. It's not actually a phone call, of course, rather browsers and other clients can use a protocol called OCSP to check the status of a certificate. OCSP supplies a signed statement that says that the certificate is still valid (or not) and, critically, the OCSP statement itself is valid for a much shorter period of time, typically a few days.

The critical question is what to do in the event that you can't get an answer about a certificate's revocation status. If you reject certificates when you can't get an answer, that's called hard-fail. If you accept certificates when you can't get an answer that's called soft-fail.

Everyone does soft-fail for a number of reasons on top of the general principle that single points of failure should be avoided. Firstly, the Internet is a noisy place and sometimes you can't get through to OCSP servers for some reason. If you fail in those cases then the level of random errors increases. Also, captive portals (e.g. hotel WiFi networks where you have to “login” before you can use the Internet) frequently use HTTPS (and thus require certificates) but don't allow you to access OCSP servers. Lastly, if everyone did hard-fail then taking down an OCSP service would be sufficient to take down lots of Internet sites. That would mean that DDoS attackers would turn their attention to them, greatly increasing the costs of running them and it's unclear whether the CAs (who pay those costs) could afford it. (And the disruption is likely to be significant while they figure that out.)

So soft-fail is the only viable answer but it has a problem: it's completely useless. But that's not immediately obvious so we have to consider a few cases:

If you're worried about an attacker using a revoked certificate then the attacker first must be able to intercept your traffic to the site in question. (If they can't even intercept the traffic then you didn't need any authentication to protect it from them in the first place.) Most of the time, such an attacker is near you. For example, they might be running a fake WiFi access point, or maybe they're at an ISP. In these cases the important fact is that the attacker can intercept all your traffic, including OCSP traffic. Thus they can block OCSP lookups and soft-fail behaviour means that a revoked certificate will be accepted.

The next class of attacker might be a state-level attacker. For example, Syria trying to intercept Facebook connections. These attackers might not be physically close, but they can still intercept all your traffic because they control the cables going into and out of a country. Thus, the same reasoning applies.

We're left with cases where the attacker can only intercept traffic between a user and a website, but not between the user and the OCSP service. The attacker could be close to the website's servers and thus able to intercept all traffic to that site, but not anything else. More likely, the attacker might be able to perform a DNS hijacking: they persuade a DNS registrar to change the mapping between a domain (example.com) and its IP address(es) and thus direct the site's traffic to themselves. In these cases, soft-fail still doesn't work, although the reasoning is more complex:

Firstly, the attacker can use OCSP stapling to include the OCSP response with the revoked certificate. Because OCSP responses are generally valid for some number of days, they can store one from before the certificate was revoked and use it for as long as it's valid. DNS hijackings are generally noticed and corrected faster than the OCSP response will expire. (On top of that, you need to worry about browser cache poisoning, but I'm not going to get into that.)

Secondly, and more fundamentally, when issuing certificates a CA validates ownership of a domain by sending an email, or looking for a specially formed page on the site. If the attacker is controlling the site, they can get new certificates issued. The original owners might revoke the certificates that they know about, but it doesn't matter because the attacker is using different ones. The true owners could try contacting CAs, convincing them that they are the true owners and get other certificates revoked, but if the attacker still has control of the site, they can hop from CA to CA getting certificates. (And they will have the full OCSP validity period to use after each revocation.) That circus could go on for weeks and weeks.

That's why I claim that online revocation checking is useless - because it doesn't stop attacks. Turning it on does nothing but slow things down. You can tell when something is security theater because you need some absurdly specific situation in order for it to be useful.

So, for a couple of years now, Chrome hasn't done these useless checks by default in most cases. Rather, we have tried a different mechanism. We compile daily lists of some high-value revocations and use Chrome's auto-update mechanism to push them to Chrome installations. It's called the CRLSet and it's not complete, nor big enough to cope with large numbers of revocations, but it allows us to react quickly to situations like Diginotar and ANSSI. It's certainly not perfect, but it's more than many other browsers do.

A powerful attacker may be able to block a user from receiving CRLSet updates if they can intercept all of that user's traffic for long periods of time. But that's a pretty fundamental limit; we can only respond to any Chrome issue, including security bugs, by pushing updates.

The original hope with CRLSets was that we could get revocations categorised into important and administrative and push only the important ones. (Administrative revocations occur when a certificate is changed with no reason to suspect that any key compromise occurred.) Sadly, that mostly hasn't happened. Perhaps we need to consider a system that can handle much larger numbers of revocations, but the data in that case is likely to be two orders of magnitude larger and it's very unlikely that the current CRLSet design is still optimal when the goal moves that far. It's also a lot of data for every user to be downloading and perhaps efforts would be better spent elsewhere. It's still the case that an attacker that can intercept traffic can easily perform an SSL Stripping attack on most sites; they hardly need to fight revoked certificates.

In order to end on a positive note, I'll mention a case where online revocation checking does work, and another, possible way to solve the revocation problem for browsers.

The arguments above started with the point that an attacker using a revoked certificate first needs to be able to intercept traffic between the victim and the site. That's true for browsers, but it's not true for code-signing. In the case where you're checking the signature on a program or document that could be distributed via, say, email, then soft-fail is still valuable. That's because it increases the costs on the attacker substantially: they need to go from being able to email you to needing to be able to intercept OCSP checks. In these cases, online revocation checking makes sense.

If we want a scalable solution to the revocation problem then it's probably going to come in the form of short-lived certificates or something like OCSP Must Staple. Recall that the original problem stems from the fact that certificates are valid for years. If they were only valid for days then revocation would take care of itself. (This is the approach taken by DNSSEC.) For complex reasons, it might be easier to deploy that by having certificates that are valid for years, but include a marker in them that indicates that an OCSP response must be provided along with the certificate. The OCSP response is then only valid for a few days and the effect is the same (although less efficient).

TLS Triple Handshakes

Today, the TLS WG mailing list received a note about the work of Karthikeyan Bhargavan, Antoine Delignat-Lavaud, Cedric Fournet, Alfredo Pironti and Pierre-Yves Strub on triple handshake attacks against TLS. This is more complex than just a duplicated goto and I'm not going to try and reproduce their explanation here. Instead, I'll link to their site again, which also includes a copy of their paper.

In short, the TLS handshake hashes in too little information, and always has. Because of that it's possible to synchronise the state of two TLS sessions in a way that breaks assumptions made in the rest of the protocol.

I'd like to thank the researchers for doing a very good job of disclosing this. The note today even included a draft for fixing the TLS key derivation to include all the needed information to stop this attack, and it'll be presented at the WG meeting tomorrow.

In the mean time, people shouldn't panic. The impact of this attack is limited to sites that use TLS client-certificate authentication with renegotiation, and protocols that depend on channel binding. The vast majority of users have never used client certificates.

The client-certificate issues can be fixed with a unilateral, client change to be stricter about verifying certificates during a renegotiation, as suggested by the authors. I've included an image, below, that is loaded over an HTTPS connection that renegotiates with unrelated certificates before returning the image data. Hopefully the image below is broken. If not, then it likely will be soon because of a browser update. (I took the server down.)

Protocols that depend on channel binding (including ChannelID) need other changes. Ideally, the proposed update to the master-secret computation will be finalised and implemented. (For ChannelID, we have already updated the protocol to include a change similar to the proposed draft.)

It's likely that there are still concrete problems to be found because of the channel-binding break. Hopefully with today's greater publicity people can start to find them.

TLS Symmetric Crypto

At this time last year, the TLS world was mostly running on RC4-SHA and AES-CBC. The Lucky 13 attack against CBC in TLS had just been published and I had spent most of January writing patches for OpenSSL and NSS to implement constant-time CBC decoding. The RC4 biases paper is still a couple of weeks away, but it's already clear that both these major TLS cipher suite families are finished and need replacing. (The question of which is worse is complicated.)

Here's Chrome's view of TLS by number of TLS connections from around this time (very minor cipher suite families were omitted):

[Table omitted: breakdown of TLS connections made by Chrome, by cipher suite family.]

A whole bunch of people, myself included, have been working to address that since.

AES-GCM was already standardised, implemented in OpenSSL and has hardware support in Intel chips, so Wan-Teh Chang and I started to implement it in NSS, although first we had to implement TLS 1.2, on which it depends. That went out in Chrome 30 and 31. Brian Smith (and probably others!) at Mozilla shipped that code in Firefox also, with Firefox 27.

AES-GCM is great if you have hardware support for it: Haswell chips can do it in just about 1 cycle/byte. However, it's very much a hardware orientated algorithm and it's very difficult to implement securely in software with reasonable speed. Also, since TLS has all the infrastructure for cipher suite negotiation already, it's nice to have a backup in the wings should it be needed.

So we implemented ChaCha20-Poly1305 for TLS in NSS and OpenSSL. (Thanks to Elie Bursztein for doing a first pass in OpenSSL.) This is an AEAD made up of primitives from djb and we're using implementations from djb, Andrew M, Ted Krovetz and Peter Schwabe. Although it can't beat AES-GCM when AES-GCM has dedicated hardware, I suspect there will be lots of chips for a long time that don't have such hardware. Certainly most mobile phones at the moment don't (although AArch64 is coming).

Here's an example of the difference in speeds:

Chip                       | AES-128-GCM speed | ChaCha20-Poly1305 speed
OMAP 4460                  | 24.1 MB/s         | 75.3 MB/s
Snapdragon S4 Pro          | 41.5 MB/s         | 130.9 MB/s
Sandy Bridge Xeon (AESNI)  | 900 MB/s          | 500 MB/s

(The AES-128-GCM implementation is from OpenSSL 1.0.1f. Note, ChaCha20 is a 256-bit cipher and AES-128 obviously isn't.)

There's also an annoying niggle with AES-GCM in TLS because the spec says that records have an eight byte, explicit nonce. Being an AEAD, the nonce is required to be unique for a given key. Since an eight-byte value is too small to pick at random with a sufficiently low collision probability, the only safe implementation is a counter. But TLS already has a suitable, implicit record sequence counter that has always been used to number records. So the explicit nonce is at best a waste of eight bytes per record, and possibly dangerous should anyone attempt to use random values with it. Thankfully, all the major implementations use a counter and I did a scan of the Alexa top 200K sites to check that none are using random values - and none are. (Thanks to Dr Henson for pointing out that OpenSSL is using a counter, but with a random starting value.)
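
For reference, here's the TLS 1.2 AES-GCM per-record nonce layout (RFC 5288) in Go, with the explicit part filled in the only safe way, i.e. from the record sequence number:

package tlsnonce

import "encoding/binary"

// gcmNonce builds a TLS 1.2 AES-GCM record nonce: four implicit salt
// bytes from the key block plus the eight explicit bytes carried in
// each record, which safe implementations derive from a counter such
// as the record sequence number.
func gcmNonce(implicitSalt [4]byte, seqNum uint64) [12]byte {
	var nonce [12]byte
	copy(nonce[:4], implicitSalt[:])
	binary.BigEndian.PutUint64(nonce[4:], seqNum)
	return nonce
}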

For ChaCha20-Poly1305 we can save 8 bytes of overhead per record by using the implicit sequence number for the nonce.

Cipher suite selection

Given the relative merits of AES-GCM and ChaCha20-Poly1305, we wish to use the former when hardware support is available and the latter otherwise. But TLS clients typically advertise a fixed preference list of cipher suites and TLS servers either respect it, or override the client's preferences with their own, fixed list. (E.g. Apache's SSLHonorCipherOrder directive.)

So, first, the client needs to alter its cipher suite preference list depending on whether it has AES-GCM hardware support - which Chrome now does. Second, servers that enforce their own preferences (which most large sites do) need a new concept of an equal-preference group: a set of cipher suites in the server's preference order which are all “equally good”. When choosing a cipher suite using the server preferences, the server finds its most preferable cipher suite that the client also supports and, if that is in an equal preference group, picks whichever member of the group is the client's most preferable. For example, Google servers have a cipher suite preference that includes AES-GCM and ChaCha20-Poly1305 cipher suites in an equal preference group at the top of the preference list. So if the client supports any cipher suite in that group, then the server will pick whichever was most preferable for the client.
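
In code, the selection logic amounts to something like the following sketch (with simplified types of my own, not the actual server implementation):

package ciphersel

// group is a set of cipher suites that the server considers equally good.
type group struct {
	suites []uint16
}

// choose picks a cipher suite given the server's ordered preference
// groups and the client's ordered list. Within a group, the client's
// order wins.
func choose(serverPrefs []group, clientSuites []uint16) (uint16, bool) {
	clientRank := make(map[uint16]int)
	for i, s := range clientSuites {
		if _, ok := clientRank[s]; !ok {
			clientRank[s] = i
		}
	}
	for _, g := range serverPrefs {
		// Find the member of this group that the client ranks highest.
		best, bestRank, found := uint16(0), 0, false
		for _, s := range g.suites {
			if r, ok := clientRank[s]; ok && (!found || r < bestRank) {
				best, bestRank, found = s, r, true
			}
		}
		if found {
			return best, true
		}
	}
	return 0, false
}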

The end result is that Chrome talking to Google uses AES-GCM if there's hardware support at the client and ChaCha20-Poly1305 otherwise.

After a year of work, let's see what Chrome's view of TLS is now:

Cipher suite family             | Percentage of TLS connections made by Chrome
AES128-GCM & ChaCha20-Poly1305  | 39.9%

What remains is standardisation work and getting the code out to the world for others. The ChaCha20-Poly1305 cipher suites are unlikely to survive the IETF without changes so we'll need to respin them at some point. As for getting the code out, I hope to have something to say about that in the coming months. Before then I don't recommend that anyone try to repurpose code from the Chromium repos - it is maintained with only Chromium in mind.


Now that TLS's symmetric crypto is working better there's another, long-standing problem that needs to be addressed: TLS fallbacks. The new, AEAD cipher suites only work with TLS 1.2, which shouldn't be a problem because TLS has a secure negotiation mechanism. Sadly, browsers have a long history of doing TLS fallbacks: reconnecting with a lesser TLS version when a handshake fails. This is because there are lots of buggy HTTPS servers on the Internet that don't implement version negotiation correctly and fallback allows them to continue to work. (It also promotes buggy implementations by making them appear to function; but fallback was common before Chrome even existed so we didn't really have a choice.)

Chrome now has a four stage fallback: TLS 1.2, 1.1, 1.0 then SSLv3. The problem is that fallback can be triggered by an attacker injecting TCP packets and it reduces the security level of the connection. In the TLS 1.2 to 1.1 fallback, AES-GCM and ChaCha20-Poly1305 are lost. In the TLS 1.0 to SSLv3 fallback, all elliptic curve support, and thus ECDHE, disappears. If we're going to depend on these new cipher suites then we cannot allow attackers to do this, but we also cannot break all the buggy servers.

So, with Chrome 33, we implemented a fallback SCSV. This involves adding a pseudo-cipher suite in the client's handshake that indicates when the client is trying a fallback connection. If an updated server sees this pseudo-cipher suite, and the connection is not using the highest TLS version supported by the server, then it replies with an error. This is essentially duplicating the version negotiation in the cipher suite list, which isn't pretty, but the reality is that the cipher suite list still works but the primary version negotiation is rusted over with bugs.
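
The server-side check is tiny; here's a sketch in Go, using the TLS_FALLBACK_SCSV code point (0x5600):

package fallbackscsv

import "errors"

const fallbackSCSV uint16 = 0x5600

// check sketches the server-side logic: if the client signals that this
// is a fallback connection but offers a version below the best one the
// server supports, a downgrade may be in progress, so abort.
func check(clientSuites []uint16, clientVersion, serverMaxVersion uint16) error {
	for _, s := range clientSuites {
		if s == fallbackSCSV && clientVersion < serverMaxVersion {
			return errors.New("tls: inappropriate fallback detected")
		}
	}
	return nil
}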

With this in place (and also implemented on Google servers) we can actually depend on having TLS 1.2, at least between Chrome and Google (for now).


Few rollouts are trouble free and Chrome 33 certainly wasn't, although most of the problems were because of a completely different change.

Firstly, ChaCha20-Poly1305 is running at reduced speed on ARM because early-revision, S3 phones have a problem that causes the NEON, Poly1305 code to sometimes calculate the wrong result. (It's not clear if it's the MSM8960 chip or the power supply in the phone.) Of course, it works well enough to run the unit tests successfully because otherwise I wouldn't have needed two solid days to pin it down. Future releases will need to run a self-test to gather data about which revisions have the problem and then we can selectively disable the fast-path code on those devices. Until then, it's disabled everywhere. (Even so, ChaCha20-Poly1305 is still much faster than AES-GCM on ARM.)

But it was the F5 workaround that caused most of the headaches. Older (but common) firmware revisions of F5 devices hang the connection when the ClientHello is longer than 255 bytes. This appeared to be a major issue because it meant that we couldn't add things to the ClientHello without breaking lots of big sites that use F5 devices. After a plea and a discussion on the TLS list, an engineer from F5 explained that it was just handshake messages between 256 and 511 bytes in length that caused the issue. That suggested an obvious solution: pad the handshake message so that it doesn't fall into the troublesome size range.
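
The size logic of the workaround is simple; here's a sketch (the real code must, of course, emit a well-formed padding extension rather than just picking a target length):

package hellopad

// paddedLen returns the length to grow a serialized ClientHello to, so
// that it avoids the 256-511 byte range that hangs the affected F5
// firmware.
func paddedLen(n int) int {
	if n >= 256 && n < 512 {
		return 512
	}
	return n
}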

Chrome 33 padded all ClientHellos to 512 bytes, which broke at least two “anti-virus” products that perform local MITM of TLS connections on Windows and at least one “filtering” device that doesn't do MITM, but killed the TCP connections anyway. All made the assumption that the ClientHello will never be longer than some tiny amount. In the firewall case, the fallback SCSV stopped a fallback to SSLv3 that would otherwise have hidden the problem by removing the padding.

Thankfully, two of the three vendors have acted quickly to address the issue. The last, Kaspersky, has known about these sorts of problems for 18 months now (since Chrome tried to deploy TLS 1.1 and hit a similar issue) but, thankfully, doesn't enable MITM mode by default.

Overall, it looks like the changes are viable. Hopefully the path will now be easier for any other clients that wish to do the same.

Apple's SSL/TLS bug

Yesterday, Apple pushed a rather spooky security update for iOS that suggested that something was horribly wrong with SSL/TLS in iOS but gave no details. Since the answer is at the top of the Hacker News thread, I guess the cat's out of the bag already and we're into the misinformation-quashing stage now.

So here's the Apple bug:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
	OSStatus        err;
	...

	if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
		goto fail;
	if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
		goto fail;
		goto fail;
	if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
		goto fail;
	...

fail:
	SSLFreeBuffer(&signedHashes);
	SSLFreeBuffer(&hashCtx);
	return err;
}

(Quoted from Apple's published source code.)

Note the two goto fail lines in a row. The first one is correctly bound to the if statement but the second, despite the indentation, isn't conditional at all. The code will always jump to the end from that second goto, err will contain a successful value because the SHA1 update operation was successful and so the signature verification will never fail.

This signature verification is checking the signature in a ServerKeyExchange message. This is used in DHE and ECDHE ciphersuites to communicate the ephemeral key for the connection. The server is saying “here's the ephemeral key and here's a signature, from my certificate, so you know that it's from me”. Now, if the link between the ephemeral key and the certificate chain is broken, then everything falls apart. It's possible to send a correct certificate chain to the client, but sign the handshake with the wrong private key, or not sign it at all! There's no proof that the server possesses the private key matching the public key in its certificate.

Since this is in SecureTransport, it affects iOS from some point prior to 7.0.6 (I confirmed on 7.0.4) and also OS X prior to 10.9.2 (confirmed on 10.9.1). It affects anything that uses SecureTransport, which is most software on those platforms although not Chrome and Firefox, which both use NSS for SSL/TLS. However, that doesn't mean very much if, say, the software update systems on your machine might be using SecureTransport.

I coded up a very quick test site at https://www.imperialviolet.org:1266. Note the port number (which is the CVE number), the normal site is running on port 443 and that is expected to work. On port 1266 the server is sending the same certificates but signing with a completely different key. If you can load an HTTPS site on port 1266 then you have this bug.

Because the certificate chain is correct and it's the link from the handshake to that chain which is broken, I don't believe any sort of certificate pinning would have stopped this. Also, this doesn't only affect sites using DHE or ECDHE ciphersuites - the attacker gets to choose the ciphersuite in this case and will choose the one that works for them.

Also, this doesn't affect TLS 1.2 because there's a different function for verifying the different ServerKeyExchange message in TLS 1.2. But, again, the attacker can choose any version that the client will accept. But if the client only enables TLS 1.2 then it appears that would work around this issue. Likewise, if the client only enabled the plain, RSA ciphersuites then there's no ServerKeyExchange and that should also work around this issue. (Of the two, the former workaround is much more preferable.)

Based on my test site, both iOS 7.0.6 and OS X 10.9.2 fix the issue. (Update: it looks like the bug was introduced in 10.9 for OS X but existed in at least some versions of iOS 6. iOS 6.1.6 was released yesterday to fix it.)

This sort of subtle bug deep in the code is a nightmare. I believe that it's just a mistake and I feel very bad for whoever might have slipped in an editor and created it.

Here's a stripped down version of that code with the same issue:

extern int f();

int g() {
	int ret = 1;

	goto out;
	ret = f();

out:
	return ret;
}

If I compile with -Wall (enable all warnings), neither GCC 4.8.2 nor Clang 3.3 from Xcode makes a peep about the dead code. That's surprising to me. A better warning could have stopped this but perhaps the false positive rate is too high over real codebases? (Thanks to Peter Nelson for pointing out that Clang does have -Wunreachable-code to warn about this, but it's not in -Wall.)

Maybe the coding style contributed to this by allowing ifs without braces, but one can have incorrect indentation with braces too, so that doesn't seem terribly convincing to me.

A test case could have caught this, but it's difficult because it's so deep into the handshake. One needs to write a completely separate TLS stack, with lots of options for sending invalid handshakes. In Chromium we have a patched version of TLSLite to do this sort of thing but I cannot recall that we have a test case for exactly this. (Sounds like I know what my Monday morning involves if not.)

Code review can be effective against these sorts of bug. Not just auditing, but review of each change as it goes in. I've no idea what the code review culture is like at Apple but I strongly believe that my colleagues, Wan-Teh or Ryan Sleevi, would have caught it had I slipped up like this. Although not everyone can be blessed with folks like them.

Lastly, there was a lot of discussion yesterday that Apple missed checking the hostname in the certificate. It's true that curl on the OS X command line oddly accepts HTTPS connections to IP addresses when the IP address isn't in the certificate, but I can't find that there's anything more than that and Safari doesn't have that problem.

Implementing Elligator for Curve25519

There are some situations where you would like to encode a point on an elliptic curve in such a way that it appears to be a uniform, random string. This comes up in some password authenticated key exchange systems, where one doesn't want to leak any bits of the password. It also comes up in protocols that are trying to be completely uniform so as to make life hard for network censors.

For the purposes of this post, I'm only going to be considering Curve25519. Recall that Curve25519 is the set of points (u,v) where v^2 = u^3 + 486662u^2 + u and u and v are taken from GF(q = 2^255-19). Curve25519 works with a Montgomery ladder construction and thus only exchanges u coordinates. So, when sending a u coordinate there are two ways that an attacker can distinguish it from a random string:

Firstly, since the field is only 255 bits, the 256th bit is always zero. Thus if an attacker sees a series of 32-byte strings where the top bit of the last byte is always zero, then they can be confident that they are not random strings. This is easy to fix however, just XOR in a random bit and mask it out before processing.
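
That fix is small enough to sketch (assuming the usual 32-byte, little-endian encoding of field elements):

package elligator

import (
	"crypto/rand"
	"io"
)

// addRandomTopBit hides the structurally-zero 256th bit of a field
// element by setting it to a random value before sending.
func addRandomTopBit(repr *[32]byte) error {
	var b [1]byte
	if _, err := io.ReadFull(rand.Reader, b[:]); err != nil {
		return err
	}
	repr[31] |= b[0] & 0x80
	return nil
}

// clearTopBit is the receiver's side: mask the bit out before processing.
func clearTopBit(repr *[32]byte) {
	repr[31] &= 0x7f
}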

Secondly, the attacker can assume that a 32-byte string is a u coordinate and check whether u^3 + 486662u^2 + u is a square. By the curve equation, this will always be true for a genuine u coordinate, but will only be true 50% of the time otherwise. This problem is a lot harder to fix.

Thankfully, djb, Tanja Lange, Mike Hamburg and Anna Krasnova have published Elligator [paper]. In that paper they present two maps from elliptic curve points to uniform, random strings and you should, at least, read the introduction to the paper, which contains much more detail on the use cases and previous work.

Elligator 2 is suitable for Curve25519 and there are some hints about an implementation in section 5.5. However, this blog post contains my own notes about implementing each direction of the map with only a couple of exponentiations. You may need to read the paper first before the following makes much sense.

I'll start with the inverse map, because that's what's used during key-generation. Throughout I'll write (x, y) for affine coordinates on the twisted, Edwards isomorphism of Curve25519, as defined for Ed25519; (u,v) for affine coordinates on Curve25519 itself; and X, Y, Z for extended coordinates on the twisted, Edwards curve.

The inverse map, ψ^-1, takes a point on the curve with the following limitations and produces a uniform, representative value:

  1. u ≠ -A. (The value A is a parameter of Curve25519 and has the value 486662.)
  2. -2u(u + A) is a square

Since we're generating the points randomly, I'm going to ignore the first condition because it happens far less frequently than malfunctions in the CPU instructions that I might use to detect it.

The second condition excludes about half the points on the curve: these points don't have any representation as a uniform string. When generating a key we need to detect and restart if we happen upon one of those points. (When working in the other direction, ψ(r) = ψ(-r), which explains why half the image is missing.)

If the point is in the image of the map then the uniform string representation, r, is defined by:

r = √(-u/(2(u+A)))   or   r = √(-(u+A)/(2u))

(These are the two square roots considered below; which of the two applies depends on the “sign” of v.)

So our key generation function is going to take a private key and either produce nothing (if the resulting point isn't in the image of the map), or produce a public key and the uniform representation of it. We're working with Curve25519, so the public key, if any, will exactly match the result of the scalar_base_mult operation on Curve25519.

Since I happened to have the ref10 code from Ed25519 close to hand, we'll be calculating the scalar multiplication on the twisted, Edwards isomorphism of Curve25519 that Ed25519 uses. I've discussed that before but, in short, it allows for the use of precomputed tables and so is faster than scalar_base_mult with the Curve25519 code.

The result of the scalar multiplication of the base point with the private key is a point on the twisted, Edwards curve in extended coordinates: (X : Y : Z), where x = X/Z and y = Y/Z and where (x, y) are the affine coordinates on the Edwards curve. In order to convert a point on the Edwards curve to a point on Curve25519, we evaluate:

u = (1 + y)/(1 - y),   v = √(-486664)·u/x

This is looking quite expensive: first we have to convert from extended coordinates on the Edwards curve to affine, then to affine on Curve25519. In finite fields, inversions (i.e. division) are very expensive because they are done by raising the divisor to the power of q-2 (where q = 2^255-19). Compared to such large exponentiations, multiplications and additions are almost free.

But, we can merge all those inversions and do the whole conversion with just one. First, substitute the extended coordinate conversion: x = X/Z, y = Y/Z:

u = (Z + Y)/(Z - Y),   v = √(-486664)·u·Z/X

By calculating just 1/(X(Y-Z)), we can get (u,v). The reciprocal can be used directly to calculate v and then multiplied by X to calculate u.

What remains is to check whether -2u(u + A) is a square and then to calculate the correct square root. Happily, we can calculate all that (including both square roots, because we want to be constant-time) with a single exponentiation.

Square roots are defined in the standard way for finite fields where q ≡ 5 mod 8:

√x = x^((q+3)/8), multiplied by √-1 = 2^((q-1)/4) if the square of that candidate turns out to be -x

Let's consider the square roots first. Both have a common, constant term of (-1/2) which we can ignore for now. With that gone, both are fractions that we need to raise to (q+3)/8. Section 5 of the Ed25519 paper contains a trick for merging the inversion with the exponentiation:

c = (a/b)^((q+3)/8) = a·b^3·(a·b^7)^((q-5)/8)

But let's see what else we can do with the c value. Since x^(q-1) = 1 for any non-zero element, c^8 = (a/b)^(q+3) = (a/b)^4 and so:

c^7 = c^8/c = (a/b)^4·(b/a)^((q+3)/8)

So c^7 gives us the variable part of the other square root without another large exponentiation! It only remains to multiply in the constant term of (-1/2)^((q+3)/8) to each and then, for each, test whether the square root of -1 needs to be multiplied. For the first square root, for example, we should have that r^2 = -u/(2(u+A)), thus 2r^2(u+A)+u should be zero. If not, then we need to multiply r by sqrt(-1).

Lastly, we need to check that the point is in the image of the map at all! (An implementation may want to do this sooner so that failures can be rejected faster.) For this we need to check that -2u(u + A) is a square. We use the fact that an element of the field is a non-zero square iff raising it to (q-1)/2 results in one. Thus we need to calculate (-2)^((q-1)/2)·(u(u+A))^((q-1)/2). The first part is just a constant, but the latter would be an expensive, large exponentiation, if not for c again:

(u(u+A))^((q-1)/2) = c^4·((u+A)/u)^2

With that result we can pick the correct square root (and in constant time, of course!).

The map in the forward direction

In the forward direction, the map takes a uniform, representative value and produces a public key, which is the x coordinate of a Curve25519 point in this context: ψ(r) = u. (Remember that the affine coordinates on Curve25519 are written (u,v) here.) This is much simpler than the reverse direction, although, as best I can tell, equally expensive. Given a value, r, one can calculate the value u with the Elligator 2 equations (the non-square is fixed as 2):

d = -A/(1 + 2r^2)
ε = (d^3 + A·d^2 + d)^((q-1)/2)
u = ε·d - (1-ε)·A/2

This requires one inversion to calculate d and another, large exponentiation to calculate ε. The resulting u value will match the output of the key generation stage which, in turn, matches the value that running scalar_base_mult from the Curve25519 code would have produced for the same private key.

Note that the Elligator software page says that GMP can compute the Legendre symbol (the ε calculation) in only 9000 cycles, so I'm clearly missing some implementation trick somewhere. Additionally, the Elligator paper raises the possibility of implementing the inversions with a blinded, Euclidean algorithm. That would certainly be faster but I'm not comfortable that I know how to do that safely, so I'm sticking with exponentiation here.


I've a constant-time, pure Go implementation, based on the field operations from the pure-C, SUPERCOP, Ed25519 code. The field operations have been ported pretty mechanically from the C code, so it's hardly pretty Go code, but it seems to work.

On my 3.1GHz Ivy Bridge system (with TurboBoost enabled) the key generation takes 155µs and the forward map operation takes 28µs. (But remember that half of the key generations will fail, so expect roughly double that time per key.)

I have not carefully reviewed the code, so beware. There are also some non-obvious concerns when using Elligator: for example, if you are hiding a uniform representative in a nonce field then an attacker might assume that it's a field element, negate it and let the connection continue. In a normal protocol, altering the nonce usually leads to a handshake failure, but negating an Elligator representative is a no-op. If the connection continues then the attacker can be pretty sure that the value wasn't really random. So care is required.

So if you think that Elligator might be right for you, consult your cryptographer.

Forward security for journalists

A number of journalists have been writing stories about forward security recently. Some of them ended up talking to me and I completely failed to come up with a good metaphor that didn't cause more problems than it solved. Eventually I wrote the following as an introduction to forward security for journalists. It's deeply simplified but I think it captures the broad strokes.

It remains unclear whether anyone actually came away with a better understanding after reading it, however.

A great many metaphors have been mangled, and analogies tortured, trying to explain what forward security is. But, fundamentally, it doesn't involve anything more complex than high-school math and anybody should be able to understand it. So this document attempts to describe what forward security is.

In the 1970s, cryptography was purely the concern of the military. The NSA existed (unofficially) and believed that it was the sole, legitimate center of cryptographic knowledge in the US. In this environment, Whitfield Diffie set out from MIT across the country to try and gather up what scraps of cryptography had escaped into the public realm. If we omit huge amounts of detail (see Steven Levy's book, “Crypto”, for the details) we can say that he ended up at Stanford with a revolutionary idea in his head: public key cryptography.

Up to that point, cryptography had involved secret keys that both the sender and receiver were in possession of. Diffie wondered about a system where one could publish part of a key (the public part) and keep another part of the key private. However, he didn't have any concrete implementation of this.

At Stanford he found Martin Hellman and together they published New Directions in Cryptography (1976), which contained the Diffie-Hellman key-agreement algorithm. This is possibly the most significant work in cryptographic history and is still a core algorithm today. (Later we learnt that a researcher at GCHQ had discovered the idea a few years earlier. But GCHQ didn't do anything with it and never published it.)

We'll need modular arithmetic, which is like working with times. What's 8pm plus five hours? One o'clock. Because 8+5=13 and 13 divided by 12 has a remainder of one. Likewise, 8pm plus 17 hours is still one o'clock, because adding multiples of 12 doesn't make any difference to a clock. We'll be using the same idea, but with numbers (called the modulus) other than 12.

To explain Diffie-Hellman we'll work modulo 11. So 5+10=4 now, because 15 divided by 11 has a remainder of 4.

Next, exponentiation: 2^3 means 2, multiplied by itself three times. So 2^3 = 2×2×2 = 8. Now we have everything that we need to implement Diffie-Hellman.

In classic, cryptographic tradition we'll imagine two parties and we'll call them Alice and Bob. Alice generates a random number between 1 and 9 and calls it a. This is her private key; let's say that it's five. Alice calculates her public key as A = 2^a = 2^5 = 32 = 10 mod 11.

Bob does the same but his private key is called b and his public key is B. Let's say that his private key is six, which means that B = 2^b = 2^6 = 64 = 9 mod 11.

Alice and Bob can freely publish A and B because, given them, no other party can work backwards and calculate their private keys. Well, in this case they can because we're using a toy example with a tiny modulus. A real modulus has over 500 digits. Then it's not possible with any current computer.

Once Alice has Bob's public key, or vice-versa, she can perform key-agreement. She combines her private key with Bob's public key like this: B^a = 9^5 = 59049 = 1 mod 11. Bob can combine his private key with Alice's public key in the same way: A^b = 10^6 = 1000000 = 1 mod 11. Both Alice and Bob arrived at the same, shared value (one) which nobody else can calculate. And without exchanging any secret information!
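
For anyone who wants to check the arithmetic mechanically, the whole toy exchange fits in a few lines of Go (the numbers are exactly those above):

package main

import "fmt"

// expMod computes g^e mod m the slow, obvious way: fine for toy numbers.
func expMod(g, e, m int) int {
	r := 1
	for i := 0; i < e; i++ {
		r = (r * g) % m
	}
	return r
}

func main() {
	const m = 11
	a, b := 5, 6         // Alice's and Bob's private keys
	A := expMod(2, a, m) // Alice's public key: 10
	B := expMod(2, b, m) // Bob's public key: 9
	fmt.Println(A, B)
	// Both sides compute the same shared value: 1.
	fmt.Println(expMod(B, a, m), expMod(A, b, m))
}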

This was a huge breakthrough but it still left something to be desired: signatures. It seemed that it would be very useful for Alice to be able to calculate some function of a message that could be broadcast and would serve as a signature. So it had to be something that only Alice (or, rather, the entity holding the private part of Alice's public key) could have generated. Diffie-Hellman doesn't provide this.

In 1977, three researchers at MIT, Rivest, Shamir and Adleman, produced such a scheme. Now it's called by their initials: RSA. Again, the basics are high-school math.

We'll be working with modular arithmetic again but this time it'll be modulo the product of two primes. Like the example above, the numbers involved are usually huge, but we're going to use tiny numbers for the purposes of the examples.

So our two primes will be p=11 and q=17. This will serve as Alice's RSA private key. In order to calculate an RSA public key, one calculates n=11×17=187.

The core of RSA is that, when working modulo 187, it's very easy to calculate the cube of a number. But, unless you know that 187 = 11×17, it's very difficult to calculate the cube root of a number. Let's say that our message is 15. The cube is easy: 15^3 = 3375 = 9 mod 187. But to calculate the cube root you need to calculate the modular inverse of 3 mod (p-1)(q-1) = 10×16 = 160. We're getting a little past high-school math here so trust me that it's 107. By calculating 9^107 mod 187 we find that we get our original number again: 15.
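
Again, the arithmetic can be checked in a few lines of Go; math/big's ModInverse finds the 107 for us (same numbers as above):

package main

import (
	"fmt"
	"math/big"
)

func main() {
	n := big.NewInt(187) // 11 × 17
	m := big.NewInt(15)  // the message

	// Cubing is easy for everyone: 15^3 mod 187 = 9.
	c := new(big.Int).Exp(m, big.NewInt(3), n)
	fmt.Println(c)

	// Only Alice, who knows the factors, can find the inverse of 3
	// mod (11-1)(17-1) = 160: it's 107.
	d := new(big.Int).ModInverse(big.NewInt(3), big.NewInt(160))
	fmt.Println(d)

	// Raising to the 107th power is taking a cube root: back to 15.
	fmt.Println(new(big.Int).Exp(c, d, n))
}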

Think about what this means. If Bob knows that Alice's public key is 187, he can cube a message and send it to Alice and only Alice can calculate the cube root and recover the message. Also, if Alice starts with a message, she can calculate the cube root of it and publish that. Everyone else can be convinced that only Alice could have calculated the cube root of the message and so that functions as a digital signature of the message!

Now we have everything that we need to understand forward security so hang on while we wrap it all up.

RSA and Diffie-Hellman are both too slow to use for large amounts of data. Bulk data transfer uses secret-key cryptography: the old style cryptography where both sides need to know the secret key in order to process the ciphertext. But, in order to get the secret key to the other party, RSA or Diffie-Hellman is used.

Traditional HTTPS has the browser use RSA to transmit a secret key to the server. The browser picks a random, secret key, cubes it modulo the server's RSA modulus and sends it to the server. Only the server knows the two numbers that, multiplied together, equal its modulus so only the server can calculate the cube root and get the secret key for the connection.

However, if anyone were to record the HTTPS connection and later learn those two numbers (called factors) then they can retrospectively decrypt the recorded connections. They can also watch new connections to the server and decrypt those because they can calculate cube roots just as well as the real server can.

With forward security, the method of establishing the secret key for the connection is changed. When a client connects, the server generates a brand new Diffie-Hellman private key just for that connection. (It's called the ephemeral private key.) Then the server calculates its Diffie-Hellman public key (by 2^a mod m, remember?), signs the public key with its RSA key (by calculating the cube root of it, modulo its RSA modulus) and sends all that to the browser. The browser verifies the signature (by cubing it and checking that the result equals the public key) and then generates its own Diffie-Hellman private and public keys. Now the client has the server's Diffie-Hellman public key, so it can combine it with its Diffie-Hellman private key and calculate the shared secret. It sends its Diffie-Hellman public key to the server and the server ends up with the same, shared secret. That secret is used as the secret key for the connection.
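
Gluing the two toy systems together gives a miniature version of that handshake. This is purely illustrative, with the same tiny numbers as before:

package main

import "fmt"

func expMod(g, e, m int) int {
	r := 1
	for i := 0; i < e; i++ {
		r = (r * g) % m
	}
	return r
}

func main() {
	// Server: a fresh, ephemeral DH key for this connection...
	b := 6
	B := expMod(2, b, 11)
	// ...signed with the long-term RSA key (cube root mod 187).
	sig := expMod(B, 107, 187)

	// Client: verify the signature by cubing it...
	fmt.Println(expMod(sig, 3, 187) == B) // true
	// ...then complete the exchange with its own ephemeral key.
	a := 5
	A := expMod(2, a, 11)
	fmt.Println(expMod(B, a, 11), expMod(A, b, 11)) // the shared secret, twice
}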

The advantage here is that, should the server's long-term RSA private key be compromised in the future, the damage is limited. The confidentiality of past connections is protected by the Diffie-Hellman secrets, which were generated fresh for each connection and destroyed immediately afterwards. Also, even with the RSA private key, the attacker cannot passively decrypt connections in real-time: they have to impersonate the server, which is much more work.

That's why forward-secrecy is often written as DHE: it stands for Diffie-Hellman ephemeral. There's also ECDHE (Elliptic Curve, Diffie-Hellman ephemeral) which is the same idea in a more efficient mathematical structure.


Pond

It seems that secure messaging projects are all the rage right now. Heck, Matthew Green is writing about it in the New Yorker this week!

One of my side-projects for a year or so has been about secure messaging. Of course, that means that I was working on it before it was cool (insert hipster emoticon here, :{D ) … but I'm slow because my side projects get very little time. (When I started, the subheading for the project was “How to better organise a discreet relationship with the Director of the CIA”, because that was the topical reference at the time. Oh, how things change.)

It's still not really ready, but the one year mark seemed like a good deadline to give myself to mention it in public. You should pay heed to the warning on the project page however. I did contact one well known security auditing company to see whether they would take a couple of thousand dollars to spend a day on it, but they said that they don't really do small projects and that I'd need to be talking 10-20 times that for a couple of weeks at least. So don't depend on it yet and don't be surprised if I've just broken it somehow by replacing the ratchet code. (Although, thanks, Trevor, for the design.)

But I fear that it's not going to make Matthew Green happy! It's certainly not a drop-in replacement for email. I spend my days working on small changes with a large multiplier, in my free time I choose to shift the balance completely the other way.

The source code is in Go, but there are binaries for several Linux flavors and OS X. (Although it looks terrible on OS X and one user reports that the program can only be closed by killing it from the command line: I've not reproduced that yet.) In addition to the default server, I learned recently that the Wau Holland Foundation also runs a Pond server.

For more details, see the user guide, threat model etc which are linked from the project page.

If you get it running, but don't know anyone else, you can use:

gpg --recv-key C92172384F387DBAED4D420165EB9636F02C5704

to get my public key and email me a shared-secret. (Although I only check that email during the evenings.)

Please update F5/BIG-IP firmware

Update (Dec 7th): an F5 engineer commented on the TLS WG mailing list describing the internals of the hang bug. Based on this, we realised that it would be possible to work around the issue by padding ClientHello messages of a certain size. We now have a design for doing this, it's implemented in Chrome and it appears to be working! So, while you should always keep software up to date, it appears that the Internet can dodge this bug!

If you use F5/BIG-IP devices to terminate SSL connections, please update the firmware on the things! We're trying to run an Internet here and old versions of these devices are a real problem for deploying new TLS features. You need to be running at least version 10.2.4 (as far as I know), but running the latest version is generally good advice.

If you just try to connect to these sites with a recent version of OpenSSL, you should find that the connection hangs - which is terrible. We can detect a server that returns an error, but hanging the connection isn't something we can generally work around.

$ openssl version
OpenSSL 1.0.1e 11 Feb 2013
$ openssl s_client -connect stubhub.com:443
(the connection hangs here)
$ openssl s_client -connect stubhub.com:443 -tls1
New, TLSv1/SSLv3, Cipher is AES256-SHA
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE

I did have a long list of major sites that were affected by this issue here. I've removed it because some of them had updated and, because of the update at the top, it's no longer a crippling problem.

ChaCha20 and Poly1305 for TLS

Today, TLS connections predominantly use one of two families of cipher suites: RC4 based or AES-CBC based. However, in recent years both of these families of cipher suites have suffered major problems. TLS's CBC construction was the subject of the BEAST attack (fixed with 1/n-1 record splitting or TLS 1.1) and Lucky13 (fixed with complex implementation tricks). RC4 was found to have key-stream biases that cannot effectively be fixed.

Although AES-CBC is currently believed to be secure when correctly implemented, a correct implementation is so complex that there remains strong motivation to replace it.

Clearly we need something better. An obvious alternative is AES-GCM (AES-GCM is AES in counter mode with a polynomial authenticator over GF(2^128)), which is already specified for TLS and has some implementations. Support for it is in the latest versions of OpenSSL and it has been the top preference cipher of Google servers for some time now. Chrome support should be coming in Chrome 31, which is expected in November. (Although we're still fighting to get TLS 1.2 support deployed in Chrome 30 due to buggy servers.)

AES-GCM isn't perfect, however. Firstly, implementing AES and GHASH (the authenticator part of GCM) in software in a way which is fast, secure and has good key agility is very difficult. Both primitives are suited to hardware implementations and good software implementations are worthy of conference papers. The fact that a naive implementation (which is also what's recommended in the standard for GHASH!) leaks timing information is a problem.

AES-GCM also isn't very quick on lower-powered devices such as phones, and phones are now a very important class of device. A standard phone (which is always defined by whatever I happen to have in my pocket; a Galaxy Nexus at the moment) can do AES-128-GCM at only 25MB/s and AES-256-GCM at 20MB/s (both measured with an 8KB block size).

Lastly, if we left things as they are, AES-GCM would be the only good cipher suite in TLS. While there are specifications for AES-CCM and for fixing the AES-CBC construction, they are all AES based and, in the past, having some diversity in cipher suites has proven useful. So we would be looking for an alternative even if AES-GCM were perfect.

In light of this, Google servers and Chrome will soon be supporting cipher suites based around ChaCha20 and Poly1305. These are primitives developed by Dan Bernstein and are fast, secure, have high quality, public domain implementations, are naturally constant time and have nearly perfect key agility.

On the same phone as the AES-GCM speeds were measured, the ChaCha20+Poly1305 cipher suite runs at 92MB/s (which should be compared against the AES-256-GCM speed as ChaCha20 is a 256-bit cipher).
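
As an illustration of the AEAD interface that such a cipher suite provides, here's a sketch using Go's golang.org/x/crypto/chacha20poly1305 package (a later addition to x/crypto; the TLS cipher suite itself differs in details such as nonce construction):

package main

import (
	"crypto/rand"
	"fmt"

	"golang.org/x/crypto/chacha20poly1305"
)

func main() {
	key := make([]byte, chacha20poly1305.KeySize) // 32 bytes: a 256-bit cipher
	rand.Read(key)

	aead, err := chacha20poly1305.New(key)
	if err != nil {
		panic(err)
	}

	// Seal encrypts and authenticates in one pass; Open verifies and decrypts.
	nonce := make([]byte, aead.NonceSize())
	ct := aead.Seal(nil, nonce, []byte("hello"), nil)
	pt, err := aead.Open(nil, nonce, ct, nil)
	fmt.Println(string(pt), err)
}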

In addition to support in Chrome and on Google's servers, my colleague, Elie Bursztein, and I are working on patches for NSS and OpenSSL to support this cipher suite. (And I should thank Dan Bernstein, Andrew M, Ted Krovetz and Peter Schwabe for their excellent, public domain implementations of these algorithms. Also Ben Laurie and Wan-Teh Chang for code reviews, suggestions etc.)

But while AES-GCM's hardware orientation is troublesome for software implementations, it's obviously good news for hardware implementations and some systems do have hardware AES-GCM support. Most notably, Intel chips have had such support (which they call AES-NI) since Westmere. Where such support exists, it would be a shame not to use it because it's constant time and very fast (see slide 17). So, once ChaCha20+Poly1305 is running, I hope to have clients change their cipher suite preferences depending on the hardware that they're running on, so that, in cases where both client and server support AES-GCM in hardware, it'll be used.

To wrap all this up, we need to solve a long standing, browser TLS problem: in order to deal with buggy HTTPS servers on the Internet (of which there are many, sadly), browsers will retry failed HTTPS connections with lower TLS version numbers in order to try and find a version that doesn't trigger the problem. As a last attempt, they'll try an SSLv3 connection with no extensions.

Several useful features get jettisoned when this occurs but the important one for security, up until now, has been that elliptic curve support is disabled in SSLv3. For servers that support ECDHE but not DHE that means that a network attacker can trigger version downgrades and remove forward security from a connection. Now that AES-GCM and ChaCha20+Poly1305 are important we have to worry about them too as these cipher suites are only defined for TLS 1.2.

Something needs to be done to fix this so, with Chrome 31, Chrome will no longer downgrade to SSLv3 for Google servers. In this experiment, Google servers are being used as an example of non-buggy servers. The experiment hopes to show that networks are sufficiently transparent that they'll let at least TLS 1.0 through. We know from Chrome's statistics that connections to Google servers do end up getting downgraded to SSLv3 sometimes, but all sorts of random network events can trigger a downgrade. The fear is that there's some common network element that blocks TLS connections by deep-packet inspection, which we'll measure by breaking them and seeing how many bug reports we get.

If that works then, in Chrome 32, no fallbacks will be permitted for Google servers at all. This stage of the experiment tests that the network is transparent to TLS 1.2 by, again, breaking anything that isn't and seeing if it causes bug reports.

If both those experiments work then, great! We can define a way for servers to securely indicate that they don't need version fallback. It would be nice to have used the renegotiation extension for this, but I think that there are likely already too many broken servers that support that, so another SCSV is probably needed.

If we get all of the above working then we're not in too bad a state with respect to TLS cipher suites. At least, once most of the world has upgraded in any case.

Playing with the Certificate Transparency pilot log

I've written about Certificate Transparency several times in the past, but now there's actually something to play with! (And, to be clear, this is entirely thanks to several of my colleagues, not me!) So I thought it would be neat to step through some simple requests of the pilot log.

I'm going to be using Go for this, because I like it. But it's substantially just JSON and HTTP and one could do it in any language.

In order to query a log you need to know its URL prefix and public key. Here's a structure to represent that.

// Log represents a public log.
type Log struct {
	Root string
	Key  *ecdsa.PublicKey
}

For the pilot log, the URL prefix is https://ct.googleapis.com/pilot and here's the public key:

-----BEGIN PUBLIC KEY-----
(base64 key material elided)
-----END PUBLIC KEY-----

Since it's a little obscure, here's the code to parse a public key like that. (This code, and much of the code below, is missing error checking because I'm just playing around.)

	block, _ := pem.Decode([]byte(pilotKeyPEM))
	key, _ := x509.ParsePKIXPublicKey(block.Bytes)
	pilotKey = key.(*ecdsa.PublicKey)
	pilotLog = &Log{Root: "https://ct.googleapis.com/pilot", Key: pilotKey}

The log is a tree, so the first thing that we want to do with a log is to get its head. This is detailed in section 4.3 of the RFC: we make an HTTP GET to get-sth and it returns a JSON blob.

Any HTTP client can do that, so give it a go with curl on the command line:

$ curl https://ct.googleapis.com/pilot/ct/v1/get-sth 2>>/dev/null | less

You should get a JSON blob that, with a bit of formatting, looks like this:

	"tree_size": 1979426,
	"timestamp": 1368891548960,
	"sha256_root_hash": "8UkrV2kjoLcZ5fP0xxVtpsSsWAnvcV8aPv39vh96J2o=",
	"tree_head_signature": "BAMARjBEAiAc95/ONhz2vQsULrISlLumvpo..."
In Go, we'll parse that into a structure like this:

// Head contains a signed tree head.
type Head struct {
	Size uint64 `json:"tree_size"`
	Time time.Time `json:"-"`
	Hash []byte `json:"sha256_root_hash"`
	Signature []byte `json:"tree_head_signature"`
	Timestamp uint64 `json:"timestamp"`
}

And here's some code to make the HTTP request, check the signature and return such a structure:

const (
	logVersion = 0 // RFC 6962, Version v1
	treeHash   = 1 // RFC 6962, SignatureType tree_hash
	hashSHA256 = 4 // RFC 5246, HashAlgorithm sha256
	sigECDSA   = 3 // RFC 5246, SignatureAlgorithm ecdsa
)

func (log *Log) GetHead() (*Head, error) {
	// See https://tools.ietf.org/html/rfc6962#section-4.3
	resp, err := http.Get(log.Root + "/ct/v1/get-sth")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != 200 {
		return nil, errors.New("ct: error from server")
	}
	if resp.ContentLength == 0 {
		return nil, errors.New("ct: body unexpectedly missing")
	}
	if resp.ContentLength > 1<<16 {
		return nil, errors.New("ct: body too large")
	}
	data, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	head := new(Head)
	if err := json.Unmarshal(data, &head); err != nil {
		return nil, err
	}
	head.Time = time.Unix(int64(head.Timestamp/1000), int64(head.Timestamp%1000)*1e6)

	// See https://tools.ietf.org/html/rfc5246#section-4.7
	if len(head.Signature) < 4 {
		return nil, errors.New("ct: signature truncated")
	}
	if head.Signature[0] != hashSHA256 {
		return nil, errors.New("ct: unknown hash function")
	}
	if head.Signature[1] != sigECDSA {
		return nil, errors.New("ct: unknown signature algorithm")
	}
	signatureBytes := head.Signature[4:]
	var sig struct {
		R, S *big.Int
	}
	if signatureBytes, err = asn1.Unmarshal(signatureBytes, &sig); err != nil {
		return nil, errors.New("ct: failed to parse signature: " + err.Error())
	}
	if len(signatureBytes) > 0 {
		return nil, errors.New("ct: trailing garbage after signature")
	}

	// See https://tools.ietf.org/html/rfc6962#section-3.5
	signed := make([]byte, 2+8+8+32)
	x := signed
	x[0] = logVersion
	x[1] = treeHash
	x = x[2:]
	binary.BigEndian.PutUint64(x, head.Timestamp)
	x = x[8:]
	binary.BigEndian.PutUint64(x, head.Size)
	x = x[8:]
	copy(x, head.Hash)

	h := sha256.New()
	h.Write(signed)
	digest := h.Sum(nil)

	if !ecdsa.Verify(log.Key, digest, sig.R, sig.S) {
		return nil, errors.New("ct: signature verification failed")
	}

	return head, nil
}

Running this code against the current pilot gets us the same information as the curl request: 1979426 certificates as of Sat May 18 11:39:08 EDT 2013, although by now the date will be newer and there will be more entries.

As you can see, the log has been seeded with some certificates gathered from a scan of the public Internet. Since this data is probably more recent than the EFF's Observatory data, it might be of interest and anyone can download it using the documented interface. Again, we can try a dummy request in curl:

$ curl 'https://ct.googleapis.com/pilot/ct/v1/get-entries?start=0&end=0' 2>>/dev/null | less

You should see a result that starts with {"entries":[{"leaf_input":"AAAAAAE9p. You can request log entries in batches and even download everything if you wish. My toy code simply writes a single large file containing the certificates, gzipped and concatenated with a length prefix.

My toy code is go gettable from https://github.com/agl/certificatetransparency. In the tools directory is ct-sync, which will incrementally download the pilot log and ct-map, which demonstrates how to look for interesting things by extracting the certificates from a local mirror file. Both will use multiple cores if possible so be sure to set GOMAXPROCS. (This is very helpful when calculating the hash of the tree since the code doesn't do that incrementally.)

The project's official code is at https://code.google.com/p/certificate-transparency/. See the README and the main site.

And since we have the tools to look for interesting things, I spent a couple of minutes doing so:

I wrote a quick script to find ‘TURKTRUST’ certificates - leaf certificates that are CA certificates. The log only contains certificates that are valid by a fairly promiscuous root set in order to prevent spam, so we don't have to worry about self-signed certificates giving us false positives. As ever, the Korean Government CA provides a good example of malpractice, with 40 certificates that mistakenly have the CA bit set (and are currently valid). This issue was reported by a colleague of mine last year. Thankfully, the issuing certificate has a path length constraint that should prevent this issue from being exploited (as long as your X.509 validation checks these sorts of details). The root CA is also only present in the Microsoft root set as far as I know.

A different Korean root CA (KISA) can't encode ASN.1 correctly and is issuing certificates with negative serial numbers because ASN.1 INTEGERs are negative if the most significant bit is one.

Another CA is issuing certificates with 8-byte IP addresses (I've let them know).

The ct-map.go in the repository is set up to find publicly valid certificates with names that end in .corp (which will soon not get an HTTPS indication in Chrome).

Anyway, at the moment the log is just pre-seeded with some public certificates. The real value of CT comes when we have strong assurances that all valid certificates are in the log. That will take some time, but we're working towards it.

Hash based signatures

It's likely that Kevin Poulsen will still be alive in twenty years' time, and that his encrypted message to Edward Snowden will still be saved somewhere. In fact, there are several places where people post RSA-encrypted messages and seem to assume that they're secure. Even assuming that no huge advances in factoring occur (and Joux et al have just killed discrete logs in binary fields, so it's not impossible), large, quantum computers will be able to break them.

It might be the case that building practical, quantum computers doesn't turn out to be possible (D-Wave machines don't count here), but there are people trying hard to do it. From “Deep State” (a rather fragmented book, but interesting in parts):

There is a quantum computing arms race of sorts under way. China, Israel and Russia have advanced quantum computing programs with direct aim of gaining geopolitical advantage, as does DARPA itself.

There are classified and semi-classified DARPA and Army/Air Force/Navy Research Lab programs for the potential use of quantum computers for a select set of defense technologies, including: the ability to design a perfect sensing laser for drones or satellites; the ability to design radar that can defeat counter-stealth techniques; the ability to design coatings for aircraft that truly are stealth, owning to the exploitation of quantum fluid dynamics.

Now, government agencies don't care about Wired, but they do care about those things. So it's a possibility that, in a few decades, an academic lab will be in possession of a large quantum computer and might want to pursue some light-hearted historical investigations. (Just in case, I've included the Wired message and key below in case either is deleted ☺)

Despite a common misunderstanding, quantum computers don't break all public key cryptography, and quantum cryptography isn't the answer. (Unless you have lots of money and a dedicated fiber line, and possibly not even then.) Instead post-quantum cryptography is the answer, and there's even a conference on it.

So, should we be using post-quantum algorithms now? The reason that we don't currently use them is because they generally take one or two orders of magnitude more time, space or both. For high-volume work like TLS, that puts them outside the bounds of practicality. But for an application like PGP, where the volume of messages is very low, why not? Size might be an issue but, if it takes 1 second to process a message vs 10 milliseconds, it doesn't really make a difference. You might as well dial the security up to 11.

There are a couple of things that one needs for a system like PGP: encryption and signatures. One option for building post-quantum, public-key signatures is hash-based signatures. These are actually really old! They were described by Lamport in 1979, only a couple of years after RSA. In fact, as Rompel showed, a secure signature scheme exists if and only if a secure hash-based signature scheme exists. Individual hash functions may be broken, but if hash-based signatures are broken in general then there can be no secure signatures at all!

The most primitive hash-based signature can be used only once, and only to sign a single-bit message: Alice picks two random values, x and y and publishes her public key: H(x) and H(y). In order to sign a message, Alice publishes one of the two, original values depending on the value of the bit.

Clearly this only works for a single bit message. It's also a one-time signature because once one of the original values has been published, it's insecure for the same public key to be used again.

The first extension to this scheme is to run many instances in parallel in order to sign larger messages. Fix the hash function, H, as SHA-256. Now Alice picks 512 random values, x_0..x_255 and y_0..y_255, and publishes the hash of each of them as her public key. Now she can sign an arbitrarily long message, m, by calculating H(m) and then publishing either x_i or y_i depending on the bits of H(m).
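
Here's a minimal sketch of that scheme in Go. The structure and names are mine, and it's deliberately naive, but it shows the whole scheme end to end:

package main

import (
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// lamportKey holds the 512 secrets and their hashes (the public key).
type lamportKey struct {
	x, y   [256][32]byte // secret values
	hx, hy [256][32]byte // public: H(x_i) and H(y_i)
}

func newKey() *lamportKey {
	k := new(lamportKey)
	for i := 0; i < 256; i++ {
		rand.Read(k.x[i][:])
		rand.Read(k.y[i][:])
		k.hx[i] = sha256.Sum256(k.x[i][:])
		k.hy[i] = sha256.Sum256(k.y[i][:])
	}
	return k
}

// sign reveals x_i or y_i for each bit of H(m). Using the key twice is unsafe.
func (k *lamportKey) sign(m []byte) (sig [256][32]byte) {
	h := sha256.Sum256(m)
	for i := 0; i < 256; i++ {
		if h[i/8]&(1<<uint(i%8)) == 0 {
			sig[i] = k.x[i]
		} else {
			sig[i] = k.y[i]
		}
	}
	return
}

// verify hashes each revealed value and compares with the public key.
func (k *lamportKey) verify(m []byte, sig [256][32]byte) bool {
	h := sha256.Sum256(m)
	for i := 0; i < 256; i++ {
		want := k.hx[i]
		if h[i/8]&(1<<uint(i%8)) != 0 {
			want = k.hy[i]
		}
		if sha256.Sum256(sig[i][:]) != want {
			return false
		}
	}
	return true
}

func main() {
	k := newKey()
	sig := k.sign([]byte("hello"))
	fmt.Println(k.verify([]byte("hello"), sig)) // true
}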

It's still one-time though. As soon as Alice signs a different message with the same public key then, for some values of i, both x_i and y_i have been published. This lets an attacker forge signatures for messages that Alice hasn't signed.

In order to allow signing multiple messages there's one easy solution: just have multiple one-time signatures! Let's say that we want a thousand. The issue is that the public and private keys of our one-time scheme were 512×32 bytes = 16KB and, if we want a thousand of them, then the public and private keys end up being 16MB, which is quite a lot. We can solve the private key size by generating a stream of values from a cipher with just a 32-byte seed key, but the large public key remains.

The solution to this is to use a Merkle hash tree, introduced (unsurprisingly) by Merkle, in 1979.

In the diagram above, four leaf hashes are included in the hash tree: A, B, C and D. The hash tree is a binary tree where each non-leaf node contains the hash of its children. If Bob believes that H(H(AB)H(CD)) is Alice's public key, then anyone can convince him that A is part of Alice's public key by giving him A, B and H(CD). Bob calculates H(AB) from A and B, and then H(H(AB)H(CD)) from H(AB) and H(CD). If the values match, and if the hash function isn't broken, then A must be part of Alice's public key.

Now, if A, B, C and D were one-time signature public keys then Alice's public key has been reduced to a single hash. Each signature now needs to include the one-time signature value, the rest of the public key, and also a proof that the public key is in Alice's public key. In more detail, the signature contains either x_i or y_i depending on the bits of the hash of the message to be signed, and also H(x_i) or H(y_i) for the bits that weren't used. With that information, Bob can reconstruct the one-time signature public key. Then the signature contains the path up the tree - i.e. the values B and H(CD) in our example - which convince Bob that the one-time signature public key is part of Alice's overall public key.

So now the public and private keys have been reduced to just 32 bytes (at least if you're using a 256-bit hash). We can make the tree arbitrarily large, thus supporting arbitrarily many signatures. The signatures, however, are 16KB plus change, which is pretty chunky. So the next trick will try to address this, and it's thanks to Winternitz.

The Winternitz trick was described in Merkle's original, 1979 work and it involves iterating hashes. We started off by considering the one-time signature of a single bit and solved it by publishing H(x) and H(y) and then revealing one of the preimages. What if we generated a single random value, z, and published H(H(z))? We would reveal H(z) if the bit was one, or z if the bit was zero. But an attacker could take our signature of zero (z) and calculate H(z), thus creating a signature of one.

So we need a checksum to make sure that, for any other message, the hash function needs to be broken. So we would need two of these structures and then we would sign 01 or 10. That way, one of the hash chains needs to be broken in order to calculate the signature of a different message. The second bit is the Winternitz checksum.

As it stands, Winternitz's trick hasn't achieved anything - we're still publishing two hash values and revealing two secrets. But when we start signing larger messages it becomes valuable.

Let's say that the Winternitz hash chains were 16 values long. That means that one of them can cover four bits of the hash that we're signing. If we were signing a 256-bit value, then we only need 64 Winternitz chains and three for the checksum. Also, the Winternitz scheme makes it easy for the verifier to calculate the original public key. The signature size has gone from 16KB to 2,144 bytes. Not bad!
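
A single Winternitz chain with w = 16 is easy to sketch in Go. The conventions here are my own (a signature of the 4-bit digit d reveals the d-th value in the chain), and the checksum chains, which run in the opposite direction, are omitted:

package main

import (
	"crypto/sha256"
	"fmt"
)

const w = 16

// chain applies SHA-256 to in, n times.
func chain(in [32]byte, n int) [32]byte {
	for i := 0; i < n; i++ {
		in = sha256.Sum256(in[:])
	}
	return in
}

func main() {
	var z [32]byte // the secret; fixed to zero here for reproducibility
	pub := chain(z, w)

	d := 9             // the 4-bit digit being signed
	sig := chain(z, d) // reveal the d-th value in the chain

	// The verifier walks the rest of the chain and compares.
	fmt.Println(chain(sig, w-d) == pub) // true

	// Note that anyone can hash sig forwards to "sign" a larger digit,
	// which is exactly why the checksum chains are needed.
}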

But! There's still a critical flaw in this! Recall that using a one-time signature twice completely breaks the system. Because of that, the signature scheme is “stateful”. This means that, when signing, the signer absolutely must record that a one-time key has been used so that they never use it again. If the private key was copied onto another computer and used there, then the whole system is broken.

That limitation might be ok in some situations, and actually means that one can build forward-secure signature schemes: schemes where signatures prior to a key compromise can still be trusted. Perhaps for a CA where the key is in an HSM that might be useful. However, for most environments it's a huge foot-cannon.

We could pick one-time public keys to use based on the hash of the message, or at random in the hope of never colliding but, in order for such schemes to be safe, the tree of one-time public keys needs to be really big. Big as in, more than 2^128 entries. If we use the trick of generating the private keys from the output of a cipher then, as soon as we fix the cipher private key, we have determined the whole tree. However, although we can compute any part of it, we can't compute all of it. At least not without an infinite amount of time (or close enough to infinite as makes no difference).

So we have to defer calculating parts of the tree until we actually need them in order to generate a signature. In order to defer bits of the tree, we have to split the tree up into a forest of smaller trees. The top-level tree is calculated as before and the Merkle hash of it is the public key. But rather than sign a message with the one-time leaves of that tree, we sign the root hash of a subtree. Since everything is deterministic, the subtree at any position is always the same so we're always signing the same message - which is safe.

Thus, if we wished to have a conceptual hash tree with 2^160 leaves, we could split it up, say, every 16 bits. The tip of the tree would be a hash tree with 2^16 leaves. Those leaves would sign the root of a hash tree of another 2^16 leaves and so on. In order to make 2^160, we would need 10 layers of such trees.

I can't have an image with all 2^16 leaves in each tree, so 2^2 will have to do. When generating the key, only the top (blue) tree needs to be calculated. When generating each signature, all the subtrees down the path to the final one-time key need to be calculated. So the signature consists of a one-time signature, from the 3rd leaf of the blue tree, of the root of the red tree (as well as a Merkle path to the blue root), plus a one-time signature, from the 1st leaf of the red tree, of the root of the green tree (as well as a Merkle path to the red root), etc.

Obviously these signatures are going to be rather larger, due to all the intermediate signatures, and will take rather longer to calculate, due to calculating all the subtrees on the fly. Below is a little Haskell program that tries to estimate signature size vs signature time. It may well have some bugs because I haven't actually coded up a working example to ensure that it does actually work.

log2 = logBase 2 :: Double -> Double

-- We assume SHA-256 for the sake of this example.
hashBits = 256
hashBytes = hashBits/8
-- This is the Winternitz parameter.
w = 64
wBits = log2 w
-- These are parameters for the Winternitz signature.
l1 = fromIntegral $ ceiling (hashBits/wBits)
l2 = 1 + (fromIntegral $ floor $ log2(l1*(w-1))/wBits)
-- wWords is the number of hash chains that we'll have.
wWords = l1 + l2

-- An SSE2, SHA-256 implementation can hash at 5 cycles/byte
hashCyclesPerByte = 5
-- We assume that we can generate keystream bytes at 2 cycles/byte.
secretCyclesPerByte = 2

otsSignatureBytes = wWords * hashBytes
-- In order to generate a single OTS public key we need to calculate wWords
-- hash chains to a depth of w hashes.
otsHashOps = wWords * w
otsHashBytes = hashBytes * otsHashOps
otsHashCycles = hashCyclesPerByte * otsHashBytes
-- In order to generate the private key, we need to generate this many bytes of
-- key stream.
otsPrivateBytes = wWords * hashBytes
otsPublicBytes = otsPrivateBytes
otsPrivateCycles = secretCyclesPerByte * otsPrivateBytes

-- This is the height of a tree.
treeHeight = 8
-- To calculate the public keys for a whole tree we have to generate 2^treeHeight public keys.
treePublicKeyGenCycles = (otsPrivateCycles + otsHashCycles) * 2**treeHeight
-- To calculate the hash of the root of the tree, we first need to hash all the
-- public keys.
treePublicKeyHashCycles = otsPublicBytes * hashCyclesPerByte * 2**treeHeight
-- Then we need to hash all the hashes
treeMerkleCycles = 2**(treeHeight-1) * hashBytes * hashCyclesPerByte
treeSignatureCycles = treePublicKeyGenCycles + treePublicKeyHashCycles + treeMerkleCycles
-- As part of the signature we need to publish the OTS signature and a path up
-- the Merkle tree. The signature is the same size as a public key because the
-- Winternitz hash chain are just slightly shorter.
treeSignatureBytes = otsPublicBytes + treeHeight*hashBytes

logKeys = 160
forestHeight = fromIntegral $ ceiling (logKeys/treeHeight)
forestSignatureBytes = forestHeight * treeSignatureBytes
forestSignatureCycles = forestHeight * treeSignatureCycles
forestSignatureTime = forestSignatureCycles / 2.9e9

The Haskell code assumes that one can calculate SHA-256 at 5 cycles/byte, which is possible for concurrent hashing as is needed in this case. There are several parameters that one can play with: the Winternitz value, the size of the trees (which is assumed to be uniform here) and the size of the forest. I've taken the size of the forest to be 2^160, although that only allows 2^16 signatures before the probability of a collision reaches 2^-128. However, the overall security of the scheme is somewhat less than 128 bits because there are multiple hashes to attack concurrently.

With the parameters that I picked above, the estimated, lower-bound time for signing is 0.8 seconds (on a 2.9GHz machine) and the signatures are 34KB. The algorithm is embarrassingly parallel, so multi-core machines can cut the signing time down a lot, but the size is more difficult to improve. Changing the parameters to achieve a 20KB signature results in a signing time of nearly 30 seconds! (And it's a lower-bound of the signing time - in the real world there will be additional overhead.)

So, should everyone be using hash-based, PGP signatures now? Well, probably not, at least not for a subkey: PGP clearsigned messages would look silly with 34KB of base64-encoded noise at the bottom. Perhaps for a long-term signing key if you really want. But, over the long-term, it's generally confidentiality (i.e. encryption) that you need to maintain and I haven't talked about post-quantum encryption at all in this post. (Maybe in the future.)

But for problems like software updates, a reasonable case can be made. The size of the signature is generally trivial compared to the size of the message and software really does last for decades now. Hash-based signatures allow you to forget about any cryptographic breakthroughs to the greatest degree possible.

There's lots more to hash-based signatures than I've covered here:

And those messages for anyone in possession of a large, quantum computer:

Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
Comment: GPGTools - http://gpgtools.org

(armored message and public key elided)

How to botch TLS forward secrecy

There's been some discussion about forward secret TLS recently, spurred, I think, by this piece from Declan McCullagh on CNET.

In case there are people out there wondering about turning it on, I thought it would be a good time to discuss some of the ways to mess it up!

In the case of multiplicative Diffie-Hellman (i.e. DHE), servers are free to choose their own, arbitrary DH groups. Historically some servers have had groups as small as 256-bits! Opera used to reconnect without DHE cipher suites in this case and Chrome now simply fails the connection with ERR_SSL_WEAK_SERVER_EPHEMERAL_DH_KEY. However, it's still the case that some servers use 512-bit DH groups, meaning that the connection can be broken open with relatively little effort.

So the first way to mess up forward secrecy is to use undersized DH groups. Ideally the DH group would match or exceed the RSA key size, but 1024-bit DHE is arguably better than straight 2048-bit RSA, so you can get away with that if you want to.
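
Generating a decent group with OpenSSL is a one-liner (the output file name is just an example; how you feed the result to your server varies):

$ openssl dhparam -out dhparams.pem 2048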

If you're using ECDHE then you don't need to worry about it being too small because the smallest EC group that clients support (P-256) is far, far stronger than 2048-bit RSA anyway.

So the next way to mess up forward secrecy is to get compromised via session resumption. TLS offers two session resumption mechanisms: session IDs (where the server and client each store their own secret state) and session tickets (where the client stores the server's state, encrypted by the server). If an attacker can obtain the session resumption information for a connection then they can decrypt the connection. (This needn't be completely true, but it is for TLS because of the way that TLS is designed.)

In the case of session IDs, the session information will either be kept in memory or kept on disk, depending on how the server is configured. (See the SSLSessionCache directive in Apache.) The forward secrecy of a connection is bounded by how long the session information is stored at the server. A small, in-memory cache likely has a high turnover rate, but a disk-cache could retain that information for a long time. Ideally one would have a medium sized, in-memory cache that turned over once a day or so.

In the case of session tickets, the server's encrypted state is transmitted to the client in the clear, so the server's session ticket key is what's protecting the connection. If you're running a single server then the session ticket key is probably generated randomly, at startup and kept in memory. So the forward secrecy of the connections is actually limited by the security of the server's memory until the server is restarted. It's possible for servers to run for many months without a restart so connections could be a lot less forward secure than you might think.

If you run several servers then they all need to share the same session ticket key otherwise they can't decrypt each other's session tickets. In Apache you would configure that with the SSLSessionTicketKeyFile directive. In this case your forward secrecy is limited by the security of that file on disk. Since your SSL private key is probably kept on the same disk, enabling forward secure cipher suites probably hasn't actually achieved anything other than changing the file that the attacker needs to steal!

So how do you run forward secrecy with several servers and support session tickets? You need to generate session ticket keys randomly, distribute them to the servers without ever touching persistent storage and rotate them frequently. However, I'm not aware of any open source servers that support anything like that.
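
To make the goal concrete, here's a sketch of the rotation half of that design in Go. Everything here is hypothetical - the key size, schedule and (omitted) distribution mechanism would be server-specific - but it shows ticket keys being generated and replaced without ever touching the disk:

package main

import (
	"crypto/rand"
	"log"
	"time"
)

// ticketKeys holds the current and previous session ticket keys in memory.
// Keeping the previous key lets recently issued tickets still resume.
type ticketKeys struct {
	current, previous [48]byte // e.g. 16-byte name + AES key + HMAC key
}

func (t *ticketKeys) rotate() {
	t.previous = t.current
	if _, err := rand.Read(t.current[:]); err != nil {
		log.Fatal(err)
	}
}

func main() {
	var keys ticketKeys
	keys.rotate()
	// Rotate periodically: forward secrecy is then bounded by the
	// rotation interval. Distribution to other servers is not shown.
	for range time.Tick(12 * time.Hour) {
		keys.rotate()
	}
}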

Sudden Death Entropy Failures

During the time that the RSA patent was in force, DSA was the signature algorithm of choice for any software that didn't want to deal with patent licenses. (Which is why lots of old PGP keys are still DSA.) It has slowly disappeared since the patent expired and it appears that 4096-bit RSA is now the algorithm of choice if you're on the run from the NSA [1]. (And if you're a journalist trying to get a reply: keyid BDA0DF3C.)

But DSA can also be used with elliptic curves in the form of ECDSA and, in that form, it's likely that we'll see it return in the future, at least to some extent. SSH and GPG both support ECDSA now and CAs are starting to offer ECDSA certificates for HTTPS.

Unfortunately, DSA has an important weakness that RSA doesn't: an entropy failure leaks your private key. If you used a machine affected by the Debian entropy bug then, in that time, messages that you encrypted with RSA can be broken. But if you signed anything with a DSA key, then your private key is compromised.

The randomness in DSA is absolutely critical. Given enough signatures, leaking just a handful of bits per signature is sufficient to break it. In the limit, you can make the mistake that Sony did and not understand that the random number needs to be generated for each message and, seemingly, just pick one and code it in. (See XKCD.)

But it doesn't need to be this way! All that is required of the nonce is that it be unique for each distinct message and secret, and we can achieve that by hashing the message and private key together. That's what Ed25519 does and I've added the option to OpenSSL to do the same. Unlike RSA and Ed25519 signatures, DSA signatures are probabilistic - signing the same message twice with the same key will result in different signatures. Since someone may be depending on that, OpenSSL also hashes in some randomness to maintain that feature.
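
The flavour of the idea is easy to sketch (this is not the actual OpenSSL or Ed25519 construction; the names, the use of HMAC-SHA-512 and the P-256 group order are all my choices for illustration):

package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha512"
	"fmt"
	"math/big"
)

// nonce derives a DSA/ECDSA nonce from the private key and message digest,
// with optional extra randomness mixed in so signatures stay probabilistic.
func nonce(priv, digest, extraRand []byte, n *big.Int) *big.Int {
	mac := hmac.New(sha512.New, priv)
	mac.Write(digest)
	mac.Write(extraRand)
	k := new(big.Int).SetBytes(mac.Sum(nil))
	return k.Mod(k, n)
}

func main() {
	// The P-256 group order, as an example of n.
	n, _ := new(big.Int).SetString("ffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551", 16)
	priv := []byte("toy private key")
	digest := sha512.Sum512([]byte("message"))
	var extra [32]byte
	rand.Read(extra[:])
	fmt.Println(nonce(priv, digest[:], extra[:], n))
}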

(p.s. if you're an actual cryptographer, please take a look at the core of the code and let me know if I'm an idiot.)

Since the DSA specification says that the nonce must be randomly generated, and compliance with the spec is important for many users, this isn't enabled by default. For ECDSA keys, one needs to call EC_KEY_set_nonce_from_hash, for example. But hopefully we can measurably improve things with little effort by doing this.


Here's an example of breaking DSA with a known nonce: essentially the Sony PlayStation hack. I've used the variable names from the Wikipedia page for those that want to follow along, although I'm not using a subgroup to keep things simple.

(The example is done with Sage.)

Let's pick some (public) DSA parameters:

p = 2903
F = GF(p)
g = F(2)
n = g.multiplicative_order()

Next, generate a key pair. x is the private key and y is the public key:

x = int(F.random_element()) % n
y = g^x
(x, y)
(1282, 966)

Generate our nonce, which will be fixed for two different messages in this example:

k = int(F.random_element()) % n
kInv = inverse_mod(k, n)

The first message that we sign will be 42 and the pair (r, s) is the signature. (Normally the message would be a hash of a much larger message.) The signature is very simple: r is g^k and s is (m + xr)/k:

m = 42
r = int(g^k) % n
s = ((m + x*r) * kInv) % n
(r, s)
(1401, 1168)

As an aside, we can verify the signature that we just made:

w = inverse_mod(s, n)
u1 = m*w
u2 = r*w
v = g^u1*y^u2
v == r

Next, we also sign the message 24 (which I'll call mm) and we gather the two signatures. We have s and ss, both with the same r. Since s = (m + xr)/k, we can subtract the two to get (m + xr - mm - xr)/k, which is just (m - mm)/k. Thus k = (m - mm)/(s - ss). Given k, we can rearrange the original equation for s: s = (m + xr)/k ⇒ sk = m + xr ⇒ sk - m = xr ⇒ (sk - m)/r = x. And thus we have the private key:

(r, s) = (1401, 1168)
(r, ss) = (1401, 1212)
kk = ((42 - 24)*inverse_mod(s-ss,n)) % n
xx = ((s*kk - 42) * inverse_mod(r, n))%n
print xx == x

Faster curve25519 with precomputation

(Update: based on questions, I've clearly failed to point out what this is, or rather what it isn't. Fixed base multiplication only applies when performing many operations on the same public key, i.e. key generation or perhaps some other cases. If you're working with random public keys then precomputation cannot help, I'm afraid.)

Diffie-Hellman is one of the most important public-key cryptographic primitives, even if it doesn't have the name recognition of, say, RSA. When anything is providing forward secrecy then it's very likely using Diffie-Hellman. In fact, the reason that web servers do RSA operations at all is mostly a legacy hangover. If we could design things afresh then there would be a lot of more Diffie-Hellman and everything would be a lot faster!

Of the concrete implementations of Diffie-Hellman, curve25519 is the fastest, common one. There are some faster primitives in eBACS, but the ones that are significantly faster are also significantly weaker. My machine (E5-2690, with Hyperthreading and TurboBoost disabled) can do a curve25519, arbitrary base point, scalar multiplication (a.k.a. Diffie-Hellman key agreement) in just 67.5µs.

But nobody will complain if it gets faster, right?

The core of Diffie-Hellman is calculating x^y in some group. The simplest way to calculate x^y is the classic square-and-multiply algorithm:

(Note: I'll be using multiplicative notation throughout because I think it's more familiar to most people, although additive is more common for the specific groups that we're dealing with here.)

Start with the number 1, which is x^0 (because anything to the power of zero is 1). If we multiply by x then we get x^1. So, by multiplying by x we've added one to the exponent. If we square our new number then we get x^2 - squaring doubles the exponent.

Now consider the exponent as a binary number: x^1 is x^1b and squaring it gets us x^10b. Squaring is a left shift of the exponent! Now, given any y in binary form, we can calculate x^y with a series of additions by 1 and left shifts - i.e. multiplies by x and squarings.

If we have y = 1011b then we start with x^0 = 1, as always, and read y left-to-right: the first bit is a one so we multiply by x to get x^1b. Left shift: x^10b. We already have the 0 in there so left shift again: x^100b. Multiply by x: x^101b. Left shift: x^1010b. Multiply by x: x^1011b. Done.

With square-and-multiply, the bits of the desired exponent are clocked in, bit by bit, as the squarings do the left shifting.
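
Here is the algorithm just described, in Go with math/big (a sketch: real implementations are constant-time and work on fixed-size field elements):

package main

import (
	"fmt"
	"math/big"
)

// expMod reads the exponent's bits left to right, squaring to left-shift
// the exponent and multiplying by x to clock in a one bit.
func expMod(x, y, m *big.Int) *big.Int {
	r := big.NewInt(1)
	for i := y.BitLen() - 1; i >= 0; i-- {
		r.Mul(r, r).Mod(r, m) // left shift
		if y.Bit(i) == 1 {
			r.Mul(r, x).Mod(r, m) // clock in a one
		}
	}
	return r
}

func main() {
	x, m := big.NewInt(3), big.NewInt(1019)
	fmt.Println(expMod(x, big.NewInt(11), m))           // 3^11 mod 1019
	fmt.Println(new(big.Int).Exp(x, big.NewInt(11), m)) // the same answer
}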

This is essentially what curve25519 does. It has lots of tricks to make it fast but, if you're calculating lots of powers of the same x then there are a number of tricks to make things faster. I'm only going to cover a couple of basic ones here.

Firstly, one can precalculate x^1, x^2, x^3, … x^15. Now, by multiplying by one of those values we can add any value from 0 to 15 to the exponent in a single multiplication. We're essentially clocking in four bits at a time. After one multiplication we have to do four squarings to left shift 4 bits to make space for the next 4 bits to be clocked in. This reduces the number of multiplications by a factor of 4.

Next, imagine that our exponents were always 8-bits long. We could precalculate x^00000000b, x^00000001b, x^00010000b and x^00010001b. By multiplying by one of those values we can clock in bits in two positions at once. Previously we always clocked in bits at the right hand side of the exponent, but those precomputed powers mean that we can clock in bits at the fourth bit too. Thus we deal with the exponent as two, 4-bit values and the squarings are shared between them. We only have to square and multiply four times to cover the whole 8-bit value.

In cryptography, the powers are usually larger than 8-bits, of course, but you can clock in at 4 or 8 positions if you wish. It's a classic time/memory trade-off because more positions mean exponentially more precomputed values.
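
The 4-bit windowed variant from the paragraphs above looks like this, in the same sketch style as before:

package main

import (
	"fmt"
	"math/big"
)

// expMod4 processes the exponent in 4-bit chunks using a table of
// x^0 … x^15: four squarings then a single multiplication per chunk.
func expMod4(x, y, m *big.Int) *big.Int {
	var table [16]*big.Int
	table[0] = big.NewInt(1)
	for i := 1; i < 16; i++ {
		table[i] = new(big.Int).Mul(table[i-1], x)
		table[i].Mod(table[i], m)
	}

	r := big.NewInt(1)
	chunks := (y.BitLen() + 3) / 4
	for i := chunks - 1; i >= 0; i-- {
		for j := 0; j < 4; j++ {
			r.Mul(r, r).Mod(r, m) // left shift the exponent by four
		}
		// Assemble the i-th 4-bit chunk and clock it in.
		c := int(y.Bit(4*i+3)<<3 | y.Bit(4*i+2)<<2 | y.Bit(4*i+1)<<1 | y.Bit(4*i))
		r.Mul(r, table[c]).Mod(r, m)
	}
	return r
}

func main() {
	x, y, m := big.NewInt(3), big.NewInt(1011), big.NewInt(10007)
	fmt.Println(expMod4(x, y, m))
	fmt.Println(new(big.Int).Exp(x, y, m)) // the same answer
}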

(There's more and better tricks than the two that I've just outlined, some are detailed on this Wikipedia page.)

Precomputation in curve25519

The reason that such precomputation hasn't been applied to curve25519 before is (I think), because of a detail that I glossed over before: curve25519 is actually a Montgomery curve that uses differential addition. Rather than being able to multiply arbitrary values, a and b, you instead need to also know the value of a/b first. That throws a spanner in the works of the typical optimisations.

If you review elliptic curves, you'll recall that operations on the elliptic curve group consist of operations on the underlying field. In curve25519's case it's the field GF(2^255-19), which is where it gets its name from. We can roughly measure the amount of time that something will take by the number of field operations required. (Note to practitioners: I'm taking a field squaring as 0.8 multiplications and one multiplication is an ‘operation'.)

A curve25519 exponent is 255 bits long. Each squaring takes 3.6 operations and each multiplication by x takes 4.6 operations. 255*(3.6+4.6) is 2091 operations. We also need a final conversion (a field inversion) that costs 215.2 operations for a total of 2306.2. That's the number to beat.

In order to work in a group which has nice, arbitrary multiplication operations we map curve25519 to an isomorphic, Twisted Edwards curve. In fact, exactly the same curve as used by Ed25519, which is a fast signature primitive. This also means that we get to use Ed25519's highly optimised code. In order to be able to use as much code as possible, we use exactly the same precomputation scheme as described in section 4 of the paper.

This scheme is based on 256-bit exponents, which it splits into 64, 4-bit chunks. But it uses lots of precomputed values! For every other 4-bit chunk, it precomputes all 16 possible values. So the first subtable contains the 16 values x^0000b, x^0001b, x^0010b, …, x^1111b. The next subtable contains the 16 values for the next-but-one 4-bit chunk. So that's x^(0000 0000 0000b), x^(0001 0000 0000b), x^(0010 0000 0000b), …, x^(1111 0000 0000b). It has 32 of those subtables!

It picks one value from each subtable and, with 32 multiplications, takes care of half the bits of the exponent. Then it squares four times to left-shift those values into place and does another 32 multiplications from the same subtables to take care of the other half of the bits. That's a total of 64 multiplications and 4 squarings.

(It actually uses a trick which means that it only needs to store 8 of the 16 values in each subtable, but that's outside the scope of this post.)

We're in a different group now and so the costs are different: multiplications cost 6 operations and squarings cost 7. (The field is the same so an ‘operation’ takes the same time.) 64*6 + 4*7 = 412, so that's looking good. We need another field inversion to finish up, same as curve25519, for a total cost of 627.2. Those rough costs suggest that the Twisted Edwards precomputation should be 3.6x faster, which is promising!

(Updated: I originally suggested that two field inversions were required because I'm an idiot. Thanks to @CodesInChaos for asking why.)

The actual timings

Based on the amd64-64-24k implementation of Ed25519 in SUPERCOP, it's fairly easy to implement. On the same machine on which the curve25519 speed was measured above, the precomputed curve25519 runs in 21.9µs, a 3.1x speedup with 24KB of tables. So it's not as fast as the rough calculations suggested, but that's because we're now processing 24KB of memory. That cache pressure isn't free of course, although benchmarks make it look mostly free because they run in a tight loop and so the tables will always be in L1 cache. Still, it means that the CPU can sustain 350,000 public key operations per second.

(Code still too messy to release. I hope to tidy it up next week.)

NPN and ALPN

Since its inception, SPDY has depended on a TLS extension called NPN. NPN allows a TLS connection to negotiate which application-level protocol will be running across it.

NPN allows SPDY to be enabled efficiently. If we had run SPDY on a different port, then we would have had to be constantly creating probing connections to see whether a site supported SPDY as well as HTTPS. Even if we knew that a site supported SPDY, network devices between any given client and that site might block connections to the different TCP port. If we had tried an HTTP Upgrade header, that would have slowed everything down and caused compatibility issues with servers and proxies that didn't process the header correctly.

NPN also allows us to update SPDY without spending round trips on a version negotiation. Overall, NPN has worked very well for SPDY.

NPN attempted to be a little bit future proof by sending the selected application protocol name under encryption, so that network devices couldn't discriminate. The benefit was somewhat limited because the server's list of supported protocols was still sent in the clear, but we believe that anything that can be encrypted should be encrypted.

There is an alternative to NPN: ALPN is essentially the same design except that the negotiation is done in the clear (like other TLS extensions).

Last Friday, at IETF 86 in Orlando, the TLS working group considered both designs and came to a rough consensus on ALPN. ALPN is currently on track to be published as an RFC at some point and we will be switching SPDY over to it and deprecating NPN.

Once IANA has assigned a TLS extension number for ALPN, Google servers will start supporting both NPN and ALPN, with a preference for ALPN. Chrome and, I expect, other browsers will start sending both NPN and ALPN extensions. During this time, SPDY servers will be able to switch from NPN to ALPN without dropping SPDY support for current clients.
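
(For the curious: with a modern OpenSSL, which postdates this post, the client side of ALPN looks roughly like the following. The protocol list is a series of length-prefixed names; the names here are just examples.)

#include <openssl/ssl.h>

/* ALPN wire format: each protocol name is prefixed by its length. */
static const unsigned char kProtos[] =
        "\x06spdy/3"        /* 6 bytes: "spdy/3" */
        "\x08http/1.1";     /* 8 bytes: "http/1.1" */

int setup_alpn(SSL_CTX *ctx) {
        /* Note the inverted return convention: zero means success. */
        return SSL_CTX_set_alpn_protos(ctx, kProtos, sizeof(kProtos) - 1);
}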

At some point after the end of 2014, I plan on removing NPN support from Chrome and Google servers. Any old servers and clients will continue to function just fine: they'll just use HTTPS.

Lucky Thirteen attack on TLS CBC

In an upcoming paper (made public this morning), Nadhem AlFardan and Kenny Paterson describe another method of performing Vaudenay's attack on CBC as used in TLS. Firstly I'd like to thank the researchers for notifying the various vendors ahead of time so that patches could be prepared: the disclosure process has gone very smoothly in this case. I couldn't have asked for anything more - they did everything right.

Vaudenay's attack requires an attacker to be able to detect when a CBC padding check has succeeded, even if the authentication check then fails. Authentication should always be applied after encryption to avoid this, but TLS famously did it the wrong way round with CBC mode.

Knowing whether a padding check succeeded reveals information about the decrypted plaintext. By tweaking the ciphertext over many trials, it's possible to progressively decrypt an unknown ciphertext. For details, see their paper, which does a better job of explaining it than I would.

Vaudenay first used the fact that these two situations (padding check failure and authenticator failure) resulted in different TLS alert values, although that didn't result in a practical attack because TLS errors are encrypted. Once that was corrected (by specifying that the same alert value should be sent in each case) the next year's paper used a timing-side channel: if the authentication check wasn't performed when the padding check failed then, by timing the server's response, an attacker could tell whether the server aborted processing early.

To try and remove that side-channel, TLS stacks perform the authentication check whether or not the padding check succeeds. However, here's what I commented in the Go TLS code:

Note that we still have a timing side-channel in the MAC check, below. An attacker can align the record so that a correct padding will cause one less hash block to be calculated. Then they can iteratively decrypt a record by breaking each byte. However, our behavior matches OpenSSL, so we leak only as much as they do.

That pretty much sums up the new attack: the side-channel defenses that were hoped to be sufficient were found not to be (again). So the answer, this time I believe, is to make the processing rigorously constant-time. (Details below.)

As a practical matter, since a padding or authenticator check failure is fatal to a TLS connection, performing this attack requires a client to send the same plaintext secret on thousands of different connections to the same server. This isn't a trivial obstacle but it's possible to meet this requirement for cookies with a browser and a bit of Javascript injected into any origin in the same session.

For DTLS the attack is much easier because a rejected record doesn't cause the connection to be destroyed, and the same authors developed a method for amplifying timing attacks against DTLS in a previous paper.

Unfortunately, unlike BEAST, this isn't something that we can unilaterally fix on the client side (except by disabling CBC mode, which might have significant compatibility issues.) On the server side, making RC4 the most preferable cipher solves the problem although I do recommend that server admins apply patches as they become available even so. Having RC4 as an option has been a useful fallback for CBC problems but, if patches aren't applied, then RC4 becomes a monoculture in the future.

Implementing constant time CBC decoding

Fixing this problem in a backwards compatible manner is, sadly, really rather complex. We need to ensure that all CBC records are authenticated in constant time. Initially we define constant-time to mean that, for all values of the secret input, the trace of instructions executed and memory locations accessed is identical (on an in-order machine). Here the secret input is the decrypted CBC record before the padding and MAC have been checked.

Meeting this definition is tough and there's a great temptation to remove some of the more glaring timing leaks and hope that the remainder is small enough not to be practical. But the DTLS trick mentioned above can be used to amplify any timing side-channels. Additionally, situations change and implementations that are currently only exposed over a network may, in the future, be exposed to attackers running on the same machine. At that point, cache-line probing and other powerful timing probes can be brought to bear against any side-channels that remain.

For the purposes of this post, we consider the decrypted, CBC record to contain the following parts:

  1. n bytes of plaintext. (Which may be compressed but, within our scope, it's plaintext.)
  2. mac_size bytes of MAC. (Up to 48 bytes in TLS with SHA-384. We conservatively assume that the minimum number of MAC bytes is zero since this is clearly a valid lower-bound and accounts for truncated MAC extensions.)
  3. padding_length bytes of padding.
  4. One byte of padding length. (Some consider this to be part of the padding itself, but I'll try to be consistent that this length byte is separate.)

With this secret data in hand, we need to perform three steps in constant time:

  1. Verify that the padding bytes are correct: the padding cannot be longer than the record and, with SSLv3, must be minimal. In TLS, the padding bytes must all have the same value as the length byte.
  2. Calculate the MAC of the header and plaintext (where the header is what I'm calling the additional data that is also covered by the MAC: the sequence number etc).
  3. Extract the MAC from the record (with constant memory accesses!) and compare it against the calculated MAC.

We will make extensive use of bitwise operations and mask variables: variables where the bits are either all one or all zero. In order to create mask values it's very useful to have a method of replicating the most-significant-bit to all the bits in a value. We'll assume that an arithmetic right shift will shift in the MSB and perform this operation for us:

#define DUPLICATE_MSB_TO_ALL(x) ( (unsigned)( (int)(x) >> (sizeof(int)*8-1) ) )
#define DUPLICATE_MSB_TO_ALL_8(x) ( (uint8_t)( (int8_t)(x) >> 7) )

However, note that the C standard does not guarantee this behaviour, although all CPUs that I've encountered do so. If you're worried then you can implement this as a series of logical shifts and ORs.
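
For example, a well-defined version of the first macro, using only logical shifts and ORs, might look like this (a sketch assuming a 32-bit unsigned; the function name is mine):

static unsigned duplicate_msb_to_all(unsigned x) {
        x &= 1u << (sizeof(unsigned)*8 - 1);    /* keep only the MSB */
        x |= x >> 1;                            /* smear it downwards */
        x |= x >> 2;
        x |= x >> 4;
        x |= x >> 8;
        x |= x >> 16;
        return x;
}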

We'll define some more utility functions as we need them.

SSLv3 padding

SSLv3 padding checks are reasonably simple: we need only to test that the padding isn't longer than the record and that it's less than a whole cipher block. We assume that the record is at least a byte long. (Note that the length of the whole record is public information and so we can fast reject invalid records if we're using that length alone.) We can use the fact that, if a and b are bounded such that the MSB cannot be set, then the MSB of a - b is one iff b>a.

unsigned padding_length = data[length-1];
unsigned t = (length-1)-padding_length;
unsigned good = DUPLICATE_MSB_TO_ALL(~t);
t = block_size - (padding_length+1);
good &= DUPLICATE_MSB_TO_ALL(~t);

The resulting value of good is a mask value which is all ones iff the padding is valid.

TLS padding

Padding in TLS differs in two respects: the value of the padding bytes is defined and must be checked, and the padding is not required to be minimal. Therefore we must always make memory accesses as if the padding was the maximum length, with an exception only in the case that the record is shorter than the maximum. (Again, the total record length is public information so we don't leak anything by using it.)

Here we use a mask variable (mask), which is all ones for those bytes which should be part of the padding, to discard the result of checking the other bytes. However, our memory access pattern is the same for all padding lengths.

unsigned padding_length = data[length-1];
unsigned good = (length-1)-padding_length;
good = DUPLICATE_MSB_TO_ALL(~good);
unsigned to_check = 255; /* maximum amount of padding. */
if (to_check > length-1) {
        to_check = length-1;
}

for (unsigned i = 0; i < to_check; i++) {
        unsigned t = padding_length - i;
        uint8_t mask = (uint8_t)DUPLICATE_MSB_TO_ALL(~t);
        uint8_t b = data[length-1-i];
        good &= ~(mask&(padding_length ^ b));
}

If any of the padding bytes had an incorrect value, or the padding length was invalid itself, one or more of the bottom eight bits of good will be zero. Now we can map any value of good with such a zero bit to 0, and leave the case of all ones the same:

good &= good >> 4;
good &= good >> 2;
good &= good >> 1;
good <<= sizeof(good)*8-1;
good = DUPLICATE_MSB_TO_ALL(good);

In a very similar way, good can be updated to check that there are enough bytes for a MAC once the padding has been removed and then be used as a mask to subtract the given amount of padding from the length of the record. Otherwise, in subsequent sections, the code can end up under-running the record's buffer.

Calculating the MAC

All the hash functions that we're concerned with are Merkle–Damgård hashes and nearly all used in TLS have a 64-byte block size (MD5, SHA1, SHA-256). Some review of this family of hash functions is called for:

All these hash functions have an initial state, a transform function that takes exactly 64 bytes of data and updates that state, and a final_raw function that marshals the internal state as a series of bytes. In order to hash a message, it must be a multiple of 64 bytes long and that's achieved by padding. (This is a completely different padding than discussed above!)

In order to pad a message for hashing, a 0x80 byte is appended, followed by the minimum number of zero bytes needed to make the length of the message congruent to 56 mod 64. Then the length of the message (not including the padding) is written as a 64-bit, big-endian number to round out the final block. If the length of the message is 0 or ≥ 56 mod 64 then the padding will cause an extra block to be processed.
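
As a sanity check of that arithmetic, the number of blocks hashed for a message of a given length is just the following (a sketch of mine, for a 64-byte block hash):

#include <stddef.h>

/* One 0x80 byte plus an 8-byte length must fit after the message. */
size_t num_hash_blocks(size_t message_len) {
        return (message_len + 1 + 8 + 63) / 64;
}

A 55-byte message still fits in a single block (55 + 1 + 8 = 64), but a 56-byte message spills into a second.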

In TLS, the number of hash blocks needed to hash a message can vary by up to six, depending on the length of the padding and MAC. Thus we can process all but the last six blocks normally because the (secret) padding length cannot affect them. However, the last six blocks need to be handled carefully.

The high-level idea is that we generate the contents of each of the final hash blocks in constant time and hash each of them. For each block we serialize the hash and copy it with a mask so that only the correct hash value is copied out, but the amount of computation is constant.

SHA-384 is the odd one out: it has a 128 byte block size and the serialised length is twice as long, although it is otherwise similar. The example below assumes a 64 byte block size for clarity.

(Figure: the layout of the final hash blocks: the application data ends in block index_a at offset c, where the 0x80 byte is written, followed by zeros; the eight length bytes go at the end of block index_b.)

We calculate two indexes: index_a and index_b. index_a is the hash block where the application data ends and the 0x80 byte is put. index_b is the final hash block where the length bytes are put. They may be the same, or index_b may be one greater. We also calculate c: the offset of the 0x80 byte in the index_a block.

The arithmetic takes some thought. One wrinkle so far ignored is that additional data is hashed in prior to the application data. TLS uses HMAC, which includes a block containing the masked key and then there are 13 bytes of sequence number, record type etc. SSLv3 has more than a block's worth of padding data, MAC key, sequence etc. These are collectively called the header here. The variable k in the following tracks the starting position of the data for the constant time blocks and includes this header.
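
The excerpt below also uses constant_time_eq and constant_time_ge, which aren't defined in this post. Plausible definitions in the style of the macros above (returning 0xff for true and zero for false, and assuming the MSB-clear bounds discussed earlier) would be the following; constant_time_eq_8, used later, is the same idea applied to 8-bit values:

static uint8_t constant_time_eq(unsigned a, unsigned b) {
        unsigned c = a ^ b;     /* zero iff a == b */
        /* ~c & (c - 1) has its MSB set iff c is zero. */
        return (uint8_t)DUPLICATE_MSB_TO_ALL(~c & (c - 1));
}

static uint8_t constant_time_ge(unsigned a, unsigned b) {
        /* a - b has a clear MSB iff a >= b, given bounded inputs. */
        return (uint8_t)DUPLICATE_MSB_TO_ALL(~(a - b));
}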

for (i = num_starting_blocks; i <= num_starting_blocks+variance_blocks; i++) {
        uint8_t block[64];
        uint8_t is_block_a = constant_time_eq(i, index_a);
        uint8_t is_block_b = constant_time_eq(i, index_b);
        for (j = 0; j < 64; j++) {
                uint8_t b = 0;
                if (k < header_length) {
                        b = header[k];
                } else if (k < data_plus_mac_plus_padding_size + header_length) {
                        b = data[k-header_length];
                }
                k++;

                uint8_t is_past_c = is_block_a & constant_time_ge(j, c);
                uint8_t is_past_cp1 = is_block_a & constant_time_ge(j, c+1);
                /* If this is the block containing the end of the
                 * application data, and we are at, or past, the offset
                 * for the 0x80 value, then overwrite b with 0x80. */
                b = (b&~is_past_c) | (0x80&is_past_c);
                /* If this is the block containing the end of the
                 * application data and we're past the 0x80 value then
                 * just write zero. */
                b = b&~is_past_cp1;
                /* If this is index_b (the final block), but not
                 * index_a (the end of the data), then the 64-bit
                 * length didn't fit into index_a and we're having to
                 * add an extra block of zeros. */
                b &= ~is_block_b | is_block_a;

                /* The final eight bytes of one of the blocks contain the length. */
                if (j >= 56) {
                        /* If this is index_b, write a length byte. */
                        b = (b&~is_block_b) | (is_block_b&length_bytes[j-56]);
                }
                block[j] = b;
        }

        md_transform(md_state, block);
        md_final_raw(md_state, block);
        /* If this is index_b, copy the hash value to |mac_out|. */
        for (j = 0; j < md_size; j++) {
                mac_out[j] |= block[j]&is_block_b;
        }
}
Finally, the hash value needs to be used to finish up either the HMAC or SSLv3 computations. Since this involves processing an amount of data that depends only on the MAC length, and the MAC length is public, this can be done in the standard manner.

In the case of SSLv3, since the padding has to be minimal, only the last couple of hash blocks can vary and so need to be processed in constant time. Another optimisation is to reduce the number of hash blocks processed in constant time if the (public) record length is small enough that some cannot exist for any padding value.

The above code is a problem to implement because it uses a different hash API than the one usually provided. It needs a final_raw function that doesn't append the hash padding, rather than the common final function, which does. Because of this, implementers may be tempted to bodge this part of the code and hope that the resulting timing side-channel is too small to exploit, even with more accurate, local attacks. Or perhaps it'll be good for another paper!

Extracting the MAC from the record.

We can't just copy the MAC from the record because its position depends on the (secret) amount of padding. The obvious, constant-time method of copying out the MAC is rather slow so my thanks to Emilia Kasper and Bodo Möller for suggesting better ways of doing it.

We can read every location where the MAC might be found and copy to a MAC-sized buffer. However, the MAC may be byte-wise rotated by this copy so we need to do another mac_size^2 operations to rotate it in constant time (since the amount of rotation is also secret):

unsigned mac_end = length;
unsigned mac_start = mac_end - md_size;
/* scan_start contains the number of bytes that we can ignore because the
 * MAC's position can only vary by 255 bytes. */
unsigned scan_start = 0;
if (public_length_inc_padding > md_size + 255 + 1) {
        scan_start = public_length_inc_padding - (md_size + 255 + 1);
}
unsigned char rotate_offset = (mac_start - scan_start) % md_size;

memset(rotated_mac, 0, sizeof(rotated_mac));
for (unsigned i = scan_start; i < public_length_inc_padding; i += md_size) {
        for (unsigned j = 0; j < md_size; j++) {
                unsigned char mac_started = constant_time_ge(i + j, mac_start);
                unsigned char mac_ended = constant_time_ge(i + j, mac_end);
                unsigned char b = 0;
                if (i + j < public_length_inc_padding) {
                        b = data[i + j];
                }
                rotated_mac[j] |= b & mac_started & ~mac_ended;
        }
}

/* Now rotate the MAC */
memset(out, 0, md_size);
for (unsigned i = 0; i < md_size; i++) {
        unsigned char offset = (md_size - rotate_offset + i) % md_size;
        for (unsigned j = 0; j < md_size; j++) {
                out[j] |= rotated_mac[i] & constant_time_eq_8(j, offset);
        }
}

Finally, the MAC should be compared in constant time by ORing a value with the XOR of each byte in place from the calculated and given MACs. If the result is zero, then the record is valid. In order to avoid leaking timing information, the mask value from the padding check at the very beginning should be carried throughout the process and ANDed with the final value from the MAC compare.
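
A sketch of that final comparison (again using the DUPLICATE_MSB_TO_ALL macro from above; the function itself is mine):

/* Returns a mask that is all ones iff the two MACs are equal. The
 * caller ANDs this with the mask from the padding check. */
static unsigned macs_equal(const uint8_t *calculated, const uint8_t *given,
                           unsigned md_size) {
        uint8_t diff = 0;
        for (unsigned i = 0; i < md_size; i++)
                diff |= calculated[i] ^ given[i];
        /* diff is zero iff every byte matched, and only then does
         * (unsigned)diff - 1 have its MSB set. */
        return DUPLICATE_MSB_TO_ALL((unsigned)diff - 1);
}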

Limitations of our model

We started out by defining constant-time with a specific model. But this model fails in a couple of ways:

Firstly, extracting the MAC as detailed above is rather expensive. On some chips all memory accesses within a cache line are constant time and so we can relax our model a little and perform the rotation in a cache-line-aligned buffer when we detect that we're building on such a platform.

Secondly, when I sent an early version of the OpenSSL patch to the paper's authors for them to test, they reported that they found a timing difference! I instrumented the OpenSSL code with the CPU cycle counter and measured a median time of 18020 cycles to reject a record with a large amount of padding but only 18004 cycles with small padding.

This difference may be small, but it was stable. To cut a long story short, it came from this line:

unsigned char rotate_offset = (mac_start - scan_start) % md_size;

When the padding was larger, the MAC started earlier in the record (because the total size was the same). So the argument to the DIV instruction was smaller and DIV, on Intel, takes a variable amount of time depending on its arguments!

The solution was to add a large multiple of md_size to make the DIV always take the full amount of time (and to do it in such a way that the compiler, hopefully, won't be able to optimise it away). Certainly 20 CPU cycles is probably too small to exploit, but since we've already “solved” this problem twice, I'd rather not have to worry in the future.

Real World Crypto 2013

(These are my notes for a talk that I gave last week at Real World Crypto. The premise of the conference is that it brings together theoretical cryptographers and practitioners. So this talk is aimed at theoretical cryptographers but it's fairly simple as I don't have anything complex worth saying to real cryptographers! Slides for other talks are linked from the program page and Rogaway's are relevant to this talk.

Note that this isn't a transcript: I actually say more words than this, but it contains the main points.)

Hello all.

For those who don't know me, I'm Adam Langley. I work at Google, mostly on our serving side HTTPS infrastructure these days. But I also still do some work on Chrome's SSL stack from time to time.

When I was asked to come up with a title of a talk I picked “Things that bit us, things we fixed and things that are waiting in the grass” with reference to HTTPS. Partly because that's what I know about but also because HTTPS is the only place where most people knowingly interact with public crypto, so it's one of the few examples of real world crypto at really large, really messy scales.

I could also have titled this talk “Know your enemy” because your real enemy is not trying to achieve an advantage with greater than negligible probability. As I hope to convince you, your enemy is me. I am the idiot who will mess up the implementation of your lovely cryptosystem. Your 128-bit security level is worthless in the face of my stupidity and so I'm here to seek your help in making my life easier and, therefore, everything more secure.

But before we get into anything crypto related, I'd like to set the scene. By and large, transport security on the Internet is doing OK because few people bother to attack it. If you want to steal banking credentials, you can get malware toolkits that will work on a large enough fraction of machines to be useful - e.g. using the Java exploit made public this week. If you want to steal passwords, you can SQL inject the site and reverse the hashes (if the site even bothered to hash), or just use the Ruby on Rails exploit made public this week. Given the level of password reuse, it doesn't even have to be the same site! Economically, attacking the transport doesn't make sense for many attackers and so they don't do it.

If you do want to attack the transport, by far the best methods are SSL stripping, mixed scripting and insecure cookie vulnerabilities.

SSL stripping means that, when the user types in example.com, since the default scheme is unencrypted, the browser makes an insecure request. The attacker can simply answer that request and proxy the entire site while removing any attempts to upgrade to HTTPS. In the majority of cases, the user will never notice.

Mixed scripting results from pages that source Javascript, CSS or plugins from an HTTP URL. The request is made insecurely but the response is trusted with the same authority as the HTTPS origin that requested them. On sites that serve both HTTP and HTTPS, mixed scripting is endemic.

Lastly, cookies are secrets that the client sends to authenticate requests. If, when creating a cookie, the server doesn't set the secure flag, the client will also send the same cookie over unencrypted connections. Since forgetting to set that flag doesn't break anything in normal operation, it happens fairly often.

Oh, and all that is assuming that any transport security is used at all. HTTP is still much more common than HTTPS.

I've breezed over those issues, but the important point is that none of them involve crypto, they're all systems issues. On the whole, the crypto is doing great in comparison to everything else!

I'll cycle back to those issues towards the end but I wanted to give some perspective as I'll be talking about the crypto a lot, since this is Real World Crypto, but crypto is only a small part of web security.

When writing this talk I sat down and made a list of the cryptographic issues that have bitten HTTPS over the past few years and tried to group them in order to make points that I think would be useful for the research community. I didn't end up using them all, so this is hardly an exhaustive list, and I used some things twice, but here they are:

Go beyond a paper

  1. CRIME (compression leaks.)
  2. BEAST (CBC vs adaptive, chosen plaintext.)

The first group of issues I called ‘go beyond a paper’:

The CRIME attack resulted from the attacker having partial, chosen plaintext abilities on a channel which was performing compression. This applied to both TLS, which supports compression, and SPDY, an HTTP replacement that runs over TLS that applied compression internally. Most major HTTPS sites don't support TLS compression so SPDY was the bigger hole, but SPDY is a relatively new protocol and the idea that compression can leak information isn't. I can easily find it going back ten years (“Compression and Information Leakage of Plaintext”, Kelsey, 2002) and here's what I said on the SPDY mailing list before the CRIME work:

With a good model of zlib, I think you could extract a ~40 byte cookie with ~13K requests. That's a practical attack and would make a great paper if someone has the time.

Of course, I never had the time but we did start working on a better compression scheme to solve the issue.

But SPDY was done by smart networking folks. Since the compression leak issues weren't known widely enough at the time, they picked gzip as a seemingly reasonable compression algorithm. It wasn't clearly stupid to compose TLS and gzip and yet it blew up when they did so. Had the Kelsey paper instead been a splashy, public demonstration, as CRIME was, then it's possible that the idea would have sunk into the collective memory to the point where simply using gzip wouldn't have seemed so reasonable. As it is, we're now using a horrible gzip hack to segment cookies and other sensitive headers from the attacker controlled values, although the replacement compression is mostly done, pending SPDY 4.

So the first idea in this section is that it's OK to make a splash in order to get some new idea into the minds of the developer community in general.

Somewhat similarly, the origins of the BEAST attack date back to at least 2002, in the context of SSH (Möller, 2002).

In this case, Rizzo and Duong contacted the major SSL stacks prior to their work going public with a clear case that it was worth looking into. This, at least from my point of view, is very welcome! Please do this if you can. You can contact me if nothing else and I can rope in all the other usual suspects. Please set a clear go-public date and allow us to share the facts of the matter with other major vendors.

In the case of BEAST, this produced a rare example of major browsers breaking things together for the common good. The breakage wasn't trivial: we took out Disneyland Tokyo's online ticketing system amongst many others, but we got it deployed.

So the second point of this group is that you should consider treating it like a security disclosure if it's warranted. We don't bite!

The world often sucks

  1. Hash collisions in MD5
  2. BEAST
  3. Downgrade attacks

The second group I put together under the title ‘the world often sucks’.

I'm afraid that sometimes it takes a while to make changes even if we are given a very clear demonstration of the issue. After a very clear demo of MD5 collisions causing vulnerabilities in certificate issuance in 2008 it still took years to remove support for it. Other remedial actions were taken: public CAs were required to use random serial numbers and stopped using MD5 for new certificates. But it wasn't until early 2012 that Chrome removed support for MD5 in signatures. Sadly, many MITM proxies still used MD5 and, despite giving them lots of notice, they didn't do anything and we broke quite a few people with that change.

Of course, later in the year it was found that Flame broke Microsoft's code signing with an MD5 collision. Maybe that'll be enough to convince people to move.

But the point of this section is to encourage people to think about workarounds because the ‘right fix’ often isn't feasible. For hash collisions we have randomised serial numbers and although we might like everyone to be using SHA-256 or SHA-3, realistically those randomised serial numbers are going to be buttressing SHA-1 for quite some time to come.

Marc Stevens talked earlier about how to detect SHA-1 collisions given only one of the colliding messages. That's great! That's absolutely something that we can put into certificate validation now and hopefully prevent problems in the future.

In relation to BEAST: I mentioned that the core weakness had been known since 2002 and, because of that, there was even a workaround in OpenSSL for it: empty fragments. By sending an empty CBC fragment before each real record, the MAC would effectively randomise the IV. However, empty fragments caused compatibility issues because some SSL stacks returned a zero length when encountering them and higher layers of the code took this to be EOF. Because of that, the workaround was never widely enabled.

In the course of discussing BEAST another workaround was proposed: 1/n-1 record splitting. Rather than putting an empty fragment before each record, include a single byte of the plaintext. The protection isn't quite as good, but it solves the EOF problem. Some servers still broke because they assumed that the complete HTTP request would come in a single read, but the lower rate of problems probably made BEAST mitigation viable.

Lastly, there's the first of our snakes in the grass (problems that are still pending to bite us): SSLv3 downgrade. Since there exist so many broken servers and network middleboxes on the Internet that can't handle TLS version negotiation, in the event of a TLS handshake failure browsers will fall back to SSLv3. However, SSLv3 doesn't support all of the features that TLS does. Most significantly from my point of view, it doesn't support ECDHE, which Google servers use. So a network attacker can trigger a fallback and downgrade a capable client to a non-forward secure ciphersuite.

This is obviously not good. The correct fix is to remove the fallback to SSLv3 of course, but that's sadly not viable right now. Instead, as a workaround, Yngve (formerly of Opera) suggested using the TLS renegotiation extension as a signal that a server is reasonably recent and therefore we shouldn't have performed the fallback.

Numbers from Chrome indicate that a high fraction of the servers that we perform fallback for are renego patched, suggesting that's a bad signal and we should instead create a different one. Although maybe the number of fallbacks is dominated by transient network problems and that's skewing the data. Eric Rescorla has suggested replicating the TLS version negotiation using ciphersuite values. It's something that we will hopefully address in one way or another in 2013.

So that's the point of this group: please consider workarounds because we can't always manage to deploy the ‘right’ fix.

Side-channels are a big deal.

  1. RSA PKCS#1 v1.5 padding oracles ("Million message attack")
  2. CBC padding oracles (Vaudenay's attack)
  3. Timing attacks against RSA CRT.
  4. Side-channel attacks against AES
  5. Side-channel attacks against group operations

This big group is about the fact that side channels are a big deal. Writing constant-time code is a very odd corner of programming and not something that can be easily tested. I have a valgrind hack called ctgrind that allows for some automated testing, but it certainly has its limitations.

But what does constant-time even mean? There's the obvious definition: the CPU runs for the same amount of time, independent of any secret inputs. But CPUs are preemptively multitasked and have frequency scaling and thermal limiting these days so that's not a very useful definition in practice. A more workable definition is that if the code were to be run on an abstract Von Neumann machine, then the trace of memory fetches is identical for all secret inputs. That means that the trace of instructions fetched and data accesses is constant for all secret inputs. That takes care of all the obvious problems.

In practice, it can be useful to relax the definition a little and require only that the set of cache lines accessed for data is constant, rather than the exact addresses. In practice CPUs often fetch whole cache lines at a time and using that fact can lead to speedups at the cost of having to know the cache line length of the CPU.

This model assumes that the individual instructions themselves are constant time. As a research topic it would be interesting to know how variable time CPU instructions affect this. For example, from the Intel optimisation manual:

The latency and throughput of IDIV in Enhanced Intel Core micro-architecture varies with operand sizes and with the number of significant digits of the quotient of the division.

Is this a problem, cryptographically? I don't know. I think multiplies on ARM are variable time too. They are on PowerPC, but that's a much more obscure platform.

As a researcher, what's there to keep in mind with respect to constant-time implementations? Firstly, know that Moore's law will never help you. CPUs may get faster, but the amount of data that we need to process is increasing just as fast, if not faster. So you can't assume that the slower, constant time code will become viable in time - you have to be constant time from the start.

Even having a non-constant time implementation is a danger. There are usually two sides to a protocol and I may not control the software on the other side. If I specify AES in a protocol then I have to consider that it may well be a non-constant time implementation. I just made up the term ‘implementation ecosystem’ for this. AES is very difficult to implement securely in software: good implementations are still topics for research papers. So the implementation ecosystem for AES is terrible! There are lots and lots of side-channel vulnerable implementations out there because no normal person, given the AES spec, will produce a secure implementation.

If we're aiming for a 128-bit security level then that possibility is a much larger problem than many other, more traditional crypto concerns.

So, for new primitives, you may want to produce solid implementations for different platforms to seed the implementation ecosystem. Not just reference implementations, but ones that are good enough that they dominate the set of implementations. For example, if I specify curve25519 in a protocol, I can be pretty sure that everyone is going to be using djb's reference code. That's a major advantage.

You should consider what an implementation is going to look like when designing. Of course, building a solid implementation will make sure that you end up considering this, so that's another good reason. There are certain patterns that are inherently dangerous. Square-and-multiply loops for example. You should recognise that and make sure that even the description of the algorithm includes counter measures. Binary fields are another which are very likely to result in non-constant time code.

Lastly, please don't assume that CPU changes are going to solve your constant-time or performance problems. Intel have added specific instructions for AES and binary fields in their latest chips and, while that does have some benefit, they will be a small fraction of all chips for a very long time. The chance of both sides of a connection having these chips is even smaller.

Cryptographic Room 101.

Room 101 is a fairly long running British TV show where people nominate things they dislike to put into Room 101, which banishes them from the world. Based on the above I've a number of things that I'd like to put into Room 101, from least controversial to most. These are more specific points than the general ones that I just made and I'd like to see if anyone in the room disagrees with me!

1. MAC then Encrypt.

I hope that everyone agrees on this. Of course, it's pretty ubiquitous in the world today. Just because I want to banish something doesn't mean that I'm not going to spend years dealing with it in the future!

2. CBC mode.

With all the problems of chosen plaintexts, padding oracles etc I think it's time to get rid of CBC mode forever. Although it's possible to implement it securely, it's been done wrong for so long and is so easy to mess up, I think it's best to get rid of it.

3. ‘Sudden death’ entropy failure: plain DSA.

DSA (and ECDSA) has a very unfortunate property that an entropy failure leaks the private key. Even a slight bias over many signatures can be exploited. This is ridiculous. As Nadia and Debian have demonstrated, entropy failures do happen and they are inherently very bad. But that's not a reason to amplify them! By hashing in the private key and message, this problem can be avoided. So please consider what happens when your nonces are actually ntwices.
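
As a sketch of the idea (in the spirit of what deterministic (EC)DSA proposals specify; hmac_sha256 is an assumed helper, and a real implementation must also reduce the result into [1, q-1] without bias):

#include <stddef.h>
#include <stdint.h>

void hmac_sha256(const uint8_t *key, size_t key_len,
                 const uint8_t *msg, size_t msg_len, uint8_t out[32]);

/* Derive the signing nonce from the private key and message: the same
 * inputs always give the same nonce, so a weak RNG can no longer leak
 * the private key through repeated or biased nonces. */
void derive_nonce(const uint8_t priv_key[32],
                  const uint8_t *msg, size_t msg_len,
                  uint8_t nonce_out[32]) {
        hmac_sha256(priv_key, 32, msg, msg_len, nonce_out);
}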


4. AES-GCM.

I've saved the really controversial one for last!

AES-GCM so easily leads to timing side-channels that I'd like to put it into Room 101. It took a decade of research to produce solid, high-speed, constant time AES implementations and they are very complex. In that decade, many, many non-constant time AES implementations have found their way into everything, poisoning the ecosystem when it comes to using AES.

I haven't seen any research on extracting the key from GHASH, but I've certainly seen vulnerable implementations and there's every reason to believe that it's possible. Most GHASH implementations look like AES implementations from 10 years ago. I'm aware of one, reasonable, constant-time AES-GCM implementation (Käsper and Schwabe, CHES 2009), but it runs at 22 cycles/byte on a Core2.

If you have a recent Intel chip, and software that implements AES-GCM using the specific instructions provided, then it's great. But most chips don't have that and I think it would have been much more preferable to pick an AEAD that's easy to securely implement everywhere and then speed it up on certain chips.

But it's still much better than AES-CBC I guess!

(At this point I cycled back to talking about the larger transport security issues and how browsers are addressing them with HSTS, mixed script blocking etc. But I've written about that before.)

Certificate Transparency

These are my notes for a talk that I gave today at IETF 85. For more details, see the draft.

Certificates are public statements that everyone trusts, but they aren't public record. Other critical details about companies are generally public record, their address, directors etc, but not their public key. Why not? Work like the EFF Observatory already means that CA customer lists, which they might consider confidential, are public.

If certificates were public record then you would be able to see what CAs are asserting about you, and hopefully correct any mistakes. I would very much like to know the set of valid certificates within google.com at any given time. I know about the real ones, of course, but I don’t know that there aren’t others. At the moment, there's no accountability for CAs unless you get really, publicly caught screwing up.

However, there is a class of certificates that we don't want published: internal names. Some companies get certificates from a real CA for their internal networks and those names should not be published. So we'll keep that in mind as we work out a design.

How might we achieve this? Well, we could have CAs publish the certificates that they issue. That's a good start, and actually useful to some degree, but you'll probably admit that this is weaker than we might like, so how do we make it stronger?

So that's our design goal, now let's talk about the constraints:

Firstly, we aren't going to update every HTTPS server on the planet. Any design must be incrementally deployable and, initially, we are talking about HTTPS. Hopefully any design would be generally applicable, but HTTPS is what we have in mind right now.

Secondly, no blocking lookups. Making calls to a third-party during certificate validation is very convenient, but it's just too costly. It creates latency when it's working and a disaster when it's not. Too many networks filter too much for it to be dependable and we've never managed to make it work for OCSP, so there's little chance that it'll work for this.

So whatever information we need to get to the client has to be in the handshake, and anything that the client needs to report back has to be asynchronous.

We had our straw-man proposal where we just ask CAs to publish their certificate stream, but we want to do better because that only works when the CA is simply mistaken. So the problem that we need to address is how do clients know that a certificate they receive has been published at all?

Well, we can have an independent certificate log sign a statement of publication and, based on our constraints, we have to put that signature somewhere in the TLS handshake. We call the statements of publication Signed Certificate Timestamps and, since we don't want to delay certificate issuance, they’re not actually statements of publications, rather they are a promise that the certificate will be published soon.

Now we've just moved the problem from "what if the CA doesn't publish it?" to "what if the log promises to publish something, but doesn't?". We can require SCTs from a quorum of independent logs, and we'll do that, but running an append-only log is a job that can be verified by clients.

In order to make a log verifiable, we put the log entries in a Merkle tree, with certificates at the leaves. Each log is an ever growing Merkle tree of certificates and the log periodically signs the root of the tree and publishes it, along with all the entries. But what does `periodically' mean? All logs must publish certificates that they have received within a time called the Maximum Merge Delay. That merge delay is the amount of time that a certificate can be used without being published.
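
A minimal sketch of such a tree hash (along the lines of the draft: leaves and interior nodes are hashed with distinct prefix bytes, and the tree splits at the largest power of two below n; sha256 is an assumed helper):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

void sha256(const uint8_t *in, size_t len, uint8_t out[32]);   /* assumed */

static size_t split_point(size_t n) {
        size_t k = 1;
        while (k * 2 < n)
                k *= 2;
        return k;       /* largest power of two strictly below n */
}

/* Computes the root over n >= 1 leaf hashes, where each leaf hash is
 * SHA-256(0x00 || entry). */
void merkle_root(const uint8_t (*leaves)[32], size_t n, uint8_t out[32]) {
        if (n == 1) {
                memcpy(out, leaves[0], 32);
                return;
        }
        uint8_t buf[65], left[32], right[32];
        size_t k = split_point(n);
        merkle_root(leaves, k, left);
        merkle_root(leaves + k, n - k, right);
        buf[0] = 0x01;  /* interior-node prefix */
        memcpy(buf + 1, left, 32);
        memcpy(buf + 33, right, 32);
        sha256(buf, sizeof(buf), out);
}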

We detect violations of the MMD by having clients verify the log's behaviour. Clients can request from the logs (or a mirror), a path from any certificate to the root of a published Merkle tree. In order to preserve client privacy this request may be made via DNS, if possible, or by requesting paths for a range of certificates rather than a specific one.

Once a client has a signed root, it needs to gossip about it. Clients implicitly trust their software provider so one common answer may be to request a signed statement that the log root has been observed from an auditor service run by them. But these auditors can be run by anyone, and clients can configure their auditor however they like.

Since these client checks are asynchronous, they can be blocked by an attacker. Clients will have to be tolerant to that to some extent because many networks are indistinguishable from an attack. However, if after some amount of time, the client has a certificate that it hasn't been able to check, or a log root that it hasn't verified with its auditor, it should try to publish it in various, unspecified ways. In the extreme, one can imagine asking the user to email a file to an email address.

So to `break' this system, by which I mean to get a certificate that will be silently accepted by clients without the world knowing about it, one needs to compromise a CA, a quorum of logs, and then either partition the client forever, or also compromise the client's auditor.

In addition to auditors, there’s another class of log observers: monitors. Monitors watch the logs for interesting events. We envision at least one, obvious, type of monitor service: an `alerts’ system where anyone can sign-up to receive emails when a certificate is issued in a certain domain.

So now some practical points:

First, how do we get the SCTs (the receipts from the logs) into the TLS handshake without altering every HTTPS server? Well, the one thing that every HTTPS server can do is serve a certificate chain. So we tried a couple of tricks:

One, we tried adding a superfluous certificate to the end of the chain. Most clients will ignore it, but a few smaller ones, and old versions of Android, don't and IIS doesn't like configuring such a chain. So we aren't pushing ahead with that idea.

We also tried putting SCTs into the unsigned portion of the leaf certificate and that works fairly well except for breaking Java. Nonetheless, we may continue to support that as an option.

And options are the way we're going to go with this problem: there doesn't seem to be a good single answer.

So another option is that CAs can do the work for you and put the SCTs into the signed part of the certificate, in an X.509 extension. Of course, the SCT contains a hash of the certificate, and a hash cannot include itself, so CAs issue a `pre-cert' from a special intermediate with a magic EKU that makes it invalid for normal use. The pre-cert can be submitted to the logs to get SCTs for embedding in the real certificate.

Another way in which CAs can do it for you is to put the SCTs in an OCSP response and then one uses OCSP stapling on the server.

Finally, for servers that can be updated, SCTs can be included in an Authorisation Data TLS-extension, and support for that is working its way through OpenSSL and Apache.

We're experimenting with CAs at the moment to find what works and we may whittle down this list in time. But for now we're promiscuous about solutions to this problem.

The logs also have to worry about spam, so they only accept certificate chains that end at a known root. There's a very low bar for the logs to accept a root because it doesn't grant any authority, it's just an anti-spam measure, but we do have to make sure that the logs can’t be spammed to death.

Going back to one of our requirements at the beginning: we don't want to log internal names in certificates. One solution to this is to allow intermediates with name constraints to be logged and then any certificates from there don't need to be. So an intermediate constrained to issue within example.com can be logged and then we don't need to log any leaf certificates within example.com: they can't be used against anyone else and, by using an intermediate, its security is up to you.

However, that's a little complex so the initial solution will probably be that we let companies configure the clients to say "don't require logging within these domains". At least for Chrome, people report great success with our Enterprise Policy configuration and that's an easy route for such config.

Lastly, the very tricky question: deployment.

Initially we can deploy in the same manner as pinning and HSTS: require it only for certain, opt-in domains (or possibly for opt-in CAs). But the goal is very much to get it everywhere. That will take years, but we do plan to do it. We need to make the error messages in the browser informative for server admins as well as users because this will be something new for them, although hopefully their CA will just take care of it. We can also enforce it for certificates issued past a certain date, rather than having a flag day. That way the certificate change will be the triggering factor, which is something under the control of the site. Lastly, we'll probably have to do the work to patch software like EJBCA.

We do not yet know exactly who will be running the logs, nor how many we want. But Google expects to run one and that project is staffed. We’re also working with Digicert and Comodo who are experimenting with CT integration and are considering running logs.

Since we can verify the operation of logs, they don't have the same trusted status as CAs. However, the set of logs has to be globally agreed and revoking a log would be a pain, so we do want logs that are operationally competent.

There is no doubt that this will be a difficult and lengthy deployment process. But it doesn't seem completely hopeless.

NIST may not have you in mind

A couple of weeks back, NIST announced that Keccak would be SHA-3. Keccak has somewhat disappointing software performance but is a gift to hardware implementations.

This is rather a theme for NIST selected algorithms. AES includes a number of operations on a binary field and CPUs generally don't implement binary field operations because they aren't really useful for anything else. So many software implementations of AES include four, 1024-byte tables of results in that field in order to achieve reasonable performance. However, these table lookups cause a timing side-channel which is fairly effective even over a network. On the same machine, where the CPU cache state can be probed more directly, it's very effective.

In response to this, some AES implementations made their tables smaller to reduce the leak. That's obviously an improvement, but it came at the cost of slowing AES down and AES was never very quick to start off with. And while a smaller leak is better, AES is supposed to be operating at (at least) a 128-bit security level!

Next, the mode recommended for AES by NIST is GCM: it's AES in counter mode coupled with a polynomial authenticator. Cryptographically this is perfectly sensible but the polynomial authenticator uses binary fields again. For a hardware implementation, this is another gift but software implementations need, yet again, tables in memory to implement it at reasonable speed.

Tables not only cause problems with timing side-channels, they also cause slowdowns that don't appear in micro-benchmarks. Tables are great when the benchmark is running with a single key and they are contained in L1 cache. But when running in a larger system, the cache pressure causes slowdowns and the real-world speed of such an algorithm can be dramatically less than the benchmarks suggest.

But there's no reason why a polynomial authenticator needs to use binary fields. Why not use a field that CPUs actually implement? If you do so you get a very fast authenticator with dramatically improved key agility.
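
To illustrate, here's a toy polynomial authenticator over the Mersenne prime 2^61 - 1 (this is not Poly1305, and not a secure construction as written; it's only to show that the inner loop is ordinary integer multiply-and-reduce, which every CPU is good at):

#include <stddef.h>
#include <stdint.h>

static const uint64_t kPrime = ((uint64_t)1 << 61) - 1;

/* Evaluates the message, in 8-byte chunks, as a polynomial at the
 * secret point r (which must be reduced mod kPrime), then adds the
 * secret s. unsigned __int128 is a gcc/clang extension. */
uint64_t poly_auth(const uint64_t *chunks, size_t n, uint64_t r, uint64_t s) {
        unsigned __int128 acc = 0;
        for (size_t i = 0; i < n; i++) {
                acc = (acc + chunks[i]) % kPrime;       /* absorb a chunk */
                acc = (acc * r) % kPrime;               /* multiply by the key */
        }
        return (uint64_t)((acc + s) % kPrime);
}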

So why does NIST continually pick algorithms that aren't suited to software implementations? One of my colleagues proffers (somewhat jokingly) a conspiracy theory but I suspect that the answer is much more mundane. At USENIX Security this year, Dickie George gave an excellent keynote involving some NSA history and described how they would spend a decade or more designing a cryptographic system from the chip upwards. Everything was custom built, carefully designed and it was a completely closed system. It's certainly nice to be able to do that!

If you're designing such a system, of course you would be orientated towards hardware. Every implementation of your system would be a custom-designed chip; there's no benefit to making something easy to implement on commodity systems. And, in that world, AES, GCM, and SHA3 make a lot of sense. As the GCM paper says, “binary Galois field multiplication is especially suitable for hardware implementations”. I'm assuming that there's a fair amount of shared worldview between NIST and the NSA, but that doesn't seem unreasonable.

(In a NIST paper about the selection of AES they state: “table lookup: not vulnerable to timing attacks.” Oops. If you're building your own hardware, maybe.)

I'm not suggesting that the NSA is actually using AES, GCM etc in their own designs. But a generation or more of cryptographers from that world are probably comfortable with features that were chosen with hardware in mind. Those features, like binary fields, are what they will naturally gravitate towards.

People who aren't in the situation of designing custom chips, but rather have to design protocols for implementation on a menagerie of commodity chips, might want to consider that NIST doesn't appear to have them in mind. Although hardware implementations of AES have existed for a while, unless they are CPU instructions (like AES-NI), the system-call overhead can cause the acceleration to be slower than the plain CPU for common, 1400-byte, chunk sizes. Even when you have a CPU instruction to implement AES and GCM, it's unlikely that all the chips that your code is going to run on will be so endowed.

Hardware implementations of these poorly-suited primitives also take chip area and development time away from other tasks. Why not choose primitives that use CPU abilities that lots of code benefits from? Then optimisations of those CPU instructions lift all boats.

So if you want to pick algorithms designed for CPUs, try Salsa20 rather than AES and Poly1305 rather than GCM. But we don't have a great answer for a modern hash function I'm afraid, and that's a little disappointing. SHA-2 is fine for now, although Zooko has some thoughts on the matter.

(I should note that, thanks to some great work by djb, Peter Schwabe, Emilia Käsper and others, we do have bitsliced, constant-time implementations of AES now that run reasonably fast (for AES). But it's only effective for counter mode, and I believe that it needs long streams of data to achieve its headline speed. (A commenter pointed out that it can be made to work with only 8 blocks. Later, another pointed out that I'm doubly wrong! We can do it for single blocks in reasonable time.) That's great, but it took 10 years of optimisation work and implies that AES can only be effectively implemented by some of the best practical cryptographers in the world.)

DANE stapled certificates

We've supported DNSSEC stapled certificates in Chrome for a while now and since the DANE RFC has been published, it's about time that I updated that code to support DANE, rather than the hacked up CAA record that it currently uses.

I've written the DANE code, but it also seemed like an opportune time to reevaluate the idea. The promise of DNSSEC, as Dan Kaminsky put it, is that it can reduce the number of meetings required to get something done.

You have a whole bunch of internal hosts that some company that you're partnering with needs to interface with securely? Well, perhaps you used an internal CA, so will they install that root on their systems? (Meeting.) They will if it's name constrained, can we get it reissued with name constraints? (Meeting.) Ok, now can IT get that root pushed out everywhere? (Meeting.) You see how this is going.

But every bit of code needs to justify its complexity and, for small cases, StartSSL will give you free certificates. Perhaps just Chrome doing it isn't very interesting, and everyone hacks up custom solutions anyway.

So, if it's useful to you, you should let me know (agl at chromium dot org).

CRIME

Last year I happened to worry on the SPDY mailing list about whether sensitive information could be obtained via SPDY's use of zlib for compressing headers. Sadly, I never got the time to follow up and find out whether it was a viable attack. Thankfully there exist security researchers who, independently, wondered the same thing and did the work for me! Today Duong and Rizzo presented that work at ekoparty 2012.

They were also kind enough to let Firefox and ourselves know ahead of time so that we could develop and push security fixes before the public presentation. In order to explain what we did, let's start by looking at how SPDY compressed headers:

(This is inline SVG; if you can't see it, check here.)

(Diagram: the compressed SPDY header blocks for two GET requests to www.google.com, laid out byte by byte, covering :host, :method, :path, :scheme, :version, the accept headers, cookies, user-agent and x-chrome-variations, with each byte marked as either a zlib literal or a back-reference to earlier text.)

That's a pretty busy diagram! But I don't think it's too bad with a bit of explanation:

zlib uses a language with basically two statements: “output these literal bytes” and “go back x bytes and duplicate y bytes from there”. In the diagram, red text was included literally and black text came from duplicating previous text.

The duplicated text is underlined. A dark blue underline means that the original text is in the diagram and there will be a gray line pointing to where it came from. (You can hover the mouse over one of those lines to make it darker.)

A light blue underline means that the original text came from a pre-shared dictionary of strings. SPDY defines some common text, for zlib to be able to refer to, that contains strings that we expect to find in the headers. This is most useful at the beginning of compression when there wouldn't otherwise be any text to refer back to.

The problem that CRIME highlights is that sensitive cookie data and an attacker-controlled path are compressed together in the same context. Cookie data makes up most of the red, uncompressed bytes in the diagram. If the path contains some cookie data then the compressed headers will be shorter, because zlib will be able to refer back to the path rather than having to output all the literal bytes of the cookie. If you arrange things so that you can probe the contents of the cookie incrementally, then (assuming that the cookie is base64) you can extract the cookie byte-by-byte by inducing the browser to make requests.
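To make the length oracle concrete, here's a toy Go sketch: it mimics the shared compression context with compress/zlib and compares compressed sizes for two guesses at a cookie prefix. The header layout and cookie value are made up, and the real attack needs considerably more care to turn this into a reliable byte-by-byte oracle:

package main

import (
    "bytes"
    "compress/zlib"
    "fmt"
)

// compressedLen returns the length, in bytes, of a fake header block after
// zlib compression. The attacker can't read the headers but can observe
// this length on the wire.
func compressedLen(path, cookie string) int {
    var buf bytes.Buffer
    w := zlib.NewWriter(&buf)
    fmt.Fprintf(w, ":path: %s\ncookie: %s\n", path, cookie)
    w.Close()
    return buf.Len()
}

func main() {
    const secret = "cookie-value-7f3a"
    // Guessing the next byte of the cookie: the correct guess shares a
    // longer prefix with the cookie and so tends to compress smaller.
    for _, guess := range []string{"cookie-value-6", "cookie-value-7"} {
        fmt.Printf("%s -> %d bytes\n", guess, compressedLen("/?q="+guess, secret))
    }
}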

For details of how to get zlib to reveal that information in practice, I'll just refer you to Duong and Rizzo's CRIME presentation. It's good work.

In order to carry out this attack, the attacker needs to be able to observe your network traffic and to be able to cause many arbitrary requests to be sent. An active network attacker can do both by injecting Javascript into any HTTP page load that you make in the same session.

When we learned of this work, we were already in the process of designing the compression for SPDY/4, which avoids this problem. But we still needed to do something about SPDY/2 and SPDY/3, which are currently deployed. To that end, Chrome 21 and Firefox 15 have switched off SPDY header compression because that's a minimal change that backports easily.

Chrome has also switched off TLS compression, through which a very similar attack can be mounted.

But we like SPDY header compression because it saves significant amounts of data on the wire! Since SPDY/4 isn't ready to go yet we have a more complex solution for Chrome 22/23 that compresses data separately while still being backwards compatible.

Most importantly, cookie data will only ever be duplicated exactly, and in its entirety, against other cookie data. Each cookie will also be placed in its own Huffman group (Huffman coding is a zlib detail that I skipped over in the explanation above). Finally, in case other headers contain sensitive data (e.g. when set by an XMLHttpRequest), non-standard headers will be compressed in their own Huffman group without any back references.

That's only a brief overview of the rules. The code to follow them and continue to produce a valid zlib stream wasn't one of the cleaner patches ever landed in Chrome and I'll be happy to revert it when SPDY/4 is ready. But it's effective at getting much of the benefit of compression back.

To the right are a couple of images of the same sort of diagram as above, but zoomed out. At this level of zoom, all you can really see are the blocks of red (literal) and blue (duplicated) bytes. The diagram on the right has the new rules enabled and, as you can see, there is certainly more red in there. However, that's mostly the result of the limited window size. In order to save on server memory, Chrome only uses 2048-byte compression windows and, under the new rules, a previous cookie value has to fit completely within the window in order to be matched. So things are a little less efficient until SPDY/4, although we might choose to trade a little more memory to make up for that.

SSL interstitial bypass rates

In yesterday's post I threw in the following that I've been asked to clarify:

“we know that those bypass buttons are clicked 60% of the time by Chrome users”

Chrome collects anonymous statistics from users who opt in to such collection (and thank you to those who do!). One of those statistics covers how frequently people bypass SSL interstitials. (As always, the Chrome privacy policy has the details.)

We define bypassing the interstitial as clicking the ‘Proceed anyway’ button and not bypassing as either closing the tab, navigating elsewhere, or clicking the ‘Back’ button.

I picked five days at random over the past six weeks and averaged the percentages of the time that users bypassed rather than not. That came to 61.6%.

There may be some biases here: we may have a biased population because we only include users who have opted in to statistics collection. We are also counting all interstitials: there may be a small number of users who bypass a lot of SSL errors. But that's the data that we have.

Living with HTTPS

(These are my notes from the first half of my talk at HOPE9 last weekend. I write notes like these not as a script, but so that I have at least some words ready in my head when I'm speaking. They are more conversational and less organised than a usual blog post, so please forgive me the rough edges.)

HTTPS tends to cause people to give talks mocking certificate security and the ecosystem around it. Perhaps that's well deserved, but that's not what this talk is about. If you want to have fun at the expense of CAs, dig up one of Moxie's talks. This talk deals with the fact that your HTTPS site, and the sites that you use, probably don't even reach the level where you get to start worrying about certificates.

I'm a transport security person so the model for this talk is that we have two computers talking over a malicious network. We assume that the computers themselves are honest and uncompromised. That might be a stretch in these malware-ridden times, but that's the area of host security and I'm not talking about that today. The network can drop, alter or fabricate packets at will. As a lemma, we also assume that the network can cause the browser to load any URL it wishes. The network can do this by inserting HTML into any HTTP request and we assume that every user makes some unencrypted requests while browsing.


If the average user typed mail.google.com into a browser and saw the following, what fraction of them do you think would log in, none the wiser?

Can you even see what's terribly wrong here?

The problem is that the page isn't served over HTTPS. It should have been, but when a user types a hostname into a browser, the default scheme is HTTP. The server may attempt to redirect users to HTTPS, but that redirect is insecure: a MITM attacker can rewrite it and keep the user on HTTP, spoofing the real site the whole time. The attacker can now intercept all the traffic to this perfectly well configured and secure website.

This is called SSL stripping and it's terribly simple and devastatingly effective. We probably don't see it very often because it's not something that corporate proxies need to do, so it's not in off-the-shelf devices. But that respite is unlikely to last very long and maybe it's already over: how would we even know if it was being used?

In order to stop SSL stripping, we need to make HTTPS the only protocol. We can't do that for the whole Internet, but we can do it site-by-site with HTTP Strict Transport Security (HSTS).

HSTS tells browsers to always make requests over HTTPS to HSTS sites. Sites become HSTS either by being built into the browser, or by advertising a header:

Strict-Transport-Security: max-age=8640000; includeSubDomains

The header is in force for the given number of seconds and may also apply to all subdomains. The header must be received over a clean HTTPS connection.
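As an illustration, emitting the header from a Go server might look something like this (a sketch; cert.pem and key.pem are placeholder paths):

package main

import (
    "fmt"
    "net/http"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // Roughly 100 days, covering subdomains too. Browsers ignore
        // this header over plain HTTP, so it has to arrive on HTTPS.
        w.Header().Set("Strict-Transport-Security", "max-age=8640000; includeSubDomains")
        fmt.Fprintln(w, "hello")
    })
    http.ListenAndServeTLS(":443", "cert.pem", "key.pem", nil)
}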

Once the browser knows that a site is HTTPS only, the user typing mail.google.com is safe: the initial request uses HTTPS and there's no hole for an attacker to exploit.

(mail.google.com and a number of other sites are already built into Chrome as HSTS sites so it's not actually possible to access accounts.google.com over HTTP with Chrome - I had to doctor that image! If you want to be included in Chrome's built-in HSTS list, email me.)

HSTS can also protect you, the webmaster, from making silly mistakes. Let's assume that you've told your mother that she should always type https:// before going to her banking site, or maybe you set up a bookmark for her. That's honestly more than we can, or should, expect of our users. But let's say that our supererogatory user enters https://www.citibank.com in order to securely connect to her bank. What happens? Well, https://www.citibank.com redirects her to http://www.citibank.com. They've downgraded the user! From there, the HTTP site should redirect back to HTTPS, but the damage has been done. An attacker can get in through the hole.

I'm honestly not picking on Citibank here. They were simply the second site that I tried and I was somewhat surprised that the first site didn't have the problem. It's a very easy mistake to make, and everything just works! It's a completely silent disaster! But HSTS would have either prevented it, or would have failed closed.

HSTS also does something else. It turns this:

Into this:

The “bypass this certificate error” button has gone. That button is a UI disaster. Asking regular people to evaluate the validity of X.509 certificates is insane. It's a security cop-out that we're saddled with, and which is causing real damage.

We've seen widespread MITM attacks using invalid certificates in Syria and, in recent weeks, Jordan. These attacks are invalid! This is squarely within our threat model for transport security and it shouldn't have been a risk for anybody. But we know that those bypass buttons are clicked 60% of the time by Chrome users. People are certainly habituated to clicking them and I bet that a large number of people were victims of attacks that we should have been able to prevent.

If you take only one thing away from this talk, HSTS should be it.

Mixed scripting

Once we've sorted HSTS, we have another problem. These snippets of HTML are gaping wide holes in your security:

<script src="http://...

<link href="http://...

<embed src="http://...

It's called mixed scripting and it happens when a secure site loads critical sub-resources over HTTP. It's a subset of mixed content: mixed content covers loading any sub-resource insecurely. Mixed content is bad, but when the resource is Javascript, CSS, or a plugin we give it another name to make it clear that it's a lot more damaging.

When you load sub-resources over HTTP, an attacker can replace them with content of their choosing. The attacker also gets to choose any page on your HTTPS site with the problem. That includes pages that you don't expect to be served over HTTPS, but happen to be mapped. If you have this problem anywhere, on any HTTPS page, the attacker wins.

With complex sites, it's very difficult to ensure that this doesn't happen. One good way to limit it is to serve only over HTTPS, so that there aren't any pages that you expect to serve over HTTP. HSTS might also save you if you're loading mixed script from the same domain.

Another mitigation is to use scheme-relative URLs everywhere possible. These URLs look like //example.com/bar.js and are valid in all browsers. They inherit the scheme of the parent page, which will be HTTP or HTTPS as needed. (Although it does mean that if you load the page from disk then things will break. The scheme will be file:// in that case.)

Fundamentally this is such an easy mistake to make, and such a problem, that the only long term solution is for browsers to stop loading insecure Javascript, CSS and plugins for HTTPS sites. To their credit, IE9 does this and did it before Chrome. But I'm glad to say that Chrome has caught up and mixed scripting is now blocked by default, although with a user-override:

Yes, there's another of those damn bypass buttons. But we swear that it's just a transitional strategy and it already stops silent exploitation in a hidden iframe.


HTTP and HTTPS cookie jars are the same. No, really: cookies aren't scoped to a protocol! That means that if you set a cookie on https://example.com and then make a request to http://example.com, the cookies will be sent in the clear! In order to prevent this, you should be setting secure on your HTTPS cookies. Sadly this is a very easy thing to miss because everything will still work if you omit it but, without the secure flag, attackers can easily steal your cookies.
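A minimal Go sketch of setting the flag (the cookie name and value are illustrative):

package main

import "net/http"

func main() {
    http.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
        http.SetCookie(w, &http.Cookie{
            Name:     "session",
            Value:    "opaque-session-token",
            Secure:   true, // never sent over plain HTTP
            HttpOnly: true, // and kept away from script, while we're at it
        })
    })
    http.ListenAndServeTLS(":443", "cert.pem", "key.pem", nil)
}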

It's worth noting that HSTS can protect you from this: by preventing the HTTP request from being sent, the cookies can't be leaked that way, but you'll need to include all subdomains in the HSTS coverage.

There's a second corollary to this: attackers can set your HTTPS cookies too. By causing a request to be sent to http://example.com, they can spoof a reply with cookies that then override any existing cookies. In this fashion, an attacker can log a user in as the attacker during their interaction with an HTTPS site. Then, say, emails that the user sends will be saved in the attacker's outbox.

There's no very good protection against this except, again, HSTS. By preventing any HTTP requests you can stop the attacker from spoofing a reply to set the cookies. Again, HSTS needs to cover all subdomains in order to be effective against this.

Get yourself checked out

You should go to https://www.ssllabs.com and run their scan against your site. It's very good, but ignore it if it complains about the BEAST attack. It'll do so if you make a CBC ciphersuite your top preference. Browsers have worked around BEAST and non-browser clients are very unlikely to provide the attacker enough access to be able to pull it off. You have a limited amount of resources to address HTTPS issues and I don't think BEAST should make the list.

Get a real certificate

You should get a real certificate. You probably already have one but, if you don't, then you're just training more people to ignore certificate errors and you can't have HSTS without a real certificate. StartSSL give them away for free. Get one.

If you've reached this far and have done all of the above, congratulations: you're in the top 0.1% of HTTPS sites. If you're getting bored, this is a reasonable point to stop reading: everything else is just bonus points from now on.

Forward secrecy

You should consider forward secrecy. Forward secrecy means that the keys for a connection aren't stored on disk. You might have limited the amount of information that you log in order to protect the privacy of your users, but if you don't have forward secrecy then your private key is capable of decrypting all past connections. Someone else might be doing the logging for you.

In order to enable forward secrecy you need to have DHE or ECDHE ciphersuites as your top preference. DHE ciphersuites are somewhat expensive if you're terminating lots of SSL connections and you should be aware that your server will probably only allow 1024-bit DHE. I think 1024-bit DHE-RSA is preferable to 2048-bit RSA, but opinions vary. If you're using ECDHE, use P-256.
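Purely as an illustration, here's that preference ordering expressed with Go's crypto/tls, which implements ECDHE (defaulting to P-256) but not DHE; most deployments would express the same policy in their web server's configuration instead:

package main

import (
    "crypto/tls"
    "net/http"
)

func main() {
    cfg := &tls.Config{
        // ECDHE first, so that clients which support it get forward
        // secrecy; plain RSA key exchange remains as a fallback.
        CipherSuites: []uint16{
            tls.TLS_ECDHE_RSA_WITH_RC4_128_SHA,
            tls.TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,
            tls.TLS_RSA_WITH_RC4_128_SHA,
            tls.TLS_RSA_WITH_AES_128_CBC_SHA,
        },
        // Honour the server's ordering rather than the client's.
        PreferServerCipherSuites: true,
    }
    server := &http.Server{Addr: ":443", TLSConfig: cfg}
    server.ListenAndServeTLS("cert.pem", "key.pem")
}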

You also need to be aware of Session Tickets in order to implement forward secrecy correctly. There are two ways to resume a TLS connection: either the server chooses a random number and both sides store the session information, or the server can encrypt the session information with a secret, local key and send that to the client. The former is called Session IDs and the latter is called Session Tickets.

But Session Tickets are transmitted over the wire and so the server's Session Ticket encryption key is capable of decrypting past connections. Most servers will generate a random Session Ticket key at startup unless otherwise configured, but you should check.

I'm not going to take the time to detail how to configure this here. There are lots of webservers and it would take a while. This is more of a pointer so that you can go away and research it if you wish.

(The rest of the talk touched on OpenSSL speed, public key pinning, TLS 1.1 and 1.2 and a few other matters, but I did those bits mostly on the fly and don't have notes for them.)

Decrypting SSL packet dumps

We all love transport security but it can get in the way of a good tcpdump. Unencrypted protocols like HTTP, DNS etc can be picked apart for debugging but anything running over SSL can be impenetrable. Of course, that's an advantage too: the end-to-end principle is dead for any common, unencrypted protocol. But we want to have our cake and eat it.

Wireshark (a common tool for dissecting packet dumps) has long had the ability to decrypt some SSL connections given the private key of the server, but the private key isn't always something that you can get hold of, or want to spread around. MITM proxies (like Fiddler) can sit in the middle of a connection and produce plaintext, but they also alter the connection: SPDY, client-certificates etc won't work through them (at least not without special support).

So here's another option: if you get a dev channel release of Chrome and a trunk build of Wireshark you can run Chrome with the environment variable SSLKEYLOGFILE set to, say, /home/foo/keylog. Then, in Wireshark's preferences for SSL, you can tell it about that key log file. As Chrome makes SSL connections, it'll dump an identifier and the connection key to that file and Wireshark can read those and decrypt SSL connections.
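(On Linux, that's something like running SSLKEYLOGFILE=/home/foo/keylog google-chrome from a terminal; the binary name varies by platform and release channel.)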

The format of the key log file is described here. There's an older format just for RSA ciphersuites that I added when Wireshark decrypted purely based on RSA pre-master secrets. However, that doesn't work with ECDHE ciphersuites (amongst others) so the newer format can be used to decrypt any connection. (You need the trunk build of Wireshark to support the newer format.) Chrome currently writes records of both formats.

This can also be coupled with spdy-shark to dissect SPDY connections.

Since key log support is part of NSS, support will hopefully end up in Firefox in the future.

New TLS versions

TLS is the protocol behind most secure connections on the Internet and most TLS is TLS 1.0, despite the fact that the RFC for 1.0 was published in January 1999, over 13 years ago.

Since then there have been two newer versions of TLS: 1.1 (2006) and 1.2 (2008). TLS 1.1 added an explicit IV for CBC mode ciphers as a response to CBC weaknesses that eventually turned into the BEAST attack. TLS 1.2 changes the previous MD5/SHA1 combination hash to use SHA256 and introduces AEAD ciphers like AES-GCM.

However, neither of these versions saw any significant adoption for a long time because TLS's extension mechanism allowed 1.0 to adapt to new needs.

But things are starting to change.

In the long run, getting to 1.2 is worthwhile. The MD5/SHA1 hash combination used in previous versions was hoped to be more secure than either hash function alone, but [1] suggests that it's probably only as secure as SHA1. Also, the GCM cipher modes allow AES to be used without the problems (and space overhead) of CBC mode. GCM is hardware accelerated in recent Intel and AMD chips, along with AES itself.

But there are always realities to contend with I'm afraid:

Firstly, there's the usual problem of buggy servers. TLS has a version negotiation mechanism, but some servers will fail if a client indicates that it supports the newer TLS versions. (Last year, Yngve Pettersen suggested that 2% of HTTPS servers failed if the client indicated TLS 1.1 and 3% for TLS 1.2.)

Because of this Chrome implements a fallback from TLS 1.1 to TLS 1.0 if the server sends a TLS error. (And we have a fallback from TLS 1.0 to SSL 3.0 if we get another TLS error on the second try.) This, sadly, means that supporting TLS 1.1 cannot bring any security benefits because an attacker can cause us to fallback. Thankfully, the major security benefit of TLS 1.1, the explicit CBC IVs, was retrofitted to previous versions in the form of 1/n-1 record splitting after the BEAST demonstration.

Since these fallbacks can be a security concern (especially the fallback to SSLv3, which eliminates ECDHE forward secrecy) I fear that it's necessary to add a second, redundant version negotiation mechanism to the protocol. It's an idea which has been floated before and I raised it again recently.

But buggy servers are something that we've known about for many years. Deploying new TLS versions has introduced a new problem: buggy networks.

Appallingly it appears that there are several types of network device that manage to break when confronted with new TLS versions. There are some that break any attempt to offer TLS 1.1 or 1.2, and some that break any connections that negotiate these versions. These failures, so far, manifest in the form of TCP resets, which isn't currently a trigger for Chrome to fallback. Although we may be forced to add it.

Chrome dev or iOS users suffering from the first type of device see all of their HTTPS connections fail. Users suffering from the second type only see failures when connecting to sites that support TLS 1.1 or 1.2 (which includes Google). iOS leaves it up to the application to implement fallback if it wishes, and adding TLS 1.2 support to Google's servers has caused some problems because of these bad networks.

We're working to track down the vendors with issues at the moment and to make sure that updates are available, and that they inform their customers of this. I'm very interested in any cases where Chrome 21 suddenly caused all or some HTTPS connections to fail with ERR_CONNECTION_RESET. If you hit this, please let me know (agl at chromium dot org).

([1] Antoine Joux, Multicollisions in Iterated Hash Functions: Application to Cascaded Constructions, CRYPTO (Matthew K. Franklin, ed.), Lecture Notes in Computer Science, vol. 3152, Springer, 2004, pp. 306–316.)

False Start's Failure

Eighteen months ago(ish), Chrome started using False Start. False Start reduces the average time for an SSL handshake by 30%.

Since the biggest problem with transport security is that most sites don't use it, anything that reduces the latency impact of HTTPS is important. Making things faster doesn't just make them faster, it also makes them cheaper and more prevalent. When HTTPS is faster, it'll be used in more places than it would otherwise be.

But, sadly, False Start will be disabled in Chrome 20, except for sites doing NPN. NPN is a TLS extension that we use to negotiate SPDY, although you don't have to negotiate SPDY to use NPN: you can advertise just http/1.1 if you wish.

False Start was known to cause problems with a very small number of servers and the initial announcement outlined the uncommon scheme that we used to deploy it: we scanned the public Internet and built up a list of problematic sites. That list was built into Chrome and we didn't use False Start for connections to those sites. Over time the list was randomly eroded away and I'd try to address any issues that came up. (Preemptively so in the case of large sites.)

It did work to some extent. Many sites that had problems were fixed and it's a deployment scheme that is worth considering in the future. But it didn't ultimately work well enough for False Start.

Initially we believed that False Start issues were deterministic so long as the TLS Finished and application data records were sent in the same TCP packet. We changed Chrome to do this in the hopes of making False Start issues deterministic. However, we later discovered some HTTPS servers that were still non-deterministically False Start intolerant. I hypothesise that the servers run two threads per connection: one for reading and one for writing. Although the TCP packet was received atomically, thread scheduling could mean that the read thread may or may not be scheduled before the write thread had updated the connection state in response to the Finished.

This non-determinism made False Start intolerance difficult to diagnose and reduced our confidence in the blacklist.

The “servers” with problems were nearly always SSL terminators. These hardware devices terminate SSL connections and proxy unencrypted data to backend HTTP servers. I believe that False Start intolerance is very simple to fix in the code and one vendor suggested that was the case. None the less, of the vendors who did issue an update, most failed to communicate that fact to their customers. (A pattern that has repeated with the BEAST fix.)

One, fairly major, SSL terminator vendor refused to update to fix their False Start intolerance despite problems that their customers were having. I don't believe that this was done in bad faith, but rather a case of something much more mundane along the lines of “the SSL guy left and nobody touches that code any more”. However, it did mean that there was no good answer for their customers who were experiencing problems.

Lastly, it was becoming increasingly clear that we had a bigger problem internationally. Foreign admins have problems finding information on the subject (which is mostly in English) and foreign users have problems reporting bugs because we can't read them. We do have excellent agents in countries who liaise locally but it was still a big issue, and we don't cover every country with them. I also suspect that the distribution of problematic SSL terminators is substantially larger in some countries and that the experience with the US and Europe caused us to underestimate the problem.

In aggregate this led us to decide that False Start was causing more problems than it was worth. We will now limit it to sites that support the NPN extension. This unfortunately means that it'll be an arcane, unused optimisation for the most part: at least until SPDY takes over the world.

Very large RSA public exponents

After yesterday's post that advocated using RSA public exponents of 3 or 2^16+1 in DNSSEC for performance, Dan Kaminsky asked me whether there was a potential DoS vector by using really big public exponents.

Recall that the RSA signature verification core is m^e mod n. By making e and n larger, we can make the operation slower. But there are limits, at least in OpenSSL:

/* for large moduli, enforce exponent limit */
if (BN_num_bits(rsa->n) > OPENSSL_RSA_SMALL_MODULUS_BITS) {
        if (BN_num_bits(rsa->e) > OPENSSL_RSA_MAX_PUBEXP_BITS) {
                return -1;
        }
}

So, if n is large, we enforce a limit on e. The values of the #defines are such that for n>3072 bits, e must be less than or equal to 64 bits. So the slowest operations happen with an n and e of 3072 bits. (The fact that e<n is enforced earlier in the code.)

So I set up *.bige.imperialviolet.org. This is a perfectly valid and well signed zone which happens to use a 3072-bit key with a 3072-bit public exponent. (I could probably have slowed things down more by picking a public exponent with lots of 1s in its binary representation, but it's just a random number in this case.) One can resolve records 1.bige.imperialviolet.org, 2.bige.imperialviolet.org, … and the server doesn't have to sign anything because it's a wildcard: a single signature covers all of the names. However, the resolver validates the signature every time.

On my 2.66GHz Core 2 laptop, 15 requests per second causes unbound to take 95% of a core. A couple of hundred queries per second would probably put most DNSSEC resolvers in serious trouble.

So I'd recommend limiting the public exponent size in DNSSEC to 2^16+1, except that people are already mistakenly using 2^32+1, so I guess that needs to be the limit. The DNSSEC spec limits the modulus size to 4096 bits, and 4096-bit signatures are about 13 times slower to verify than the typical 1024-bit signatures used in DNSSEC. But that's a lot less of a DoS vector than bige.imperialviolet.org, which is 2230 times slower than normal.

RSA public exponent size

RSA public operations are much faster than private operations. Thirty-two times faster for a 2048-bit key on my machine. But the two operations are basically the same: take the message, raise it to the power of the public or private exponent and reduce modulo the key's modulus.

What makes the public operations so much faster is that the public exponent is typically tiny, while the private exponent should be the same size as the modulus. One can actually use a public exponent of three in RSA, and clearly cubing a number should be faster than raising it to the power of a 2048-bit number.

Daniel Bleichenbacher (a fellow Googler) gave a talk at the CRYPTO 2006 rump session where he outlined a bug in several, very common RSA signature verification implementations at the time. The bug wasn't nearly as bad as it could have been because it only affected public keys that used a public exponent of three, and most keys used a larger exponent: 2^16+1. But the fact that a slightly larger public exponent saved these buggy verifiers cemented 2^16+1 as a sensible default value for the public exponent.

But there's absolutely no reason to think that a public exponent larger than 2^16+1 does any good. I even checked that with Bleichenbacher before writing this. Three should be fine, 2^16+1 saved some buggy software a couple of times, and anything larger is probably a mistake.

Because of that, when writing the Go RSA package, I didn't bother supporting public exponents larger than 2^31-1. But then several people reported that they couldn't verify some DNSSEC signatures. Sure enough, the DNSKEY records for .cz and .us are using a public exponent of 2^32+1.

DNSSEC is absolutely the last place to use large public exponents. Busy DNS resolvers have to resolve tens or hundreds of thousands of records a second; fast RSA signature verification is a big deal in DNSSEC. So I measured the speed impact of various public exponent sizes with OpenSSL (1.0.1-beta3):

Public exponent | Verification time (µs)

So a public exponent of 2^32+1 makes signature verification over four times slower than an exponent of three. As DNSSEC grows, and DNSSEC resolvers too, that extra CPU time is going to be a big deal.
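If you want to see the shape of that difference for yourself, here's a toy timing sketch using Go's math/big; the absolute numbers will differ from the OpenSSL measurements above, and a random prime merely stands in for a real RSA modulus:

package main

import (
    "crypto/rand"
    "fmt"
    "math/big"
    "time"
)

// timeExp measures the average time of m^e mod n, the RSA verification core.
func timeExp(m, e, n *big.Int) time.Duration {
    const iters = 200
    start := time.Now()
    for i := 0; i < iters; i++ {
        new(big.Int).Exp(m, e, n)
    }
    return time.Since(start) / iters
}

func main() {
    // A random 2048-bit prime stands in for an RSA modulus: only the
    // sizes matter for this timing sketch.
    n, err := rand.Prime(rand.Reader, 2048)
    if err != nil {
        panic(err)
    }
    m := big.NewInt(0xdeadbeef)

    e3 := big.NewInt(3)
    e16 := big.NewInt(65537) // 2^16+1
    e32 := new(big.Int).Lsh(big.NewInt(1), 32)
    e32.Add(e32, big.NewInt(1)) // 2^32+1

    fmt.Println("e=3:     ", timeExp(m, e3, n))
    fmt.Println("e=2^16+1:", timeExp(m, e16, n))
    fmt.Println("e=2^32+1:", timeExp(m, e32, n))
}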

It looks like the zones using a value of 2^32+1 are just passing the -e flag to BIND's dnssec-keygen utility. There's some suggestion that -e used to select 2^16+1 but silently changed semantics in some release.

So today's lesson is: don't pass -e to dnssec-keygen! The default of dnssec-keygen is 2^16+1 and that's certainly safe. The .com zone uses a value of three and I think that's probably the best choice given the CPU cost and the fact that the original Bleichenbacher bug has long since been fixed.

Forward secrecy for IE and Safari too

When we announced forward secrecy for Google HTTPS we noted that “we hope to support IE in the future”. It wasn't that we didn't want to support IE, but IE doesn't implement the combination of ECDHE with RC4. IE (or rather, SChannel) does implement ECDHE with AES on Vista and Windows 7, but I only wanted to make one change at a time.

With the release of MS12-006 a month ago to address the BEAST weakness in TLS 1.0's CBC design, we've now made ECDHE-RSA-AES128-SHA our second preference cipher. That means that Chrome and Firefox will still use ECDHE-RSA-RC4-SHA, but IE on Vista and Windows 7 will get ECDHE now. This change also means that we support ECDHE with Safari (at least on Lion, where I tried it.)

RSA 2012

Just a brief note that I'll be at RSA 2012 in San Francisco at the end of the month and will be speaking with several others about certificate revocation. (tech-106, Tuesday, 1pm.)

If you want to chat and don't manage to grab me then, drop me an email. (agl at imperialviolet.org if you didn't already know.)

Revocation checking and Chrome's CRL

When a browser connects to an HTTPS site it receives signed certificates which allow it to verify that it's really connecting to the domain that it should be connecting to. In those certificates are pointers to services, run by the Certificate Authorities (CAs) that issued the certificate, that allow the browser to get up-to-date information.

All the major desktop browsers will contact those services to inquire whether the certificate has been revoked. There are two protocols/formats involved: OCSP and CRL, although the differences aren't relevant here. I mention them only so that readers can recognise the terms in other discussions.

The problem with these checks, that we call online revocation checks, is that the browser can't be sure that it can reach the CA's servers. There are lots of cases where it's not possible: captive portals are one. A captive portal frequently requires you to sign in on an HTTPS site, but blocks traffic to all other sites, including the CA's OCSP servers.

If browsers were to insist on talking to the CA before accepting a certificate, all these cases would stop working. There's also the concern that the CA may experience downtime and it's bad engineering practice to build in single points of failure.

Therefore online revocation checks which result in a network error are effectively ignored (this is called “soft-fail”). I've previously documented the resulting behaviour of several browsers.

But an attacker who can intercept HTTPS connections can also make online revocation checks appear to fail and so bypass the revocation checks! In cases where the attacker can only intercept a subset of a victim's traffic (i.e. the SSL traffic but not the revocation checks), the attacker is likely to be a backbone provider capable of DNS or BGP poisoning to block the revocation checks too.

If the attacker is close to the server then online revocation checks can be effective, but an attacker close to the server can get certificates issued from many CAs and deploy different certificates as needed. In short, even revocation checks don't stop this from being a real mess.

So soft-fail revocation checks are like a seat-belt that snaps when you crash. Even though it works 99% of the time, it's worthless because it only works when you don't need it.

While the benefits of online revocation checking are hard to find, the costs are clear: online revocation checks are slow and compromise privacy. The median time for a successful OCSP check is ~300ms and the mean is nearly a second. This delays page loading and discourages sites from using HTTPS. They are also a privacy concern because the CA learns the IP address of users and which sites they're visiting.

On this basis, we're currently planning on disabling online revocation checks in a future version of Chrome. (There is a class of higher-security certificate, called an EV certificate, where we haven't made a decision about what to do yet.)

Pushing a revocation list

Our current method of revoking certificates in response to major incidents is to push a software update. Microsoft, Opera and Firefox also push software updates for serious incidents rather than rely on online revocation checks. But our software updates require that users restart their browser before they take effect, so we would like a lighter weight method of revoking certificates.

So Chrome will start to reuse its existing update mechanism to maintain a list of revoked certificates, as first proposed to the CA/Browser Forum by Chris Bailey and Kirk Hall of AffirmTrust last April. This list can take effect without having to restart the browser.

An attacker can still block updates, but they have to be able to maintain the block constantly, from the time of revocation, to prevent the update. This is much harder than blocking an online revocation check, where the attacker only has to block the checks during the attack.

Since we're pushing a list of revoked certificates anyway, we would like to invite CAs to contribute their revoked certificates (CRLs) to the list. We have to be mindful of size, but the vast majority of revocations happen for purely administrative reasons and can be excluded. So, if we can get the details of the more important revocations, we can improve user security. Our criteria for including revocations are:

  1. The CRL must be crawlable: we must be able to fetch it over HTTP and robots.txt must not exclude GoogleBot.
  2. The CRL must be valid by RFC 5280 and none of the serial numbers may be negative.
  3. CRLs that cover EV certificates are taken in preference, while still considering point (4).
  4. CRLs that include revocation reasons can be filtered to take less space and are preferred.

For the curious, there is a tool for fetching and parsing Chrome's list of revoked certificates at https://github.com/agl/crlset-tools.

Extracting Mozilla's Root Certificates

When people need a list of root certificates, they often turn to Mozilla's. However, Mozilla doesn't produce a nice list of PEM encoded certificates. Rather, they keep them in a form which is convenient for NSS to build from: https://mxr.mozilla.org/mozilla/source/security/nss/lib/ckfw/builtins/certdata.txt?raw=1.

Several people have written quick scripts to try and convert this into PEM format, but they often miss something critical: some certificates are explicitly distrusted. These include the DigiNotar certificates and the misissued COMODO certificates. If you don't parse the trust records from the NSS data file, then you end up trusting these too! There's at least one major example of this that I know of.

(Even with a correct root file, unless you do hard fail revocation checking you're still vulnerable to the misissued COMODO certificates.)

So, at the prodding of Denton Gentry, I've open-sourced a tool for converting NSS's file to PEM format: extract-nss-root-certs. (At the time of writing it requires a 6g built from the weekly or current tree (hg -r weekly), not the release tree. A few of the APIs have changed since the last Go release was done. This will be resolved when Go 1.0 is released.)

BEAST followup

(See the original post for background.)

Everyone seems to have settled on 1/n-1 record splitting as a workaround for the BEAST attack in TLS 1.0 and SSLv3. Briefly: 1/n-1 record splitting breaks CBC encrypted records in two: the first with only a single byte of application data and the second with the rest. This effectively randomises the IV and stops the attack.
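In pseudo-Go, the splitting looks something like this; splitWrite and writeRecord are hypothetical names standing in for a TLS stack's internals:

package tlsrecord

// splitWrite implements 1/n-1 record splitting for CBC ciphersuites.
// writeRecord stands in for a hypothetical function that MACs, encrypts
// and sends a single TLS application-data record.
func splitWrite(data []byte, writeRecord func([]byte) error) error {
    if len(data) > 1 {
        // Send a record containing just the first byte. Its MAC is
        // unpredictable to the attacker and effectively randomises
        // the IV of the record that follows.
        if err := writeRecord(data[:1]); err != nil {
            return err
        }
        data = data[1:]
    }
    return writeRecord(data)
}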

The workaround which OpenSSL tried many years ago, and which hit significant issues, was 0/n record splitting. It's the same thing, but with the first record being empty. The problem with it was that many stacks processed the empty record and returned a 0-byte read, which higher levels took to mean EOF.

1/n-1 record splitting doesn't hit that problem, but it turns out that there's a fair amount of code out there that assumes that the entire HTTP request comes in a single read. The single byte record breaks that.

We first implemented 1/n-1 record splitting in Chrome 15 but backed off after only a couple of days because logging into several large sites broke. But that did motivate the sites to fix things so that we could switch it on in Chrome 16 and it stuck that time.

Opera also implemented it around this time, but I think Chrome took the brunt of the bug reports and it's time-consuming dealing with them. A colleague and I have been emailing and phoning a lot of sites and vendors while dealing with upset users and site admins. Chrome certainly paid a price for moving before Firefox and IE, but then we're nice like that.

Thankfully, this week, Microsoft released a security update which implements 1/n-1 record splitting in SChannel and switches it on in IE. (Although it defaults to off for other users of SChannel, unlike NSS.) Now the sites which broke with Chrome 16 are also broken in a patched IE and that takes some pressure off us. In a few weeks, Firefox 10 should be released and then we'll be about as good as we're going to get.

After taking the brunt with Chrome 16, there is one case that I'm not going to fight: Plesk can't handle POST payloads that don't come in a single read. Chrome (currently) sends POSTs as two writes: one for the HTTP headers and a second for the POST body. That means that each write is split into two records and Plesk breaks because of the second split. IE and Firefox send the headers and body in a single write, so there's only a single split in the HTTP headers, which Plesk handles.

Chrome will start merging small POST bodies into the headers with Chrome 17 (hopefully) and this will fix Plesk. Also, merging as Firefox and IE do saves an extra packet so it's worthwhile on its own. Once again, anything that's mostly true soon becomes an unwritten rule on the Internet.

It's worth contrasting the BEAST response to the renegotiation attack. The BEAST workaround caused a number of problems, but it worked fine for the vast majority of sites. The renegotiation fix requires that very nearly every HTTPS site on the Internet be updated and then that browsers refuse to talk to unpatched servers.

I'd bet that we'll not manage to get enough patched servers for any browser to require it this side of 2020. Unpatched servers can still disable renegotiation to protect themselves, but it's still not hard to find major sites that allow insecure renegotiation (www.chase.com was literally the second site that I tried).

OTR in Go

“Off the record” is, unfortunately, an overloaded term. To many it's a feature in gTalk and AOL IM which indicates that the conversation isn't logged. However, to crypto folks it's a protocol for secure chat.

(In fact, resolving the ambiguity is on the EFF's wish list.)

Pidgin has been my chat client of choice for some time because it's pretty fully featured and supports OTR via a plugin. However, I just don't trust it from a security point of view. The latest incident was only a couple of weeks ago: CVE-2011-3919.

So, I implemented OTR in Go, as well as an XMPP library and client. It's an absolutely minimal client (except for OTR support) and implements only what I absolutely need in a client.

But it does mean that the whole stack, including the TLS library, is implemented in a memory safe language. (On the other hand, pretty much everything in that stack, from the modexp function to the terminal handling code, was written by me and has never really been audited. I'm a decent programmer but I'm sure there are some howlers of security issues in there somewhere.)

Certificate Transparency

(I don't have comments on this blog, but you can comment on my Google+ post.)

Ben Laurie and I have been working on a longer term plan for improving the foundations of the certificate infrastructure on which most Internet transport security is based these days. Although Chrome has public key pinning for some domains, which limits the set of permitted certificates, we don't see public key pinning as a long term solution (and nor was it ever designed to be).

For the 10 second summary of the plan, I'll quote Ben: “certificates are registered in a public audit log. Servers present proofs that their certificate is registered, along with the certificate itself. Clients check these proofs and domain owners monitor the logs.” I would add only that anyone can check the logs: the certificate infrastructure would be fully transparent.

We now have an outline of the basic idea and will be continuing to flesh it out in the coming months, hopefully in conjunction with other browser vendors.

But I thought that, at the outset, it would be helpful to describe some of the limitations to the design space, as I see them:

No side-channels

As I've previously described, side-channels occur when a browser needs to contact a server other than the immediate destination in order to verify a certificate. Revocation checking with OCSP is an example of a side-channel used today.

But in order to be effective, side-channels invariably need to block the certificate verification until they complete, and that's a big problem. The Internet isn't fully connected. Captive portals, proxies and firewalls all mean that the only thing you can really depend on talking to is the immediate destination server. Because of this, browsers have never been able to make OCSP lookups blocking, and therefore OCSP is basically useless for security.

And that's not to mention the privacy, performance and functionality issues that arise from needing side-channels. (What happens when the side-channel server goes down?)

So our design requires that the servers send us the information that we require. We can use side-channels to check up on the logs, but it's an asynchronous lookup.

It's not opt-in, it's all-in

SSL stripping is a very easy and very effective attack. HSTS prevents it and is as easy to deploy as anything can be for HTTPS sites. But, despite all this, and despite a significant amount of evangelism, uptake has been very limited, even by sites which are HTTPS only and the subject of attacks.

While HSTS really has to be opt-in, a solution to the certificate problem doesn't. Although our scheme is incrementally deployable, the eventual aim is that it's required for everybody. Thankfully, since certificates have to be renewed there's an obvious means to incrementally deploy it: require it for certificates issued after a certain date. Although an eventual hard requirement is still needed, it's a lot less of a problem.

It's easy on the server operator

Since the aim is to make it a requirement for all servers, we've sacrificed a lot in order to make it very easy on the server operator. For most server operators, their CA will probably take care of fetching the audit proofs, meaning there's no additional work at all.

Some initial designs included short-lived log signatures, which would have solved the revocation problem. (Revocation would simply be a matter of instructing the logs to stop signing a given certificate.) However, this would have required server operators to update their audit proofs on a near-daily basis. After discussions it became clear that such a requirement wouldn't be tenable for many and so we reluctantly dropped it.

We are also sacrificing decentralisation to make things easy on the server. As I've previously argued, decentralisation isn't all it's cracked up to be in most cases because 99.99% of people will never change any default settings, so we haven't given up much. Our design does imply a central set of trusted logs which is universally agreed. This saves the server from possibly having to fetch additional audit proofs at runtime, something which requires server code changes and possible network changes.

There are more valid certificates than the one that's currently serving

Cheap virtual hosting and EC2 have made multi-homed services common. Even small to medium scale sites have multiple servers these days. So any scheme that asserts that the currently serving certificate is the only valid certificate will run into problems when certificates change. Unless all the servers are updated at exactly the same time, then users will see errors during the switch. These schemes also make testing a certificate with a small number of users impossible.

In the end, the only real authority on whether a certificate is valid is the site itself. So we don't rely on external observations to decide whether a certificate is valid; instead we seek to make the set of valid certificates for a site public knowledge (which it currently isn't), so that the site can determine whether it's correct.

It's not easy to do

We believe that this design will have a significant, positive impact on an important part of Internet security and that it's deployable. We also believe that any design that shares those two properties ends up looking a lot like it. (It's no coincidence that we share several ideas with the EFF's Sovereign Keys.)

None the less, deployment won't be easy and, hopefully, we won't be pushing it alone.

Forward secrecy for Google HTTPS

As announced on the Google Security Blog, Google HTTPS sites now support forward secrecy. What this means in practice is two things:

Firstly, the preferred cipher suite for most Google HTTPS servers is ECDHE-RSA-RC4-SHA. If you have a client that supports it, you'll be using that ciphersuite. Chrome and Firefox, at least, support it.

Previously we were using RSA-RC4-SHA, which means that the client (i.e. browser) picks a random key for the session, encrypts it with the server's public key and sends it to the server. Since only the holder of the private key can decrypt the session key, the connection is secure.

However, if an attacker obtains the private key in the future then they can decrypt recorded traffic. The encrypted session key can be decrypted just as well in ten years' time as it can be today and, in ten years' time, the attacker will have much more computing power with which to break the server's public key. An attacker who obtains the private key can decrypt everything encrypted to it, which could be many months of traffic.

ECDHE-RSA-RC4-SHA means elliptic curve, ephemeral Diffie-Hellman, signed by an RSA key. You can see a previous post about elliptic curves for an introduction, but the use of elliptic curves is an implementation detail.

Ephemeral Diffie-Hellman means that the server generates a new Diffie-Hellman public key for each session and signs the public key. The client also generates a public key and, thanks to the magic of Diffie-Hellman, they both generate a mutual key that no eavesdropper can know.

The important part here is that there's a different public key for each session. If the attacker breaks a single public key then they can decrypt only a single session. Also, the elliptic curve that we're using (P-256) is estimated to be as strong as a 3248-bit RSA key (by ECRYPT II), so it's unlikely that the attacker will ever be able to break a single instance of it without a large quantum computer.
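Here's a sketch of that core exchange using Go's crypto/elliptic; a real TLS handshake would additionally have the server sign its ephemeral public point with its RSA key, which is elided here:

package main

import (
    "crypto/elliptic"
    "crypto/rand"
    "fmt"
)

func main() {
    curve := elliptic.P256()

    // Each side generates a fresh key pair for this session only.
    serverPriv, serverX, serverY, err := elliptic.GenerateKey(curve, rand.Reader)
    if err != nil {
        panic(err)
    }
    clientPriv, clientX, clientY, err := elliptic.GenerateKey(curve, rand.Reader)
    if err != nil {
        panic(err)
    }

    // Each side multiplies the peer's public point by its own private
    // scalar; both arrive at the same shared point, while an
    // eavesdropper sees only the two public points.
    sx1, _ := curve.ScalarMult(clientX, clientY, serverPriv)
    sx2, _ := curve.ScalarMult(serverX, serverY, clientPriv)

    fmt.Println("shared secrets match:", sx1.Cmp(sx2) == 0)
}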

While working on this, Bodo Möller, Emilia Kasper and I wrote fast, constant-time implementations of P-224, P-256 and P-521 for OpenSSL. This work has been open-sourced and submitted upstream to OpenSSL. We also fixed several bugs in OpenSSL's ECDHE handling during deployment and those bug fixes are in OpenSSL 1.0.0e.

Session Tickets

The second part of forward secrecy is dealing with TLS session tickets.

Session tickets allow a client to resume a previous session without requiring that the server maintain any state. When a new session is established the server encrypts the state of the session and sends it back to the client, in a session ticket. Later, the client can echo that encrypted session ticket back to the server to resume the session.

Since the session ticket contains the state of the session, and thus keys that can decrypt the session, it too must be protected by ephemeral keys. But, in order for session resumption to be effective, the keys protecting the session ticket have to be kept around for a certain amount of time: the idea of session resumption is that you can resume the session in the future, and you can't do that if the server can't decrypt the ticket!

So the ephemeral, session ticket keys have to be distributed to all the frontend machines, without being written to any kind of persistent storage, and frequently rotated.
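Go's crypto/tls eventually grew an interface for exactly this; here's a sketch of the rotation idea using it. The 12-hour period and three-key window are arbitrary choices of mine, and distributing the keys across frontends is elided:

package main

import (
    "crypto/rand"
    "crypto/tls"
    "time"
)

// rotateTicketKeys installs a fresh session ticket key periodically. The
// first key in the slice encrypts new tickets; the older ones are kept so
// that recently issued tickets can still be decrypted and resumed.
func rotateTicketKeys(cfg *tls.Config) {
    var keys [][32]byte
    for {
        var k [32]byte
        if _, err := rand.Read(k[:]); err != nil {
            panic(err)
        }
        keys = append([][32]byte{k}, keys...)
        if len(keys) > 3 {
            keys = keys[:3]
        }
        // Keys live only in memory and are never written to disk.
        cfg.SetSessionTicketKeys(keys)
        time.Sleep(12 * time.Hour)
    }
}

func main() {
    cfg := &tls.Config{}
    go rotateTicketKeys(cfg)
    // ... pass cfg to an http.Server or tls.Listener here ...
    select {}
}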


We believe that forward secrecy provides a fairly significant benefit to our users and we've contributed our work back to OpenSSL in the hope that others will make use of it.

Classifying solutions to the certificate problem

This is something that I wrote up internally, but which Chris Palmer suggested would be useful to post publicly for reference. It presents some somewhat artificial axes on which I believe all proposed solutions to addressing weaknesses in the current certificate ecosystem fall. By dividing the problem along those axes, I hope to make the better solutions clearer.

Axis 1: Private vs public signing

At the moment we have private signing. A CA can sign a certificate and tell nobody about it and the certificate is still valid. This is the reason that we have to go crawl the Internet in order to figure out how many intermediate CA certs there are.

In public signing, certificates aren't valid unless they're public.

Schemes vary in how public they are. Convergence is a public scheme: certificates have to be visible to the notaries, but it's a lot less public than a scheme where all certificates have to be published. And published where? Highly public schemes imply some centralisation.

Private schemes don't protect us from CAs acting in bad faith or CAs which have been compromised. Public schemes help a lot in these cases, increasingly so the more public they are. Although a certificate might be valid for a short time, evidence of misbehavior is recorded. Public schemes also allow each domain to monitor all the certificates for their domain. Fundamentally, the only entity that can answer the question of whether a certificate is legitimate is the subject of the certificate itself. (See this tale of a Facebook certificate.)

Private schemes have the advantage of protecting the details of internal networks (i.e. not leaking the name www.corp.example.com).

Axis 2: Side channels or not

Revocation checking which calls back to the CA is a side channel. Convergence notaries are too. OCSP stapling is not, because the server provides the client with everything that it needs (assuming that OCSP stapling worked).

Side channels which need to be a hard fail are a problem: it's why revocation checking doesn't actually work. Private networks, hotel networks and server downtime are the issues. Side-channels are also a privacy and performance concern. But they're easier on the server operator and so more likely to be deployed.

In the middle are soft-fail side-channels. These offer limited protection because the connection proceeds even when the side-channel fails. They often queue the certificate up for later reporting.

Axis 3: Clocks or not

OCSP with nonces doesn't need clock sync. OCSP with time stamps does.

Keeping clocks in sync is a problem in itself, but it allows for short lived statements to be cached. It can also be a useful security advantage: Nonces require that a signing key be online because it has to sign a constant stream of requests. With clocks, a single response can be served up repeatedly and the key kept largely off-line. That moves the online key problem to the clock servers, but that's a smaller problem. A compromised clock server key can be handled by querying several concurrently and picking the largest cluster of values.
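As a toy sketch of that last idea (the clustering rule here is an assumption on my part; any robust agreement rule would do), here's how a client might combine readings from several clock servers so that a minority of lying servers can't move the answer:

package main

import (
  "fmt"
  "sort"
  "time"
)

// clusterTime sorts the readings, finds the largest group that agrees
// within the tolerance and returns that group's median.
func clusterTime(readings []time.Time, tolerance time.Duration) time.Time {
  sort.Slice(readings, func(i, j int) bool { return readings[i].Before(readings[j]) })
  bestStart, bestLen := 0, 0
  for i := range readings {
    j := i
    for j < len(readings) && readings[j].Sub(readings[i]) <= tolerance {
      j++
    }
    if j-i > bestLen {
      bestStart, bestLen = i, j-i
    }
  }
  return readings[bestStart+bestLen/2]
}

func main() {
  now := time.Now()
  readings := []time.Time{
    now.Add(-time.Second), now, now.Add(time.Second),
    now.Add(6 * time.Hour), // a compromised clock server
  }
  fmt.Println(clusterTime(readings, 10*time.Second))
}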


OCSP today is {private,side-channel,clock}. The 'let's fix OCSP' solutions are typically {private,side-channel,no-clock} (with an option for no-side-channel). Convergence is {mostly-public,side-channel,no-clock}.

I think the answer lies with {public,no-side-channel,clock}, but it's a trek to get there. Maybe something for a future post.

False Start: Brocade broken again

I wrote previously that Brocade had released a firmware update for their ServerIron SSL terminators which fixed an incompatibility with False Start. However, it now appears that they didn't fix the underlying issue; rather, they special-cased Chrome's behaviour at the time.

Sadly, due to Chrome implementing a workaround for the BEAST attack, their special casing doesn't work anymore and breaks Chrome 15.

So, if you run a Brocade ServerIron you should contact me ASAP. I'll be adding sites running this hardware to the False Start blacklist for Chrome 15 to allow Brocade to release another firmware update.

Chrome and the BEAST

Thai Duong and Juliano Rizzo today demoed an attack against TLS 1.0's use of cipher block chaining (CBC) in a browser environment. The authors contacted browser vendors several months ago about this and so, in order not to preempt their demo, I haven't discussed any details until now.

Contrary to several press reports, Duong and Rizzo have not found, nor do they claim, any new flaws in TLS. They have shown a concrete proof of concept for a flaw in CBC that, sadly, has a long history. Early reports of the problem date back nearly ten years, and Bard published two papers detailing it.

The problem has been fixed in TLS 1.1 and a workaround for SSL 3.0 and TLS 1.0 is known, so why is this still an issue?

The workaround (prepending empty application data records) is perfectly valid according to the protocol but several buggy implementations of SSL/TLS misbehaved when it was enabled. After Duong and Rizzo notified us, we put the same workaround back into Chrome to see if the state of the Internet had improved in the years since the last attempt. Sadly it was still infeasible to implement this workaround and the change had to be reverted.
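To make the workaround concrete, here's a toy illustration of the record-splitting idea. This is the concept only; a real implementation lives inside the TLS record layer, not in application code:

package main

import "fmt"

// splitForCBC returns the plaintext fragments for a single write, with
// an empty record prepended so that the predictable IV is consumed by
// a record containing nothing of the attacker's choosing.
func splitForCBC(plaintext []byte) [][]byte {
  return [][]byte{{}, plaintext}
}

func main() {
  for _, frag := range splitForCBC([]byte("GET / HTTP/1.1\r\n")) {
    fmt.Printf("record with %d bytes of plaintext\n", len(frag))
  }
}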

Use of TLS 1.1 would also have solved the issue but, despite it being published in 2006, common SSL/TLS libraries still don't implement it. But even if there was widespread deployment of TLS 1.1, it wouldn't have helped avoid problems like this. Due to a different, common bug in SSL 3.0 implementations (nearly 12 years after SSL 3.0 was obsoleted by TLS 1.0), browsers still have to perform SSL 3.0 downgrades to support buggy servers. So even with a TLS 1.1 capable browser and server, an attacker can trigger a downgrade to SSL 3.0 and bypass the protections of TLS 1.1.

Finally, the CBC attacks were believed to be largely theoretical but, as Duong and Rizzo have pointed out today, that's no longer the case.

Initially the authors identified HTML5 WebSockets as a viable method of exploiting the CBC weakness but, due to unrelated reasons, the WebSockets protocol was already in the process of changing in such a way that stopped it. The new WebSockets protocol shipped with Chrome 14 but several plugins have been found to offer features that might allow the attack to be performed.

Duong and Rizzo confirmed that the Java plugin can be used, but Chrome already blocks the execution of Java by default. Other plugins, if installed, can be disabled on the about:plugins page if the user wishes.

The attack is still a difficult one; the attacker has to have high-bandwidth MITM access to the victim. This is typically achieved by being on the same wireless network as the victim. None the less, it's a much less serious issue than a problem which can be exploited by having the victim merely visit a webpage. (Incidentally, we pushed out a fix to all Chrome users for such a Flash bug only a few days ago.)

Also, an attacker with MITM abilities doesn't need to implement this complex attack. SSL stripping and mixed-scripting issues are much easier to exploit and will take continued, sustained effort to address. Duong and Rizzo have highlighted this fact by choosing to attack one of the few HSTS sites.

Thanks to an idea suggested by Xuelei Fan, we have another workaround for the problem which will, hopefully, cause fewer incompatibility problems. This workaround is currently being tested on the Chrome dev and beta channels but we haven't pushed it on the stable channel yet. Since we don't really know if the fix will cause problems it's not something that we want to drop on people without testing. The benefit of a fix to Chrome's TLS stack is also limited as Chrome already uses the newer WebSockets protocol and it doesn't fix problems in plugins.

If it turns out that we've misjudged something we can quickly react, thanks to Chrome's auto-update mechanism.

It's also worth noting that Google's servers aren't vulnerable to this problem. In part due to CBC's history, Google servers have long preferred RC4, a cipher that doesn't involve CBC mode. Mention has also been made of Chrome's False Start feature but, since we don't believe that there are any vectors using Chrome's stack, that's immaterial.

DNSSEC Certificates now in Chrome Stable

A few months back I described DNSSEC authenticated HTTPS in Chrome. This allows sites to use DNSSEC certificates, rather than traditional ones, and is aimed at sites which currently use no HTTPS, or self-signed certificates. Since Chrome 14 is now stable, all Chrome users now have this experimental feature.

(Also, the serialisation format has been documented.)

Why not Convergence?

In light of recent events, I've had several requests to implement Convergence in Chrome. For those who don't know and, frankly, for anyone interested in SSL, I highly recommend watching Moxie's talk on the subject from this year's Black Hat. You can also check out the project website.

Moxie, having actually thought about the issue and coded something up, has already done a thousand times more to address the problem than almost anyone else. But I don't think that Convergence is something we would add in Chrome:

Although the idea of trust agility is great, 99.99% of Chrome users would never change the default settings. (The percentage is not an exaggeration.) Indeed, I don't believe that an option for setting custom notaries would even meet the standards for inclusion in the preferences UI.

Given that essentially the whole population of Chrome users would use the default notary settings, those notaries will get a large amount of traffic. Also, we would have a very strong interest in the notaries functioning, because otherwise Chrome stops working. Combined, that means that Google would end up running the notaries. So the design boils down to Chrome phoning home for certificate validation. That has both unacceptable privacy implications and very high uptime requirements on the notary service.

It also doesn't address the two problems that Moxie highlights: internal servers and captive portals. It's not clear how either would work in this design, at least without giving up on security and asking the user. (These two problems, captive portals especially, are the bane of many an idea in this area.)

None of the above argues against allowing Convergence as an extension for those who wish to run it. We don't currently have an extension API for controlling certificate decisions and I'm not inherently opposed to one. It would be additional complexity and something that we would have to support in the future, so it's not without costs, but mostly it's not there because nobody has written it and I'm afraid that I don't have any plans to do so.

False Start: past time to fix your servers

A year ago I wrote about False Start in Chrome. In short: Chrome cuts the TLS handshake short to save a round trip in a full handshake. Since then we've posted results that show a 30% drop in SSL handshake latency.

When we enabled False Start by default in Chrome, we also included a list of the very small number of incompatible sites. This list was built into the browser in order to avoid breaking these sites. (See the original post for the reasoning.)

For some time I've been randomly eliminating chunks of that list. Mostly it's been the case that sites have already upgraded. I don't think that they did so specifically with False Start in mind; rather, it was just regular maintenance.

But it's now time for all sites to be updated, because the list is fading away fast.

DNSSEC authenticated HTTPS in Chrome

Update: this has been removed from Chrome due to lack of use.

DNSSEC validation of HTTPS sites has been hanging around in Chrome for nearly a year now. But it's now enabled by default in the current canary and dev channels of Chrome and is on schedule to go stable with Chrome 14. If you're running a canary or dev channel (and you need today's dev channel release: 14.0.794.0) then you can go to https://dnssec.imperialviolet.org and see a DNSSEC signed site in action.

DNSSEC stapled certificates (and the reason that I use that phrase will become clear in a minute) are aimed at sites that currently have, or would use, self-signed certificates and, possibly, larger organisations that are Chrome-based and want certificates for internal sites without having to bother with installing a custom root CA on all the client devices. Suggesting that this heralds the end of the CA system would be utterly inaccurate. Given the deployed base of software, all non-trivial sites will continue to use CA signed certificates for decades, at least. DNSSEC signing is just a gateway drug to better transport security.

I'm also going to see how it goes for a while. The most likely outcome is that nobody uses them and I pull the code out in another year's time. There are also some risks:

In theory, it should be hard to get a CA certificate for, say, paypa1.com (note the digit 1 in there): the CA should detect a phishing attempt and deny the certificate. With DNSSEC, there's no such filtering and a phisher could get a green padlock on any site that they can get a DNS name for. It's not clear whether this filtering is actually enforced by all public CAs, nor that phishing is more effective over HTTPS. Additionally, Chrome has anti-phishing warnings built in, which is a better layer at which to handle this problem.

In the end, if you want a certificate that identifies a legal entity, rather than a domain name, then you want EV certificates. None the less, if DNSSEC stapled certificates end up being predominantly used for abuse then I'll probably kill them.

It's also possible that lots of sites will use these certificates, triggering warnings in other browsers thus further desensitising those users to such warnings. But that's one of those “good to have” problems and, in this very unlikely situation, there will have to be a push to add wider support.

For site operators

For site operators it's not just a question of dropping in a DNS record and forgetting about it (although nearly). The DNSSEC chain has to be embedded in the certificate and the certificate has to be refreshed, probably in a nightly cron job.

This is why I used the phrase “DNSSEC stapled certificate” above. These certificates are self-signed X.509 certificates with a DNSSEC chain embedded in an extension. The alternative would be to have the client perform a DNSSEC lookup itself. In the long term this may be the answer but, for now, client platforms don't support DNSSEC resolution and I'm in no rush to drop a DNSSEC resolving library into Chrome. We also have data that suggests that ~1% of Chrome (Linux) users can't resolve arbitrary DNS record types (because of ‘hotel networks’, firewalls etc). Finally, having the client do a lookup is slower. Since DNSSEC data is signed it doesn't matter how you get it, so having the server do the work and cache the result is simpler, faster and more dependable.

The DNSSEC stapled data is embedded in an X.509 certificate (as opposed to extending TLS or using a different certificate format) because every HTTPS server will take an X.509 certificate as an opaque blob and it Just Works. All the other possibilities introduce significant barriers to adoption.

You should also keep in mind that the format of the DNS record may change in the future. I've no current plans to change it but these things are still working themselves out.

Setting it up

First you need a zone that's DNSSEC signed. Setting that up is a little beyond the scope of this blog post, but I use BIND with auto-signing as a server and DynDNS as a registrar.

Second, you need to clone git://github.com/agl/dnssec-tls-tools.git.

$ openssl genrsa 1024 > privkey.pem
$ openssl rsa -pubout -in privkey.pem > pubkey.pem
$ python ./gencaa.py pubkey.pem
EXAMPLE.COM. 60 IN TYPE257 \# 70 020461757468303e3039060a2b06010401d6790203010…

Take the DNS record that's printed out by gencaa.py, change the domain name (and possibly the TTL) and add it to your zone. If you need to do anything else to make the record accessible to the world, do that now. You should be able to see the record with dig -t type257 example.com.

Now we get the DNSSEC chain and create a certificate with it embedded:

$ python ./chain.py example.com chain
$ gcc -o gencert gencert.c -Wall -lcrypto
$ ./gencert privkey.pem chain > cert.pem

You can check the certificate with openssl x509 -text < cert.pem | less.

CNAMEs should work but I haven't implemented DNS wildcards. Also, that code was written to run easily (hence C and Python) rather than being nice code that's going to handle lots of edge cases well. chain.py is just shelling out to dig :)

You now have a key pair (privkey.pem and cert.pem) that can be used by any HTTPS server in the usual way. Keep in mind that DNSSEC signatures expire, often on the order of a week, so you'll need to set up a cron job to download the chain, rebuild the certificate and HUP the HTTPS server.

OpenPGP support in Go

Go is currently my favourite programming language for recreational programming. Mentioning Go appears to send some people into a rage judging by the sort of comments that arise on Hacker News, proggit and so on but, thankfully, this blog doesn't have comments.

Over the past few months I've been (very) slowly building OpenPGP support into Go's crypto library. (You can get my key, DNSSEC signed, via PKA: dig +dnssec -t txt agl._pka.imperialviolet.org). I don't get to use PGP often enough but, even when I have (I used to run an email verifying auto-signer), I've resorted to shelling out to gpg. I understand that GPG's support for acting as a library has improved considerably since then, but half the aim of this is that it's recreational programming.

So, here's a complete Go program using the OpenPGP library: (Note, if you wish to compile these programs you'll need a tip-of-tree Go)

package main

import (
  "crypto/openpgp"
  "crypto/openpgp/armor"
  "fmt"
  "os"
)

func main() {
  w, _ := armor.Encode(os.Stdout, "PGP MESSAGE", nil)
  plaintext, _ := openpgp.SymmetricallyEncrypt(w, []byte("golang"), nil)
  fmt.Fprintf(plaintext, "Hello from golang.\n")
  // Closing flushes the encryption and the armoring.
  plaintext.Close()
  w.Close()
}

It'll output a symmetrically encrypted message, like this:

-----BEGIN PGP MESSAGE-----
(base64-encoded ciphertext and checksum; the exact bytes vary from run to run)
-----END PGP MESSAGE-----
You can feed that into gpg -c and decrypt with the passphrase golang if you like. Since all the APIs are stream based you can process arbitrary length messages without having to fit them into memory.

We can also do public key stuff:

package main

import (
  "crypto/openpgp"
  "crypto/openpgp/armor"
  "fmt"
  "os"
)

func getKeyByEmail(keyring openpgp.EntityList, email string) *openpgp.Entity {
  for _, entity := range keyring {
    for _, ident := range entity.Identities {
      if ident.UserId.Email == email {
        return entity
      }
    }
  }
  return nil
}

func main() {
  pubringFile, _ := os.Open("pubring.gpg")
  pubring, _ := openpgp.ReadKeyRing(pubringFile)
  privringFile, _ := os.Open("secring.gpg")
  privring, _ := openpgp.ReadKeyRing(privringFile)

  myPrivateKey := getKeyByEmail(privring, "me@mydomain.com")
  theirPublicKey := getKeyByEmail(pubring, "bob@example.com")

  w, _ := armor.Encode(os.Stdout, "PGP MESSAGE", nil)
  plaintext, _ := openpgp.Encrypt(w, []*openpgp.Entity{theirPublicKey}, myPrivateKey, nil)
  fmt.Fprintf(plaintext, "Hello from golang.\n")
  plaintext.Close()
  w.Close()
}

And, of course, it can also decrypt those messages, check signatures etc. The missing bits are ElGamal encryption, support for clearsigning and big bits of functionality around web-of-trust, manipulating keys etc. None the less, Camlistore is already using it and it appears to be working for them.

Public key pinning

Starting with Chrome 13, we'll have HTTPS pins for most Google properties. This means that certificate chains for, say, https://www.google.com, must include a whitelisted public key. It's a fatal error otherwise. Credit goes to my colleague, Chris Evans, for much of this.

The whitelisted public keys for Google currently include Verisign, Google Internet Authority, Equifax and GeoTrust. Thus Chrome will not accept certificates for Google properties from other CAs. As ever, you can look at the source if you wish (search that file for "Verisign"). DigiCert also issues some of our certificates, but we aren't yet enforcing pinning for those domains.

This works with HSTS preloading of Google properties to ensure that, when a user types gmail.com into the address bar, they only get the real Gmail, no matter how hostile the network.

What about MITM proxies, Fiddler etc?

There are a number of cases where HTTPS connections are intercepted by using local, ephemeral certificates. These certificates are signed by a root certificate that has to be manually installed on the client. Corporate MITM proxies may do this, several anti-virus/parental control products do this and debugging tools like Fiddler can also do this. Since we cannot break in these situations, user installed root CAs are given the authority to override pins. We don't believe that there will be any incompatibility issues.

Why public key hashes, not certificate hashes?

In general, hashing certificates is the obvious solution, but the wrong one. The problem is that CA certificates are often reissued: there are multiple certificates with the same public key, subject name etc but different extensions or expiry dates. Browsers build certificate chains from a pool of certificates, bottom up, and an alternative version of a certificate might be substituted for the one that you expect.

For example, StartSSL has two root certificates: one signed with SHA1 and the other with SHA256. If you wished to pin to StartSSL as your CA, which certificate hash would you use? You would have to use both, but how would you know about the other root if I hadn't just told you?

Conversely, public key hashes must be correct:

Browsers assume that the leaf certificate is fixed: it's always the starting point of the chain. The leaf certificate contains a signature which must be a valid signature, from its parent, for that certificate. That implies that the public key of the parent is fixed by the leaf certificate. So, inductively, the chain of public keys is fixed, modulo truncation.

The only sharp edge is that you mustn't pin to a cross-certifying root. For example, GoDaddy's root is signed by Valicert so that older clients, which don't recognise GoDaddy as a root, still trust those certificates. However, you wouldn't want to pin to Valicert because newer clients will stop their chain at GoDaddy.

Also, we're hashing the SubjectPublicKeyInfo not the public key bit string. The SPKI includes the type of the public key and some parameters along with the public key itself. This is important because just hashing the public key leaves one open to misinterpretation attacks. Consider a Diffie-Hellman public key: if one only hashes the public key, not the full SPKI, then an attacker can use the same public key but make the client interpret it in a different group. Likewise one could force an RSA key to be interpreted as a DSA key etc.

(It's possible that a certificate could be reissued with a different SPKI for the same public key. That would be a second sharp edge but, to my knowledge, that has never happened. SPKIs for any public key type should have a distinguished encoding.)
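For the curious, here's a sketch of hashing a certificate's SubjectPublicKeyInfo in Go. The SHA-256/base64 choice here is purely illustrative; the hash and encoding that any given pinning mechanism expects may differ:

package main

import (
  "crypto/sha256"
  "crypto/x509"
  "encoding/base64"
  "encoding/pem"
  "fmt"
  "os"
)

func main() {
  pemBytes, err := os.ReadFile("cert.pem")
  if err != nil {
    panic(err)
  }
  block, _ := pem.Decode(pemBytes)
  if block == nil {
    panic("no PEM block found")
  }
  cert, err := x509.ParseCertificate(block.Bytes)
  if err != nil {
    panic(err)
  }
  // RawSubjectPublicKeyInfo is the DER-encoded SPKI: the key type and
  // parameters as well as the public key itself.
  digest := sha256.Sum256(cert.RawSubjectPublicKeyInfo)
  fmt.Println(base64.StdEncoding.EncodeToString(digest[:]))
}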

Can I get this for my site?

If you run a large, high security site and want Chrome to include pins, let me know. For everyone else we hope to expose pinning via HSTS, although the details have yet to be worked out. You can experiment with pinning via the HSTS debug UI (chrome://net-internals/#hsts) if you have a new enough version of Chrome.

Smaller than Bloom filters

I've been looking at what would be needed in order to have a global view of CRLs in browsers. At the moment revocation has three problems: privacy (by asking the CA about the status of a certificate you're revealing to them what you're looking at), performance (OCSP checks take hundreds of milliseconds which adds up to thousands of milliseconds when subdomains are in play) and functionality (it doesn't work).

Having a global view of (nearly) all CRLs would mostly solve these issues. We could quickly determine that a certificate is good without leaking information and perform a hard-fail revocation check if we believe that a certificate has been revoked. That raises the question of how to get a global view of CRLs to the client. A Bloom filter would be the obvious choice since we can't accept false negatives (and a Bloom filter doesn't generate them) but we can cope with false positives (which it does produce).

But we really want to keep this structure small. Every extra KB has to be transferred to hundreds of millions of clients (including mobile devices) and takes up memory on each of those devices. We also have another trick to play: we don't care much about query time. For a normal Bloom filter, query time is very important but here, if it took several milliseconds, that would be ok.

Given those limitations we can look at data structures other than a traditional Bloom filter and which are smaller and slower.

An optimal Bloom filter takes up n·log₂(e)·log₂(1/p) bits (where p is the false positive probability and n is the number of inserted elements). You can't always hit that because you can't have a fraction of a hash function, but the theoretical lower bound is just n·log₂(1/p) bits, a factor of log₂(e) ≈ 1.44 smaller. Other structures get closer to this bound.

First, there are compressed Bloom filters [PDF]. The idea here is that one builds a larger Bloom filter with fewer hash functions and compresses it. In the paper they discuss compressing for transport, but I'm also thinking about it being compressed in memory. An interesting result from the paper is that the optimum number of hash functions for a normal Bloom filter is actually the worst case for a compressed filter.

The optimum number of hash functions for a compressed Bloom is one. Normally that would make the Bloom too large to deal with uncompressed, but if you're not actually going to store the uncompressed version, that's less of an issue.

A Bloom filter with a single hash function requires 1/p bits per element. From that, it's obvious that the density of the filter is p. So a perfect entropy coder (and a range coder is basically perfect) should be able to compress that to just n·(1/p)·H(p) bits, where H is the binary entropy function. Now we can consider how close a compressed Bloom filter gets to the lower bound that we considered before.

A compressed Bloom filter takes (1/p)·H(p) bits per element and the lower bound is log₂(1/p). Here's a Wolfram Alpha plot of the expansion factor. The expansion factor goes from about 0.1 to 0.2 over a range of reasonable values of p. For p = 1/1024, it's 0.144. Remember that the expansion factor of a normal Bloom filter is 0.44, so that's quite a bit better.
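To sanity-check that 0.144 figure: for small p, the binary entropy is H(p) ≈ p·(log₂(1/p) + log₂(e)), so per element a compressed Bloom filter needs about

  (1/p)·H(p) ≈ log₂(1/p) + log₂(e) = 10 + 1.44 = 11.44 bits (for p = 1/1024)

against a lower bound of log₂(1/p) = 10 bits, i.e. an overhead of 1.44/10 ≈ 0.144.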

However, the uncompressed size, even if you're just streaming it, is a problem. For 400K revoked certificates with a 1 in 1024 false positive rate, you're looking at 51MB of uncompressed data. In order to process that you would have to call into the entropy coder 410 million times.

There's an obvious solution to this: split the Bloom into chunks and hash the input in order to find the correct chunk. That works fine, but there's also another answer which may be cleaner: Golomb Compressed Sets. (The idea was first mentioned in this paper, I believe.)

A GCS mirrors the structure of the compressed Bloom filter: we're hashing the elements into a space of size n/p. But, while a compressed Bloom filter treats this as a bitmap, a GCS treats it as a list of values. Since the values are the result of hashing, we can assume that they are uniformly distributed, sort them and build a list of differences. The differences will be geometrically distributed with a parameter of p. Golomb coding is the optimal encoding for geometrically distributed values: you divide by 1/p, encode that in unary then encode the remainder in binary.
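As a concrete sketch of building such a set in Go (a toy only: the hash and parameter choices are arbitrary and there's no decoder), where 1/p is a power of two so the Golomb code degenerates into a Rice code:

package main

import (
  "crypto/sha256"
  "encoding/binary"
  "fmt"
  "sort"
)

// bitWriter appends single bits to a growing byte slice.
type bitWriter struct {
  bytes []byte
  n     uint
}

func (w *bitWriter) writeBit(b uint64) {
  if w.n%8 == 0 {
    w.bytes = append(w.bytes, 0)
  }
  w.bytes[len(w.bytes)-1] |= byte(b&1) << (7 - w.n%8)
  w.n++
}

// buildGCS hashes each element into [0, n/p), sorts the values and
// Golomb-Rice codes the deltas: quotient in unary, then logInvP bits
// of remainder (p = 1/2^logInvP).
func buildGCS(elements []string, logInvP uint) []byte {
  m := uint64(len(elements)) << logInvP
  vals := make([]uint64, len(elements))
  for i, e := range elements {
    h := sha256.Sum256([]byte(e))
    vals[i] = binary.BigEndian.Uint64(h[:8]) % m
  }
  sort.Slice(vals, func(i, j int) bool { return vals[i] < vals[j] })

  var w bitWriter
  var last uint64
  for _, v := range vals {
    delta := v - last
    last = v
    for q := delta >> logInvP; q > 0; q-- {
      w.writeBit(1)
    }
    w.writeBit(0)
    for i := int(logInvP) - 1; i >= 0; i-- {
      w.writeBit(delta >> uint(i))
    }
  }
  return w.bytes
}

func main() {
  gcs := buildGCS([]string{"revoked serial 1", "revoked serial 2"}, 10)
  fmt.Printf("2 elements -> %d bytes\n", len(gcs))
}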

A GCS is going to be slightly larger than a compressed bloom filter because the Golomb coding uses a whole number of bits for each value. I don't have an equation for the overhead, but I found it to be between 0.1 and 0.2 bits per set element, which is pretty minimal. It's also an advantage because it means that the user can index the GCS to any degree. With a chunked, compressed Bloom filter the generator has to decide on the chunking and also deal with the inefficiencies resulting from imperfect balance.

But what about that lower bound, can anything hit it? Yes, matrix filters (paper). The idea here is that you hash elements into n values of a finite field. Then, from n such elements you have an n by n matrix where the rows are independent with very high probability. You then perform Gaussian elimination to solve M·b = 1 (where M is the matrix, b is an n column vector and 1 is an n column vector of ones).

After this, b is your matrix filter. Given b, one can hash an element to generate a vector, multiply by b and get the original value back. If the equation was in M then the result has to be 1. Otherwise, the result is a random element from the field. If you use GF(2¹⁰) as the field, then the false positive probability is 1 in 1024.

As an example, consider the case of a two element set. You would hash these elements into equations that define lines in the plane and the matrix filter is the intersection of these lines. Querying the filter means hashing the element to form a line and seeing if it intersects that point.

Matrix filters can also be used to associate values with the elements. In this case, for an element in the set you'll always get the correct value and for anything else you'll get a random value. (So you don't get to know if the element was in the set in this case.)

Matrix filters hit the lower bound, but have some issues. First, in order to query the filter you have to read and process the whole filter, so you will have to chunk it. Secondly, if your false positive rate is not of the form 1/2ⁿ, then the field arithmetic is slower. But I'll deal with the most important problem, from my point of view, next.

Back to the motivation for a moment: we're looking to sync a global view of CRLs to browsers. The initial download is one thing, but we also have to push updates frequently (probably daily). That means that delta encoding is terribly important.

Delta compressing a compressed Bloom filter or a GCS is pretty obvious with a range coder in hand. Additions and deletions from the set are just additions and deletions of bits or values, respectively. But changes to a matrix filter mean rewriting everything. Consider the example of lines in a plane: changing one of the lines moves the intersection point in both dimensions.

So, in order to add elements you would either have to retransmit the chunks that they fall into or send an additional chunk (that would have to be checked every time). Deletions can't be realised without retransmitting a whole chunk so, over time, the useful fraction of elements in a chunk would fall. Since a GCS is only about 15% bigger, it wouldn't take long for the matrix filter to be worse off.

So, although matrix filters are beautiful, I think a GCS works best. I suspect that there's a deep reason why a structure which hits the lower size bound cannot be delta compressed, but I'm too dumb to figure it out if there is.

In the process of playing with this, I implemented everything to some degree. I'll throw the code on github at some point.

Multi-prime RSA trade offs

(This is a technical post: casual readers should skip it.)

I couldn't find data on the speed/security trade offs of multi-prime RSA anywhere, so I gathered the data myself and hopefully this blog post can be found by people in the future.

I used the equations from Lenstra (Unbelievable security: Matching AES security using public key systems, in ASIACRYPT 2001) and scaled them so that two-prime, 1024-bit RSA comes out at 80 bits. The speed numbers are from OpenSSL for a single core of a Xeon E5520 at 2.3GHz.

I'm assuming a 2048-bit modulus. The blue, horizontal lines are the estimated difficulty of the NFS for 2048 and 1024-bit composites. The blue curve is the estimated difficulty of factoring a multi-prime composite using ECM. Really, you don't want to choose so many primes that this drops below the NFS level.

[Graph: estimated security (bits) and private-key operations per second against the number of primes, from 2 to 8.]

A couple of useful observations from this data: For 2048-bits, 3 primes is the most you want (this is confirmed by table 1 of this paper). Also, if your CA is requiring a 2048-bit modulus, you can use seven primes to get basically the same speed and security as a 1024-bit key.

This is the same data, in table form:

1024-bits (80-bit NFS security)

n | ops/s | ECM security

2048-bits (107-bit NFS security)

n | ops/s | ECM security

4096-bits (144-bit NFS security)

n | ops/s | ECM security
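If you want to reproduce this sort of measurement yourself, Go's crypto/rsa happens to support multi-prime keys directly. Here's a rough benchmarking sketch (the iteration count is arbitrary and the absolute numbers depend entirely on your hardware):

package main

import (
  "crypto/rand"
  "crypto/rsa"
  "crypto/sha256"
  "fmt"
  "time"
)

// timePrivateOps measures the average time of a private-key operation
// by repeatedly decrypting a small OAEP ciphertext.
func timePrivateOps(key *rsa.PrivateKey) time.Duration {
  ct, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, &key.PublicKey, []byte("benchmark"), nil)
  if err != nil {
    panic(err)
  }
  const iters = 200
  start := time.Now()
  for i := 0; i < iters; i++ {
    if _, err := rsa.DecryptOAEP(sha256.New(), rand.Reader, key, ct, nil); err != nil {
      panic(err)
    }
  }
  return time.Since(start) / iters
}

func main() {
  for _, nprimes := range []int{2, 3, 4} {
    key, err := rsa.GenerateMultiPrimeKey(rand.Reader, nprimes, 2048)
    if err != nil {
      panic(err)
    }
    fmt.Printf("%d primes: %v per private-key operation\n", nprimes, timePrivateOps(key))
  }
}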

Revocation doesn't work

When an HTTPS certificate is issued, it's typically valid for a year or two. But what if something bad happens? What if the site loses control of its key?

In that case you would really need a way to invalidate a certificate before it expires and that's what revocation does. Certificates contain instructions for how to find out whether they are revoked and clients should check this before accepting a certificate.

There are basically two methods for checking if a certificate is revoked: certificate revocation lists (CRLs) and OCSP. CRLs are long lists of serial numbers that have been revoked, while OCSP deals with a single certificate. But the details are unimportant: they are both methods of getting signed and timestamped statements about the status of a certificate.

But both methods rely on the CA being available to answer CRL or OCSP queries. If a CA went down then it could take out huge sections of the web. Because of this, clients (and I'm thinking mainly of browsers) have historically been forgiving of an unavailable CA.

But an event this week gave me cause to wonder how well revocation actually works. So I wrote the world's dumbest HTTP proxy. It's just a convenient way to intercept network traffic from a browser. HTTPS requests involve the CONNECT method, which is implemented. All other requests (including all revocation checks) simply return a 500 error. This isn't even as advanced as Moxie's trick of returning 3 (OCSP's “tryLater” status).

To be clear, the proxy is just for testing. An actual attack would intercept TCP connections. The results:
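For reference, something equivalently dumb is only a few dozen lines of Go. This is a sketch, not the code I used; the port and the (non-existent) error handling are arbitrary:

package main

import (
  "bufio"
  "io"
  "log"
  "net"
  "net/http"
)

func handle(conn net.Conn) {
  defer conn.Close()
  req, err := http.ReadRequest(bufio.NewReader(conn))
  if err != nil {
    return
  }
  if req.Method != "CONNECT" {
    // Plain HTTP requests, including OCSP and CRL fetches, land here.
    conn.Write([]byte("HTTP/1.1 500 Internal Server Error\r\n\r\n"))
    return
  }
  upstream, err := net.Dial("tcp", req.Host)
  if err != nil {
    return
  }
  defer upstream.Close()
  conn.Write([]byte("HTTP/1.1 200 Connection Established\r\n\r\n"))
  // Tunnel bytes in both directions until one side closes.
  go io.Copy(upstream, conn)
  io.Copy(conn, upstream)
}

func main() {
  ln, err := net.Listen("tcp", "127.0.0.1:8080")
  if err != nil {
    log.Fatal(err)
  }
  for {
    conn, err := ln.Accept()
    if err != nil {
      log.Fatal(err)
    }
    go handle(conn)
  }
}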

Firstly, IE 8 on Windows 7:

No indication of a problem at all even though the revocation checks returned 500s. It's even EV.

(Aside: I used Wireshark to confirm that revocation checks weren't bypassing the proxy. It's also the case that SChannel can cache revocation information. I don't know how to clear this cache, so I simply used a site that I hadn't visited before. Also confirming that a cache wasn't in effect is the fact that Chrome uses SChannel to verify certificates and Chrome knew that there was an issue.)

Firefox 3.6 on Windows 7, no indication:

Update: George Macon pointed out that Firefox removes the EV indication from this site. I'm not a regular Firefox user so I'm afraid that I didn't notice:

Chrome 12 on Windows 7:

There's something at least! We can click on it...

So Chrome has an indication but it's the same indication as mixed-content and I suspect that rather a lot of people will ignore it. There is one saving grace, Chrome implements HSTS and for HSTS sites, revocation failure is fatal:

Update: I changed this behaviour. See http://www.ietf.org/mail-archive/web/websec/current/msg00296.html

On other platforms, the story is much the same. Safari doesn't indicate that anything is wrong, nor does Firefox 4 on OS X, nor Firefox 3.6 on Linux. Chrome on Mac is the same as Windows but, on Linux, doesn't indicate and doesn't protect HSTS sites. (Chrome Linux's behaviour is due to a limitation in NSS, as I understand it.)

So, what's the effect? Well, it depends where an attacker is. If the attacker is spoofing a site and is situated close to the site then they can attack all users, because they can get all the traffic destined for the site. However, such an attacker probably can't intercept traffic to the CA's servers, so revocation will actually work because the users will receive a firm ‘revoked’ message from the CA. If the attacker is close to the user then they can only attack a smaller number of users, but they can intercept traffic to the CA and thus defeat revocation. Attacks in Tunisia and on open WiFi networks are the sort which can defeat revocation.

So should browsers be stricter about revocation checking? Maybe, but it does mean that a CA outage would disable large parts of the web. Imagine if Verisign corrupted their revocation database and were down for six hours while they rebuilt it. A global outage of large parts of the HTTPS web like that would seriously damage the image of web security, to the point where sites would think twice about using HTTPS at all.

(You can configure Firefox to be strict about checking if you wish: security.OCSP.require in about:config.)

A much better solution would be for certificates to only be valid for a few days and to forget about revocation altogether. This doesn't mean that the private key needs to change every few days, just the certificate. And the certificate is public data, so servers could just download their refreshed certificate over HTTP periodically and automatically (like OCSP stapling). Clients wouldn't have to perform revocation checks (which are very complex and slow), CAs wouldn't have to pay for massive, DDoS proof serving capacity and revocation would actually work. If the CA went down for six hours, nobody cares. Only if the CA is down for days is there a problem. If you want to “revoke” a certificate, just stop renewing it.

HSTS UI in Chrome

HSTS is designed to address the fact that HTTP is the default protocol on the web. You might run a secure site that redirects everyone to HTTPS, but your users will just type in foo.com and that initial HTTP request is vulnerable to manipulation and SSL stripping attacks.

HSTS allows sites to advertise that they are HTTPS only. See the Chromium site page for more details.
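A site opts in by sending a response header over HTTPS along these lines (max-age is in seconds, thirty days in this example, and includeSubDomains is optional):

Strict-Transport-Security: max-age=2592000; includeSubDomains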

However, it's difficult to test HSTS. Chrome's HSTS database stores only the hashes of sites (which may not have been the right choice, but I don't see sufficient motivation to change it at the moment) so it's hard to edit by hand. There are tools like craSH's Chrome-STS to help edit the database, but it's too hard for the average developer.

So I've added a debugging UI to Chrome to query, add and remove entries from the HSTS database. You have to type about:net-internals into the address bar and click the HSTS tab. You also need a version of Chrome after r75282. Today that means a trunk build but it should be in the dev channel release next week. Until then, here's a screenshot:

So, if you want to see if your site will break before deploying HSTS, you can add an entry locally for it and find out. Modifications persist on disk and have an expiry of 1000 days. You can also use it as a way to elect to always access sites via HTTPS (e.g. mail.google.com).

Origin poisoning

Mixed-scripting is when an HTTPS page loads resources like Javascript, CSS or plugins over HTTP. It's a subset of “mixed content” but it's the really dangerous subset: by controlling those types of resources a network attacker can take control of the whole page rather than just, say, a single image.

Chrome, unlike other browsers, tracks mixed-scripting for origins rather than for pages. This means that, if page x triggers mixed-scripting and you follow a link to another page, y, in the same origin, we'll decorate y with mixed-scripting warnings even if it's clean when considered in isolation. This is because Javascript can control other pages in the same origin, see this paper.

Chrome's detection actually errs a little on the side of too many warnings. Strictly speaking, if there's only a single page open in the origin and you navigate it, then there's nowhere for evil Javascript to hide and so the next page can be clean. Chrome, rather, considers an origin to be poisoned once it has triggered mixed-scripting and that taint is only lifted once that renderer process has shutdown.

(Although there's also an argument to be made that Chrome's behavior is fine. Consider this case: you log into a web email client and the front page has mixed scripting. Then you click a link to compose an email. The compose page might be free of mixed-scripting and all the evil Javascript might have been flushed. But the mixed-scripting could have switched your cookies and logged you in as the attacker's account. Now, when you send the email, it'll appear in the attacker's “Sent Mail”.)

But that's all background. The problem is that developers get confused about these mixed-scripting warnings and invariably blame Chrome. If you have a mixed-scripting issue occur on a page which redirects, or in an HTTPS iframe on an HTTP site, then the Javascript console doesn't help you. It can be very difficult to figure out what the problem is.

So I've landed a patch in Chrome which logs mixed-scripting to the debug log no matter where it occurs. Just get a build after r74119 (see about:version) and follow the instructions to enable debugging logging.

Still not computationally expensive

F5 have put up a blog post explaining why my previous statements about SSL's computational costs are a myth. I'm glad that the idea is getting out there!

Keep in mind, however, that F5 want to sell you SSL hardware and that the blog post is a marketing piece. (I've never used F5 products but, from the client side, I've never found any problems with them either.)

Some assertions in the text are simple and wrong, so I'll just point these out:

Certificate lengths

It's true that NIST have recommended that 1024-bit certificates should be phased out by 2013. CA certificates should be 2048-bit now and we need to work to remove the 1024-bit ones.

But even in 2013 it's still going to take tens of millions of dollars of computer hardware a year to factor a 1024-bit RSA key. If you're the sort of organisation which is considering deploying HTTPS do you think that your attackers are going to do that, or are they going to bribe someone to steal the private key from the servers?

Likewise, SSL hardware will probably make your key harder to steal via a software exploit, but keys can be revoked and the problem dealt with. There are some organisations where hardware protected keys make sense but, for 99% of sites, key material isn't what's important. It would be far more damaging for customer data to be dumped on the web, and SSL hardware doesn't save you there.

As we go towards 2013, CAs will try to issue fewer 1024-bit certificates and 2048-bit certificates are 5x slower. But it only takes one CA to start issuing ECDSA certificates along with a 2048-bit RSA one and the problem is much less vexing. Browsers largely support it and you can serve the ECDSA certificate to the ones which do and the RSA to the rest. P256-ECDSA is about as fast as 1024-bit RSA. The only problem is that people don't know to demand it from their CAs so the CAs don't do it yet.


RC4 has good key agility, it's a stream cipher (which saves bytes on the wire), it's quick and very well analysed. It's not the strongest cipher in the world, but it's significantly stronger than some other parts of the SSL ecosystem. With overwhelming probability, you are not the sort of organisation that needs to worry about attackers brute-forcing your cipher.

All together now...

SSL is just not that computationally expensive any more. Here are the real costs of HTTPS deployment these days:

The F5 article does mention the first of these, but SSL hardware doesn't help with either of them.

All sites should deploy HTTPS because attacks like Firesheep are too easy to do. Even sites where you don't login should deploy HTTPS (imagine the effect of spoofing news websites at a major financial conference to headline “Market crashes”). You should use HSTS to stop sslstrip. But you are probably not the sort of organisation which needs to worry about multi-million dollar attacks aimed at factoring your key.

Unfortunate current practices for HTTP over TLS

(For amusement value.)

Network Working Group                                         A. Langley
Internet-Draft                                                Google Inc
Expires: July 5, 2011                                           Jan 2011

            Unfortunate current practices for HTTP over TLS


Abstract

   This document describes some of the unfortunate current practices
   which are needed in order to transport HTTP over TLS on the public
   Internet.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 5, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Handshake message fragmentation  . . . . . . . . . . . . . . .  4
   3.  Protocol Fallback  . . . . . . . . . . . . . . . . . . . . . .  5
   4.  More implementation mistakes . . . . . . . . . . . . . . . . .  6
   5.  Certificate Chains . . . . . . . . . . . . . . . . . . . . . .  7
   6.  Insufficient Security  . . . . . . . . . . . . . . . . . . . .  8
   7.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . .  9
   8.  Normative References . . . . . . . . . . . . . . . . . . . . . 10
   Appendix A.  Changes . . . . . . . . . . . . . . . . . . . . . . . 11
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.  Introduction

   HTTP [RFC2616] is one of the most common application level protocols
   transported over TLS [RFC5246].  (This combination is commonly known
   as HTTPS based on the URL scheme used to indicate it.)  HTTPS clients
   have to function with a huge range of TLS implementations, some of
   higher quality than others.  This text aims to document some of the
   behaviours of existing HTTPS clients that are designed to ensure
   maximum interoperability.

   This text should not be taken as a recommendation that future HTTPS
   clients adopt these behaviours.  The security implications of each
   need to be carefully considered by each implementation.  However,
   these behaviours are common and the authors consider it better to
   document the state of practice than to simply wish it were otherwise.

2.  Handshake message fragmentation

   Many servers will fail to process a handshake message that spans more
   than one record.  These servers will close the connection when they
   encounter such a handshake message.  HTTPS clients will commonly
   ensure against that by either packing all handshake messages in a
   flow into a single record, or by creating a single record for each
   handshake message.

3.  Protocol Fallback

   Despite it being nearly twelve years since the publication of TLS 1.0
   [RFC2246], around 3% of HTTPS servers will reject a valid TLS
   "ClientHello".  These rejections can take the form of immediately
   closing the connection or a fatal alert.  Intolerance to the
   following has been observed:

      Advertising version TLS 1.0.

      Advertising a TLS version greater than TLS 1.0 (around 2% for 1.1
      or 1.2, around 3% for greater than 1.2).

      Advertising a version greater than 0x03ff (around 65% of servers)

      The presence of any extensions (around 7% of servers)

      The presence of specific extensions ("server_name" and
      "status_request" intolerance has been observed, although in very
      low numbers).

      The presence of any advertised compression algorithms

   Next, some servers will misbehave after processing the "ClientHello"
   message.  Negotiating the use of "DEFLATE" compression can result in
   fatal "bad_record_mac", "decompression_failure" or
   "decryption_failed" alerts.  Notably, OpenSSL prior to version 0.9.8c
   will intermittently fail to process compressed finished messages due
   to a work around of a previous padding bug.

   Lastly, some servers will negotiate the use of SSLv3 but select a
   TLS-only cipher suite.

   In all these cases, HTTPS clients will often enter a fallback mode.
   The connection is retried using only SSLv3 and without advertising
   any compression algorithms.  (This is obviously an easy downgrade
   attack.)  Also, the fallback can be triggered by transient network
   problems, which often manifest as an abruptly closed connection.
   Since SSLv3 does not provide any means of Server Name Indication
   [RFC3546], the fallback connection can use the wrong certificate
   chain, resulting in a very surprising certificate error.

4.  More implementation mistakes

   Non-fatal errors in version negotiation also occur.  Some 0.2% of
   servers use the version from the record header.  Around 0.6% of
   servers require that the version in the "ClientHello" and record
   header match in order to respect the version in the "ClientHello".  A
   very low number of servers echo whatever version the client
   advertises.

   In the event that the client supports a higher protocol version than
   the server, about 0.4% of servers require that the RSA
   "ClientKeyExchange" message include the server's protocol version.

   Some 30% of servers don't check the version in an RSA
   "ClientKeyExchange" at all.

5.  Certificate Chains

   Certificate chains presented by servers will commonly be missing
   intermediate certificates, have certificates in the wrong order and
   will include unrelated, superfluous certificates.  Servers have been
   observed presenting more than ten certificates in what we assume is a
   drive-by shooting approach to including the correct intermediate
   certificate.

   In order to validate chains which are missing certificates, some
   HTTPS clients will collect intermediate certificates from other
   servers.  Clients will commonly put all the presented certificates
   into a set and try to validate a chain assuming only that the first
   certificate is the leaf.

6.  Insufficient Security

   Some 65% of servers support SSLv2 (beyond just supporting the
   handshake in order to upgrade to SSLv3 or TLS).  HTTPS clients will
   typically not support SSLv2, nor send SSLv2 handshakes by default.
   Of those servers, 80% support the export ciphersuites.  (Although
   about 3% of those servers negotiate weak ciphersuites only to show a
   warning.)

   Some servers will choose very small multiplicative group sizes for
   their ephemeral Diffie-Hellman exchange (for example, 256-bits).
   Some HTTPS clients will reject all multiplicative group sizes smaller
   than 512-bits while others will retry after demoting DHE ciphersuites
   in their "ClientHello".

7.  Acknowledgements

   Yngve Pettersen made significant contributions and many of the
   numbers in this document come from his scanning work.  Other numbers
   were taken from Ivan Ristic's SSL Survey.

   Thanks to Wan Teh Chang for reviewing early drafts.

8.  Normative References

   [RFC2246]  Dierks, T. and C. Allen, "The TLS Protocol Version 1.0",
              RFC 2246, January 1999.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3546]  Blake-Wilson, S., Nystrom, M., Hopwood, D., Mikkelsen, J.,
              and T. Wright, "Transport Layer Security (TLS)
              Extensions", RFC 3546, June 2003.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

OpenSSH 5.7 with ECC support

OpenSSH 5.7 has been released and its major notable new feature is support for ECDH key agreement and ECDSA host and user keys.

What a difference a prime makes

In a previous post I covered some of the issues with implementing ECC operations. I also mentioned the problems with the number of terms in the P256 prime.

Having spent a fair amount of time optimising OpenSSL's elliptic curve code, here are the current speeds:

[Graph: operations per core second against security level (bits) for 1024-bit DH, 2048-bit DH, P224, curve25519, P256 and P521.]

(The graph is inline SVG. Can't see it? Get a better browser.)

P224 and P256 are not, fundamentally, very different. However, P256 has a nasty prime formation, as I explained previously, which kills the speed. Sadly, if you want to support the existing fleet of browsers, P256 is your fastest option.

(The multiplicative DH is measured using OpenSSL's code. curve25519 is my own code, based on djb's. P224 is Emilia Kasper's, inspired by curve25519. P521 is my own code based on Emilia's. P256 is my own code, also based on Emilia's, with a very different form, in an attempt to get it running faster.)

(The measurements are taken on a single core of my 2.3GHz Xeon E5520. They are a scalar multiplication of the base point followed by a scalar multiplication of the result. They don't include point validation. The OpenSSL code involves additional overhead from the BIGNUM conversions, however I consider that fair game because you have to do them if you want to use the code.)

Elliptic curves and their implementation (pointer)

If you read this via Google Reader (or, possibly, other feed readers) then the last post won't have appeared, possibly because of the inline SVG (getting cut by the bleeding edge again).

So, click to read Elliptic curves and their implementation.

Elliptic curves and their implementation

So what's an elliptic curve? Well, for starters, it's not an ellipse. An elliptic curve is a set of points on a plane which satisfy an equation of the form y² = x³ + ax + b.

As an example, here's the elliptic curve y² = x³ - 3x + 3:
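For a concrete point on that curve: taking x = 1 gives x³ - 3x + 3 = 1, so y² = 1 and both (1, 1) and (1, -1) lie on it.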