ImperialViolet

No, don't enable revocation checking (19 Apr 2014)

Revocation checking is in the news again because of a large number of revocations resulting from precautionary rotations for servers affected by the OpenSSL heartbeat bug. However, revocation checking is a complex topic and there's a fair amount of misinformation around. In short, it doesn't work and you are no more secure by switching it on. But let's quickly catch up on the background:

Certificates bind a public key and an identity (commonly a DNS name) together. Because of the way the incentives work out, they are typically issued for a period of several years. But events occur and sometimes the binding between public key and name that the certificate asserts becomes invalid during that time. In the recent cases, people who ran a server that was affected by the heartbeat bug are worried that their private key might have been obtained by someone else and so they want to invalidate the old binding, and bind to a new public key. However, the old certificates are still valid and so someone who had obtained that private key could still use them.

Revocation is the process of invalidating a certificate before its expiry date. All certificates include a statement that essentially says “please phone the following number to check that this certificate has not been revoked”. The term online revocation checking refers to the process of making that phone call. It's not actually a phone call, of course, rather browsers and other clients can use a protocol called OCSP to check the status of a certificate. OCSP supplies a signed statement that says that the certificate is still valid (or not) and, critically, the OCSP statement itself is valid for a much shorter period of time, typically a few days.

The critical question is what to do in the event that you can't get an answer about a certificate's revocation status. If you reject certificates when you can't get an answer, that's called hard-fail. If you accept certificates when you can't get an answer that's called soft-fail.

Everyone does soft-fail for a number of reasons on top of the general principle that single points of failure should be avoided. Firstly, the Internet is a noisy place and sometimes you can't get through to OCSP servers for some reason. If you fail in those cases then the level of random errors increases. Also, captive portals (e.g. hotel WiFi networks where you have to “login” before you can use the Internet) frequently use HTTPS (and thus require certificates) but don't allow you to access OCSP servers. Lastly, if everyone did hard-fail then taking down an OCSP service would be sufficient to take down lots of Internet sites. That would mean that DDoS attackers would turn their attention to them, greatly increasing the costs of running them and it's unclear whether the CAs (who pay those costs) could afford it. (And the disruption is likely to be significant while they figure that out.)

So soft-fail is the only viable answer but it has a problem: it's completely useless. But that's not immediately obvious so we have to consider a few cases:

If you're worried about an attacker using a revoked certificate then the attacker first must be able to intercept your traffic to the site in question. (If they can't even intercept the traffic then you didn't need any authentication to protect it from them in the first place.) Most of the time, such an attacker is near you. For example, they might be running a fake WiFi access point, or maybe they're at an ISP. In these cases the important fact is that the attacker can intercept all your traffic, including OCSP traffic. Thus they can block OCSP lookups and soft-fail behaviour means that a revoked certificate will be accepted.

The next class of attacker might be a state-level attacker. For example, Syria trying to intercept Facebook connections. These attackers might not be physically close, but they can still intercept all your traffic because they control the cables going into and out of a country. Thus, the same reasoning applies.

We're left with cases where the attacker can only intercept traffic between a user and a website, but not between the user and the OCSP service. The attacker could be close to the website's servers and thus able to intercept all traffic to that site, but not anything else. More likely, the attacker might be able to perform a DNS hijacking: they persuade a DNS registrar to change the mapping between a domain (example.com) and its IP address(es) and thus direct the site's traffic to themselves. In these cases, soft-fail still doesn't work, although the reasoning is more complex:

Firstly, the attacker can use OCSP stapling to include the OCSP response with the revoked certificate. Because OCSP responses are generally valid for some number of days, they can store one from before the certificate was revoked and use it for as long as it's valid for. DNS hijackings are generally noticed and corrected faster than the OCSP response will expire. (On top of that, you need to worry about browser cache poisoning, but I'm not going to get into that.)

Secondly, and more fundamentally, when issuing certificates a CA validates ownership of a domain by sending an email, or looking for a specially formed page on the site. If the attacker is controlling the site, they can get new certificates issued. The original owners might revoke the certificates that they know about, but it doesn't matter because the attacker is using different ones. The true owners could try contacting CAs, convincing them that they are the true owners and get other certificates revoked, but if the attacker still has control of the site, they can hop from CA to CA getting certificates. (And they will have the full OCSP validity period to use after each revocation.) That circus could go on for weeks and weeks.

That's why I claim that online revocation checking is useless - because it doesn't stop attacks. Turning it on does nothing but slow things down. You can tell when something is security theater because you need some absurdly specific situation in order for it to be useful.

So, for a couple of years now, Chrome hasn't done these useless checks by default in most cases. Rather, we have tried a different mechanism. We compile daily lists of some high-value revocations and use Chrome's auto-update mechanism to push them to Chrome installations. It's called the CRLSet and it's not complete, nor big enough to cope with large numbers of revocations, but it allows us to react quickly to situations like Diginotar and ANSSI. It's certainly not perfect, but it's more than many other browsers do.

A powerful attacker may be able to block a user from receiving CRLSet updates if they can intercept all of that user's traffic for long periods of time. But that's a pretty fundamental limit; we can only respond to any Chrome issue, including security bugs, by pushing updates.

The original hope with CRLSets was that we could get revocations categorised into important and administrative and push only the important ones. (Administrative revocations occur when a certificate is changed with no reason to suspect that any key compromise occurred.) Sadly, that mostly hasn't happened. Perhaps we need to consider a system that can handle much larger numbers of revocations, but the data in that case is likely to be two orders of magnitude larger and it's very unlikely that the current CRLSet design is still optimal when the goal moves that far. It's also a lot of data for every user to be downloading and perhaps efforts would be better spent elsewhere. It's still the case that an attacker that can intercept traffic can easily perform an SSL Stripping attack on most sites; they hardly need to fight revoked certificates.

In order to end on a positive note, I'll mention a case where online revocation checking does work, and another, possible way to solve the revocation problem for browsers.

The arguments above started with the point that an attacker using a revoked certificate first needs to be able to intercept traffic between the victim and the site. That's true for browsers, but it's not true for code-signing. In the case where you're checking the signature on a program or document that could be distributed via, say, email, then soft-fail is still valuable. That's because it increases the costs on the attacker substantially: they need to go from being able to email you to needing to be able to intercept OCSP checks. In these cases, online revocation checking makes sense.

If we want a scalable solution to the revocation problem then it's probably going to come in the form of short-lived certificates or something like OCSP Must Staple. Recall that the original problem stems from the fact that certificates are valid for years. If they were only valid for days then revocation would take care of itself. (This is the approach taken by DNSSEC.) For complex reasons, it might be easier to deploy that by having certificates that are valid for years, but include a marker in them that indicates that an OCSP response must be provided along with the certificate. The OCSP response is then only valid for a few days and the effect is the same (although less efficient).