Letter to 20 years ago (06 Sep 2020)
I noticed that I have not posted anything here in 2020! There's a bunch of reasons for this: the work I'm doing at the moment does not lend itself so well to blog posts, and life intervenes, leaving less time for personal projects. But in order to head off the risk that I'll post nothing at all this year I pulled something from one of my notebooks. 2020 is a round number so I decided to do some reflection and this was a letter that I imagined writing to myself 20 years ago. It is very much a letter to me! The topics are going to be quite specific and if you weren't paying attention to the computing industry in the year 2000 I’m not sure how much of it will make sense. But these are the points that I think me of 20 years ago would have wondered about.
You must be thinking that computers will be crazy fast by now. Yes…ish. It's complicated, and that's going to be a theme here. You've been hearing from Intel that the NetBurst chips will hit 10GHz in a few years, so with another few doublings what will have by now? 50GHz? Actually common values are around 3.5GHz. Some reach 5GHz, but only in bursts. Intel never hit 10GHz and nor did anybody else. It’s better than it sounds: instructions per clock are up a lot, so each cycle is worth more. (Although maybe we'll get to some of the issues that caused!) More importantly, all systems are multiprocessor now. It's physically a single chip, but inside is often 8- to 32-way SMT. Yep, that's cool. And yep, it only helps for certain sorts of workloads. Multithreaded programming is not going away.
Memory? 10s of gigabytes is common. Hard drives? It's nearly all flash now. You can still buy hard drives and they’re huge and cheap, but the speed of flash is pretty sweet. Computers really are quite substantially faster — don't be too put off by the clock speeds.
Your day-to-day life is a bunch of xterms and a web browser. Nothing's changed; you are dramatically underestimating the importance of path dependence. Terminals are still emulating a fancy VT-100 and sometimes they get messed up and need a reset. No fixes there. It's still bash or zsh; nearly unchanged from your time. The kernel has been fixed a little: you can now get a handle to a process, so no more PID races. You can open a file relative to a directory descriptor and you can create an unlinked file in a directory and link it later. Yes it's good that these things are possible now, but it is not a fundamental change and it took a long time. Actually you know what? Windows grew a much smarter shell, leapfrogging Linux in several respects. They had hardly moved forward since DOS so it was easier there, perversely.
So innovation must have happened higher up where there was new ground and little legacy, right? What about the semantic web? How did that turn out? Not well. We don't have lots of data in machine-readable formats and fancy GUIs so that anyone can create automation. Information is somewhere between impossible and a huge pain to access. You’ve read The Dilbert Future by now and its ‘confusopoly’ concept is much closer to the mark. The Semantic Web stuff failed so badly that nobody even tries any longer. (I'm afraid Scott Adams won’t seem so wholesome in the future either.) The closest you'll get is that your web browser can fill out your name, address, and credit card details. And it has to work really hard to do that because there’s almost no assistance from web pages. Go find Weaving the Web and throw it away.
Something more positive: bandwidth! You are using a dial-up that tops out at 5 KB/s and charges by the minute. You use a local proxy that keeps a copy of everything so that viewed pages are available offline and it lets you mark missing pages for batch fetching to reduce the cost. This problem is now solved. You can assume that any house in a city can get an always-on, several 10s of Mb/s connection. It's not as cheap as it could be but it's a standard household expense now. (Note: past me doesn't live in the US! —agl.) Also, everyone carries an impossibly fancy PDA that has that level of connection wirelessly and everywhere. I don't need to equivocate here, connectivity is solved in the sorts of places you're likely to live. But … there's a second edge to that sword. This connectivity can be a bit … much? There are some advantages to having the internet be stationary, metered, and behind 30 seconds of banshee wailing and static. Imagine your whole social life getting run through IRC, and that you're always connected. It’s tough to explain but there's a problem. But these PDAs? They have GPS and maps. Nobody gets lost anymore. Nobody carries paper street maps in their car. Connectivity can be pretty sweet.
This next bit is going to upset you a little: the whole Palladium / trusted boot stuff never took off on the desktop, but these PDAs are pretty locked down. One type of them is completely locked down and you can’t run non-approved software. The other will make you jump through hoops and, even then, you can't access the data of other programs. On the latter sort you can install a completely custom OS most of the time, but there's attestation and some things won't cooperate. This is still playing out and people are fighting over the details (because of money, of course). It remains a concern, but you underestimate the benefits of this sort of system. Your idea that people should own their own computers because they’re critical tools isn't wrong, but it is elitist. For the vast majority of people, their desktops degrade into a fragile truce with a whole ecosystem of malware and near-malware. Maybe it's their “fault” for having installed it, but these PDAs are so popular, in part, because they're hard to screw up. Bad stuff does get through the approval process, but it cannot mess things up to the wipe-and-reinstall level that desktops reach. The jury is still out about whether we will regret this, but you're wrong about the viability of giving people Windows XP and getting a good result.
Back on a positive note: the music industry switched to a $15 a month stream-whatever-you-want model and it works fine. You were completely right about this. Music still exists and it still pays a few at the top large sums and the rest very little. The music industry itself didn't sort this out though, other companies did it for them. What you're missing is that you’re not taking things far enough: companies also did this for TV and (many) movies. There are still rips of this stuff on BitTorrent, but it's not a live issue because people pay the subscription for the ease, by and large. In fact, access to scientific papers is a hotter issue now!
Basically, rates of change are really uneven.
This is the third in a series of posts about running experiments on post-quantum confidentiality in TLS. The first detailed experiments that measured the estimated network overhead of three families of post-quantum key exchanges. The second detailed the choices behind a specific structured-lattice scheme. This one gives details of a full, end-to-end measurement of that scheme and a supersingular isogeny scheme, SIKE/p434. This was done in collaboration with Cloudflare, who integrated Microsoft's SIKE code into BoringSSL for the tests, and ran the server-side of the experiment.
Google Chrome installs, on Dev and Canary channels, and on all platforms except iOS, were randomly assigned to one of three groups: control (30%), CECPQ2 (30%), or CECPQ2b (30%). (A random ten percent of installs did not take part in the experiment so the numbers only add up to 90.) CECPQ2 is the hybrid X25519+structured-lattice scheme previously described. CECPQ2b is the name that we gave to the combination of X25519 and the SIKE/p434 scheme.
Because optimised assembly implementations are labour-intensive to write, they were only available/written for AArch64 and x86-64. Because SIKE is computationally expensive, it wasn’t feasible to enable it without an assembly implementation, thus only AArch64 and x86-64 clients were included in the experiment and ARMv7 and x86 clients did not contribute to the results even if they were assigned to one of the experiment groups.
Cloudflare servers were updated to include support for both CECPQ2 and CECPQ2b, and to support an empty TLS extension that indicated that they were part of the experiment. Depending on the experiment group, Chrome would either offer CECPQ2, CECPQ2b, or just non-post-quantum options, in its TLS 1.3 handshake, along with the signaling extension to indicate which clients were part of the control group. Measurements were taken of how long TLS handshakes took to complete using Chrome’s metrics system. Chrome knew which servers were part of the experiment because they echoed the signaling extension, thus all three groups were measuring handshake duration against the same set of servers.
After this phase of the trial was complete, client-side measurements were disabled and Chrome Canary was switched to a mode where it randomly picked one of CECPQ2, CECPQ2b, or neither to offer. This enabled some additional, server-side measurements to ensure that nothing unexpected was occuring.
(Cloudflare has a significantly more detailed write up of this experiment.)
We’re aware of a couple of biases and these need to be kept in mind when looking at the results. Firstly, since ARMv7 and x86 platforms were excluded, the population was significantly biased towards more powerful CPUs. This will make supersingular isogenies look better. Also, we’ve seen from past experiments that Canary and Dev Chrome users tend to have worse networks than the Chrome user population as a whole, and this too will tend to advantage supersingular isogenies since they require less network traffic.
Here are histograms of client-side results, first from Windows (representing desktops/laptops) and from Android (representing mobile devices):
From the histograms we can see that the CECPQ2b (SIKE) group shifts visibly to the right (i.e. slower) in both cases. (On Android, a similar but smaller shift is seen for CECPQ2.) Despite the advantages of removing the slower clients and experimenting with worse-than-usual networks, the computational demands of SIKE out-weigh the reduced network traffic. Only for the slowest 5% of connections are the smaller messages of SIKE a net advantage.
Cloudflare have a much more detailed analysis of the server-side results, which are very similar.
While there may be cases where the smaller messages of SIKE are a decisive advantage, that doesn’t appear to be the case for TLS, where the computational advantages of structured lattices make them a more attractive choice for post-quantum confidentiality.
Username (and password) free login with security keys (10 Aug 2019)
Most readers of this blog will be familiar with the traditional security key user experience: you register a token with a site then, when logging in, you enter a username and password as normal but are also required to press a security key in order for it to sign a challenge from the website. This is an effective defense against phishing, phone number takeover, etc. But modern security keys are capable of serving the roles of username and password too, so the user experience can just involve clicking a login button, pressing the security key, and (perhaps) entering a locally-validated PIN if the security key doesn't do biometrics. This is possible with the recently released Chromium 76 and also with Edge or Firefox on current versions of Windows.
On the plus side, this one-button flow frees users from having to remember and type their username and password for a given site. It also avoids sites having to receive and validate a password, potentially avoiding both having a password database (which, even with aggressively slow hashes, will leak many users' passwords if disclosed), and removing any possibility of accidentally logging the plaintext values (which both Google and Facebook have done recently). On the negative side, users will need a modern security key (or Windows Hello-enabled computer) and may still need to enter a PIN.
Which security keys count as “modern”? For most people it'll mean a series-5 black Yubikey or else a blue Yubikey that has a faint “2” printed on the upper side. Of course, there are other manufacturers who make security keys and, if it advertises “CTAP2” support, there's a good chance that it'll work too. But those Yubikeys certainly do.
In practical terms, web sites exercise this capability via WebAuthn, the same API that handles the traditional security key flow. (I'm not going to go into much detail about how to use WebAuthn. Readers wanting more introductory information can see what I've written previously or else see one of the several tutorials that come up in a Google search.)
In WebAuthn terms, a “resident” credential is one that can be discovered without knowing its ID. Generally, most security keys operate statelessly, i.e. the credential ID is an encrypted private seed, and the security key doesn't store any per-credential information itself. Thus the credential ID is required for the security key to function so the server sends a list of them to the browser during login, implying that the server already knows which user is logging in. Resident keys, on the other hand, require some state to be kept by the security key because they can be used without presenting their ID first. (Note that, while resident keys require some state to be kept, security keys are free to keep state for non-resident keys too: resident vs non-resident is all about whether the credential ID is needed.)
User verification is about whether the security key is providing one or two authentication factors. With the traditional experience, the security key is something you have and the password is something you know. In order to get rid of the password, the security key now needs to provide two factors all by itself. It's still something you have so the second security key factor becomes a PIN (something you know) or a biometric (something you are). That begs the question: what's the difference between a PIN and a password? On the surface: nothing. A security key PIN is an arbitrary string, not limited to numbers. (I think it was probably considered too embarrassing to call it a password since FIDO's slogan is “solving the world's password problem”.) So you should think of it as a password, but it is a password with some deeper advantages: firstly, it doesn't get sent to web sites, so they can't leak it and people can safely use a single password everywhere. Secondly, brute-force resistance is enforced by the hardware of the security key, which will only allow eight attempts before locking and requiring a reset. Still, it'll be nice when biometrics are common in security keys.
A user ID is an opaque identifier that should not be personally identifying. Most systems will have some database primary key that identifies a user and, if using that as a WebAuthn user ID, ensure that you encrypt it first with a key that is only used for this purpose. That way it doesn't matter if those primary keys surface elsewhere too. Security keys will only store a single credential for a given pair of domain name and user ID. So, if you register a second credential with the same user ID on the same domain, it'll overwrite the first.
The fact that you can register more than one credential for a given domain means that it's important to set the metadata correctly when creating a resident credential. This isn't unique to resident keys, but it's much more important in this context. The user name and displayName will be shown by the browser during login when there's more than one credential for a domain. Also the relying party name and displayName will be shown in interfaces for managing the contents of a security key.
When logging in, WebAuthn works as normal except you leave the list of credential IDs empty and set userVerification to required. That triggers the resident-credential flow and the resulting credential will include the user ID, with which you look up the user and their set of registered public keys, and then validate the public key and other parameters.
Microsoft have a good test site (enter any username) where you can experiment with crafting different WebAuthn requests.
In order to support the above, security keys obviously have a command that says “what credentials do you have for domain x?”. But what level of authentication is needed to run that command is a little complex. While it doesn't matter for the web, one might want to use security keys to act as, for example, door access badges; especially over NFC. In that case one probably doesn't want to bother with a PIN etc. Thus the pertinent resident credentials would have to be discoverable and exercisable given only physical presence. But in a web context, perhaps you don't want your security key to indicate that it has a credential stored for cia.gov (or mss.gov.cn) to anyone who can plug it in. Current security keys, however, will disclose whether they have a resident credential for a given domain, and the user ID and public key for that credential, to anyone with physical access. (Which is one reason why user IDs should not be identifying.) Future security keys will have a concept of a per-credential protection level which will prevent them from being disclosed without user verification (i.e. PIN or biometrics), or without knowing their random credential ID. While Chromium will configure credential protection automatically if supported, other browsers may not. Thus it doesn't hurt to set credProtect: 2 in the extensions dictionary during registration.
Zero-knowledge attestation (01 Jan 2019)
U2F/FIDO tokens (a.k.a. “Security Keys”) are a solid contender for doing something about the effectiveness of phishing and so I believe they're pretty important. I've written a fairly lengthy introduction to them previously and, as mentioned there, one concerning aspect of their design is that they permit attestation: when registering a key it's possible for a site to learn a cryptographically authenticated make, model, and batch. As a browser vendor who has dealt with User-Agent sniffing, and as a large-site operator, who has dealt with certificate pervasiveness issues, that's quite concerning for public sites.
It's already the case that one significant financial site has enforced a single-vendor policy using attestation (i.e. you can only register a token made by that vendor). That does not feel very congruent with the web, where any implementation that follows the standards is supposed to be a first-class citizen. (Sure, we may have undermined that with staggering levels of complexity, but that doesn't discredit the worth of the goal itself.)
Even in cases where a site's intended policy is more reasonable (say, they want to permit all tokens with some baseline competence), there are strong grounds for suspecting that things won't turn out well. Firstly, the policies of any two sites may not completely align, leading to a crappy user experience where a user needs multiple tokens to cover all the sites that they use, and also has to remember which works where. Secondly, sites have historically not been so hot about staying up-to-date. New token vendors may find themselves excluded from the market because it's not feasible to get every site to update their attestation whitelists. That feels similar to past issues with User-Agent headers but the solution there was to spoof other browsers. Since attestation involves a cryptographic signature, that answer doesn't work here.
So the strong recommendation for public sites is not to request attestation and not to worry about it. The user, after all, has control of the browser once logged in, so it's not terribly clear what threats it would address.
However, if we assume that certain classes of sites probably are going to use attestation, then users have a collective interest in those sites enforcing the same, transparent standard, and in them keeping their attestation metadata current. But without any impetus towards those ends, that's not going to happen. Which begs the question: can browsers do something about that?
Ultimately, in such a world, sites only operate on a single bit of information about any registration: was this public-key generated in a certified device or not? The FIDO Alliance wants to run the certification process, so then the problem reduces down to providing that bit to the site. Maybe they would simply trust the browser to send it: the browser could keep a current copy of the attestation metadata and tell the site whether the device is certified or not. I don't present that as a straw-man: if the site's aim is just to ensure that the vast majority of users aren't using some backdoored token that came out of a box of breakfast cereal then it might work, and it's certainly simple for the site.
But that would be a short blog post, and I suspect that trusting the browser probably wouldn't fly in some cases.
So what we're looking for is something like a group signature scheme, but we can't change existing tokens. So we need to retrospectively impose a group signature on top of signers that are using vanilla P-256 ECDSA.
It is a surprising but true result in cryptography that it's possible to create a convincing proof of any statement in NP that discloses nothing except the truth of the statement. As an example of such a statement, we might consider “I know a valid signature of message x from one of the public keys in this set”. That's a pretty dense couple of sentences but rather than write an introduction to zero-knowledge proofs here, I'm going to refer you to Matthew Green's posts. He does a better job than I would.
I obviously didn't pick that example at random. If there was a well-known set of acceptable public keys (say, as approved by the FIDO Alliance) then a browser could produce a zero-knowledge proof that it knew a valid attestation signature from one of those keys, without disclosing anything else, notably without disclosing which public key was used. That could serve as an “attestation valid” bit, as hypothesised above, that doesn't require trusting the browser.
As a concrete instantiation of zero-knowledge proofs for this task, I'll be using Bulletproofs [BBBPWM17]. (See zkp.science for a good collection of many different ZK systems. Also, dalek-cryptography have excellent notes on Bulletproofs; Cathie Yun and Henry de Valence from that group were kind enough to help me with a question about Bulletproofs too.)
The computational model for Bulletproofs is an arithmetic circuit: an acyclic graph where public and secret inputs enter and each node either adds or multiplies all its inputs. Augmenting that are linear constraints on the nodes of the circuit. In the tool that I wrote for generating these circuits, this is represented as a series of equations where the only operations are multiplication, addition, and subtraction. Here are some primitives that hopefully convince you that non-trivial functions can be built from this:
- IsBit(x): x² - x = 0
- NOT(x): 1 - x
- AND(x, y): x × y
- OR(x, y): x + y - (x × y)
- XOR(x, y): x + y - 2(x × y)
Using single bit values in an arithmetic circuit certainly works, but it's inefficient. Getting past single-bit values, the arithmetic circuits in Bulletproofs don't work in ℤ (i.e. arbitrary-length integers), rather they work over a finite field. Bulletproofs are built on top of an elliptic curve and the finite field of the arithmetic circuit is the scalar field of that curve.
When dealing with elliptic curves (as used in cryptography) there are two finite fields in play: the x and y coordinates of the points on the curve are in the coordinate field of the curve. Multiples of the base point (B) then generate a prime number (n) of points in the group before cycling back to the base point. So xB + yB = (x + y mod n)B — i.e. you can reduce the multiple mod n before multiplying because it'll give the same result. Since n is prime, reduction mod n gives a field, the scalar field.
(I'm omitting powers of primes, cofactors, and some other complications in the above, but it'll serve.)
So Bulletproofs work in the scalar field of whatever elliptic curve they're implemented with, but we want to build P-256 ECDSA verification inside of a Bulletproof, and that involves lots of operations in P-256's coordinate field. So, ideally, the Bulletproofs need to work on a curve whose scalar field is equal to P-256's coordinate field. Usually when generating a curve, one picks the coordinate field to be computationally convenient, iterates other parameters until the curve meets standard security properties, and the scalar field is whatever it ends up as. However, after some quality time with “Constructing elliptic curves of prime order” (Broker & Stevenhagen) and Sage, we find that y² = x³ - 3x + B over GF(PP) where:
- B= 0x671f37e49d38ff3b66fac0bdbcc1c1d8b9f884cf77f0d0e90271026e6ef4b9a1
- PP= 0xffffffff000000010000000000000000aaa0c132719468089442c088a05f455d
… gives a curve with the correct number of points, and which seems plausibly secure based on the SafeCurves criteria. (A more exhaustive check would be needed before using it for real, but it'll do for a holiday exploration.) Given its relationship to P-256, I called it “PP-256” in the code.
Reviewing the ECDSA verification algorithm, the public keys and message hash are obviously public inputs. The r and s values that make up the signature cannot be both be public because then the verifier could just try each public key and find which one generated the signature. However, one of r and s can be public. From the generation algorithm, r is the x-coordinate of a random point and s is blinded by the inverse of the nonce. So on their own, neither r nor s disclose any information and so can just be given to the verifier—moving work outside of the expensive zero-knowledge proof. (I'm not worrying about tokens trying to use a covert channel here but, if you do worry about that, see True2F.)
If we disclose s to the verifier directly then what's left inside the zero-knowledge proof is 1) selecting the public key; 2) checking that the secret r is in range; 3) u₂ = r/s mod n; 4) scalar-multiplication of the public key by u₂; 5) adding in the (now) public multiple of the base point; and 6) showing that the x-coordinate of resulting point equals the original r, mod n.
The public-key is a 4-tooth comb, which is a precomputed form that speeds up scalar multiplications. It consists of 30 values. The main measure that we want to minimise in the arithmetic circuit is the number of multiplications where both inputs are secret. When selecting from t possible public keys the prover supplies a secret t-bit vector where only one of the bits is set. The proof shows that each value is, indeed, either zero or one using IsBit (from above, at a cost of one multiply per bit), and that exactly one bit is set by requiring that the sum of the values equals one. Each of the 30t public-key values is multiplied by one of the bits and summed to select exactly one key.
Rather than checking that the secret r is within [0, n-1], which would cost 512 multiplies, we just check that it's not equal to zero mod n. That's the important condition here since an out of range r is otherwise just an encoding error. Showing that a number is not zero mod n just involves showing that it's not equal to zero or n, as 2n is outside of the arithmetic circuit field. Proving a ≠ b is easy: the prover just provides an inverse for a - b (since zero doesn't have an inverse) and the proof shows that (a - b) × (a - b)⁻¹ = 1.
Calculating r/s mod n is the most complex part of the whole proof! Since the arithmetic circuit is working mod P-256's p, working mod n (which is the order of P-256—slightly less than p) is awkward. The prover gives bit-wise breakdown of r; the proof does the multiplication as three words of 86, 86, and 84 bits; the prover supplies the values for the carry-chain (since bit-shifts aren't a native operation in the arithmetic circuit); the prover then gives the result in the form a×n + b, where b is a 256-bit number; and the proof does another multiplication and carry-chain to check that the results are equal. All for a total cost of 2152 multiplication nodes!
After that, the elliptic curve operation itself is pretty easy. Using the formulae from “Complete addition formulas for prime order elliptic curves” (Renes, Costello, and Batina) it takes 5365 multiplication nodes to do a 4-tooth comb scalar-mult with a secret scalar and a secret point. Then a final 17 multiplication nodes add in the public base-point multiple, supply the inverse to convert to affine form, and check that the resulting x-coordinate matches the original r value. The circuit does not reduce the x-coordinate mod n in order to save work: for P-256, that means that around one in 2¹²⁸ signatures may be incorrectly rejected, but that's below the noise floor of arithmetic errors in CPUs. Perhaps if this were to be used in the real world, that would be worth doing correctly, but I go back to work tomorrow so I'm out of time.
In total, the full circuit contains 7534 multiplication nodes, 2154 secret inputs, and 17 236 constraints.
(Pratyush Mishra points out that affine formulae would be more efficient than projective since inversion is cheap in this model. Oops!)
My tool for generating the matrices that Bulletproofs operate on outputs 136KB of LZMA-compressed data for the circuit described above. In some contexts, that amount of binary size would be problematic, but it's not infeasible. There is also quite a lot of redundancy: the data includes instructions for propagating the secret inputs through the arithmetic circuit, but it also includes matrices from which that information could be derived.
The implementation is based on BoringSSL's generic-curve code. It doesn't even use Shamir's trick for multi-scalar multiplication of curve points, it doesn't use Montgomery form in a lot of places, and it doesn't use any of the optimisations described in the Bulletproofs paper. In short, the following timings are extremely pessimistic and should not be taken as any evidence about the efficiency of Bulletproofs. But, on a 4GHz Skylake, proving takes 18 seconds and verification takes 13 seconds. That's not really practical, but there is a lot of room for optimisation and for multiple cores to be used concurrently.
The proof is 70 450 bytes, dominated by the 2154 secret-input commitments. That's not very large by the standards of today's web pages. (And Dan Boneh points out that I should have used a vector commitment to the secret inputs, which would shrink the proof down to just a few hundred bytes.)
Intermediates and FIDO2
One important limitation of the above is that it only handles one level of signatures. U2F allows an intermediate certificate to be provided so that only less-frequently-updated roots need to be known a priori. With support for only a single level of signatures, manufacturers would have to publish their intermediates too. (But we already require that for the WebPKI.)
Another issue is that it doesn't work with the updated FIDO2 standard. While only a tiny fraction of Security Keys are FIDO2-based so far, that's likely to increase. With FIDO2, the model of the device is also included in the signed message, so the zero-knowledge proof would also have to show that a SHA-256 preimage has a certain structure. While Bulletproofs are quite efficient for implementing elliptic curves, a binary-based algorithm like SHA-256 is quite expensive: the Bulletproofs paper notes a SHA-256 circuit using 25 400 multiplications. There may be a good solution in combining different zero-knowledge systems based on “Efficient Zero-Knowledge Proof of Algebraic and Non-Algebraic Statements with Applications to Privacy Preserving Credentials” (Chase, Ganesh, Mohassel), but that'll have to be future work.
Happy new year.
CECPQ2 (12 Dec 2018)
CECPQ1 was the experiment in post-quantum confidentiality that my colleague, Matt Braithwaite, and I ran in 2016. It's about time for CECPQ2.
I've previously written about the experiments in Chrome which lead to the conclusion that structured lattices were likely the best area in which to look for a new key-exchange mechanism at the current time. Thanks to the NIST process we now have a great many candidates to choose from in that space. While this is obviously welcome, it also presents a problem: the fitness space of structured lattices looks quite flat so there's no obviously correct choice. Would you like keys to be products (RLWE) or quotients (NTRU; much slower key-gen, but subsequent operations are faster; older, more studied)? Do you want the ring to be NTT-friendly (fast multiplication, but more structure), or to have just a power-of-two modulus (easy reduction), or to have as little structure as possible? What noise profile and failure probability? Smart people can reasonably end up with different preferences.
This begs the question of why do CECPQ2 now at all? In some number of years NIST will eventually whittle down the field and write standards. Adrian Stanger of the NSA said at CRYPTO this year that the NSA is looking to publish post-quantum standards around 2024, based on NIST's output. (And even said that they would be pure-PQ algorithms, not combined with an elliptic-curve operation as a safeguard.) So if we wait five years things are likely to be a lot more settled.
Firstly, you might not be happy with the idea of waiting five years if you believe Michele Mosca's estimate of a one sixth chance of a large quantum computer in ten years. More practically, as we sail past the two year mark of trying to deploy TLS 1.3, another concern is that if we don't exercise this ability now we might find it extremely difficult to deploy any eventual design.
TLS 1.3 should have been straightforward to deploy because the TLS specs make accommodations for future changes. However, in practice, we had to run a series of large-scale experiments to measure what patterns of bytes would actually weave through all the bugs in the TLS ecosystem. TLS 1.3 now has several oddities in the wire-format that exist purely to confuse various network intermediaries into working. Even after that, we're still dealing with issues. Gallingly, because we delayed our server deployment in order to ease the client deployment, we're now having to work around bugs in TLS 1.3 client implementations that wouldn't have been able to get established had we quickly and fully enabled it.
The internet is very large and it's not getting any easier to steer. So it seems dangerous to assume that we can wait for a post-quantum standard and then deploy it. Any CECPQ2 is probably, in the long-term, destined to be replaced. But by starting the deployment now it can hopefully make that replacement viable by exercising things like larger TLS messages. Also, some practical experience might yield valuable lessons when it comes to choosing a standard. If the IETF had published the TLS 1.3 RFC before feedback from deployment, it would have been a divisive mess.
At the time of CECPQ1, the idea of running both a post-quantum and elliptic-curve primitive concurrently (to ensure that even if the post-quantum part was useless, at least the security wasn't worse than before) wasn't universally embraced. But we thought it important enough to highlight the idea in the name: combined elliptic-curve and post-quantum. It's a much more widely accepted idea now and still makes sense, and the best choice of elliptic-curve primitive hasn't changed, so CECPQ2 is still a combination with X25519.
As for the post-quantum part, it's based on the HRSS scheme of Hülsing, Rijneveld, Schanck, and Schwabe. This is an instantiation of NTRU, the patent for which has expired since we did CECPQ1. (The list of people who've had a hand in NTRU is quite long. See the Wikipedia page for a start.)
Last year Saito, Xagawa, and Yamakawa (SXY) published a derivative of HRSS with a tight, QROM-proof of CCA2 security from an assumption close to CPA security. It requires changes (that slow HRSS down a little), but HRSS+SXY is currently the basis of CECPQ2. Since HRSS+SXY no longer requires an XOF, SHAKE-128 has been replaced with SHA-256.
Having said that it's hard to choose in the structured lattice space, obviously HRSS is a choice and there were motivations behind it:
- CCA2-security is worthwhile, even though TLS can do without. CCA2 security roughly means that a private-key can be used multiple times. The step down is CPA security, where a private-key is only safe to use once. NewHope, used in CECPQ1, was only CPA secure and that worked for TLS since its confidentiality keys are ephemeral. But CPA vs CCA security is a subtle and dangerous distinction, and if we're going to invest in a post-quantum primitive, better it not be fragile.
- Avoiding decryption failures is attractive. Not because we're worried about unit tests failing (hardware errors set a noise floor for that anyway), but because the analysis of failure probabilities is complex. In the time since we picked HRSS, a paper has appeared chipping away at these failures. Eliminating them simplifies things.
- Schemes with a quotient-style key (like HRSS) will probably have faster encap/decap operations at the cost of much slower key-generation. Since there will be many uses outside TLS where keys can be reused, this is interesting as long as the key-generation speed is still reasonable for TLS.
- NTRU has a long history. In a space without a clear winner, that's a small positive.
CECPQ2 will be moving slowly: It depends on TLS 1.3 and, as mentioned, 1.3 is taking a while. The larger messages may take some time to deploy if we hit middlebox- or server-compatibility issues. Also the messages are currently too large to include in QUIC. But working though these problems now is a lot of the reason for doing CECPQ2—to ensure that post-quantum TLS remains feasible.
Lastly, I want to highlight that this only addresses confidentiality, not authenticity. Confidentiality is more pressing since it can be broken retrospectively, but it's also much easier to deal with in TLS because it's negotiated independently for every connection. Post-quantum authenticity will be entangled with the certificate and CA ecosystem and thus will be significantly more work.