Phones as security keys in Chrome (20 Oct 2021)
With Chrome 94, if you have an Android phone with Chrome on it, and it’s syncing to the same Google account as Chrome on a Chrome OS/Windows/macOS device, then you’ll be able to use that phone as a security key. You should be able to try this out on any WebAuthn-using website, for example here. (But not accounts.google.com, which uses a different system.)
The reason that you are reading about this here and not on an official Google site is that people shouldn’t start registering their phone as a security key unless they have a physical security key as a backup. Just like a regular security key, if you lose the phone then you lose the credentials. So, just like a regular security key, you should have a backup. (You can also lose your credentials if you remove the screen lock, or somehow wipe out Play Services state, e.g. by doing a factory reset.)
We have plans for addressing this, making it suitable for regular people, and allowing use in other profiles, but we’re not there yet. We are interested in whether the communications infrastructure is good enough, though. (More below.)
For signing into Google, it has long been possible to use your phone as a security key. This only worked in Chrome and functioned over BLE GATT between the desktop and phone. We have wanted to expand this to the web in general for years, but the success rate that we measured with BLE was poor. After quite a lot of work trying to improve the BLE success rate, we didn’t achieve much.
But the use of BLE is more than just a convenience. The security model demands some proof of physical proximity between the authenticator and the machine that is being authenticated. For a USB security key the authenticator will only respond to something that is making physical contact with it. So when a phone is acting as a security key it needs to prove that the machine it is talking to is physically close by. (Or at least that the attacker is in control of a BLE radio that is physically close.)
We looked at other Bluetooth modes in the hopes that they might work better, but classic Bluetooth RFCOMM isn’t supported on iOS and requires a lot of user interaction on Android. BLE L2CAP is supported on iOS, but isn’t supported (in user space) on Windows. It’s also flaky in the face of MAC address rotation if the devices aren’t paired.
So where we’ve ended up is that all the communication happens over the internet connection, but the phone sends a nonce in a BLE advert and the other end of the channel has to prove receipt. That’s the least amount of Bluetooth we could use while still requiring physical proximity. Needing internet connectivity at both ends is unfortunate, though. So you can also connect the phone with a USB cable while the security key operation is running. (But very sadly not on Windows; the USB stack there just isn’t designed in the right way for it.) We might also add L2CAP as an option in the future.
This isn’t enabled on Linux at the moment. Historically, trying to do the BLE GATT connection would often fail with BlueZ, and so the phone-as-a-security-key infrastructure was disabled on Linux. Now that the desktop only needs to receive a BLE advert it looks like it could work, but we haven’t flipped that switch yet.
As I mentioned above, we are interested in whether the underlying infrastructure is plausible. Aggregated anonymous statistics are useful for many things but, in this case, while they suggest that BLE isn’t always working as well as it should, they don’t tell us why. So if you are especially keen on security keys and want to try this out, I’d be interested in your experiences. I can't promise to respond but I will read everything if you send me an email (agl at chromium dot org) or tweet at me (agl__).
Here are some troubleshooting hints if you're having issues, because working through these will be much faster than asking me!
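To make “prove receipt” concrete, here is a deliberately simplified toy in Python. It is not the real protocol: it assumes the two ends already share a hypothetical `tunnel_secret` agreed over the internet channel, and the names `phone_make_advert`, `desktop_prove_receipt`, and `phone_check_proof` are made up for illustration. The point is only that the desktop cannot produce a valid proof without the nonce, and the nonce only ever travels over the radio.

```python
import hashlib
import hmac
import secrets

# Toy model only; this is NOT how Chrome's implementation works.
# Assumption: phone and desktop already share `tunnel_secret`, agreed over
# the internet connection. The nonce is only ever sent over BLE.

LABEL = b"toy proximity proof"


def phone_make_advert() -> bytes:
    """Phone side: pick a fresh random nonce to broadcast in a BLE advert."""
    return secrets.token_bytes(16)


def desktop_prove_receipt(tunnel_secret: bytes, heard_nonce: bytes) -> bytes:
    """Desktop side: MAC the nonce it heard, binding it to this tunnel."""
    return hmac.new(tunnel_secret, LABEL + heard_nonce, hashlib.sha256).digest()


def phone_check_proof(tunnel_secret: bytes, nonce: bytes, proof: bytes) -> bool:
    """Phone side: recompute the MAC and compare in constant time."""
    expected = hmac.new(tunnel_secret, LABEL + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, proof)
```

A remote attacker who relays the internet traffic never hears the advert, so in this toy it cannot compute a proof that the phone will accept.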
If no phones appear as options: you're using Windows, macOS, or Chrome OS, yes? And it's an Android phone? And the machine has working Bluetooth? And Chrome is up to date everywhere? If you navigate to chrome://sync-internals, is the “Transport state” at the top-left reporting “Active”? If not, enable Sync. On the phone, does Settings say that Sync is on at the top? If not, enable it. Is the account listed in Settings on the phone the same as the “Username” in chrome://sync-internals on the desktop? If all that's good then probably you just need to wait because it can take a couple of days for the registration to propagate. Probably in the “Device Info” section of the “Sync Node Browser” in chrome://sync-internals your phone is listed, but there's no paask_fields section yet. If you want to short-circuit that, you can install Chrome Canary on the phone and enable syncing there. That should register quite quickly.
You can select the phone on the desktop, but nothing happens: the phone should be getting a cloud message poke that triggers a notification. Does the phone have internet access? Did you completely disable notifications from Chrome? `adb logcat | grep -i cable` would be interesting if you're set up for that. Otherwise, if this is a common issue, I might have to add some logging and ask for the tunnel URL from chrome://device-log.
You get the notification and tap it, but something else goes wrong: if an error code or error message is displayed then I want to know what it is! If it's hanging then the message at the bottom of the screen changes for each step of the process so that's what'll be useful. Most problems seem to be BLE: is the phone close to the desktop? What other BLE is happening on the devices?
Efficient QR codes (26 Aug 2021)
QR codes seem to have won the battle for 2D barcodes, but they're not just a bag of bits inside. Their payload is a series of segments, each of which can have a different encoding. Segments are bitstrings, concatenated without any byte-alignment, and terminated with an empty segment of type zero. If you want to squeeze the maximum amount of data into a QR code without it turning into a gray square, understanding segmentation helps.
The most basic segment type is byte mode, which is a series of 8-bit bytes. If you control the QR decoder then this is perfectly efficient for encoding binary data. But you probably need to work with a variety of decoders. In that case, beware: the first edition of the QR standard, ISO/IEC 18004, said that byte mode should be interpreted as JIS X 0201. The 2005 edition changed that to be ISO/IEC 8859-1 (i.e. Latin-1). In practice, some QR decoders will attempt to content-sniff the encoding because, while UTF-8 contents should be indicated with an ECI header, they often aren't and UTF-8 is really common.
So if you put binary data into a QR code, you are probably going to hit these edge cases. The contents are likely going to be passed to a general operating system API for handling URLs — do you think the full pipeline will handle NUL bytes in the string, and UTF-8 non-characters and invalid surrogate pairs when interpreted as UTF-8? Also, you probably want your QR contents to be a printable string: bits of it might be shown in the scanner's UI; users might need to copy and paste them.
So let's assume that you want something printable and consider an obvious answer: base64url. It's very common, printable, and doesn't contain any characters that are special in URLs for maximum compatibility. It'll be encoded in a byte-mode segment: each base64url character contains 6 bits of information and takes 8 bits in the QR code for an efficiency of 75%. That's our baseline.
The next segment type to consider is digit mode. This only encodes the digits, 0–9, by packing triples of digits into 10 bits. If there are two digits left over at the end then they take 7 bits, and a singleton takes 4 bits. Ignoring the potential digits at the end, this lets you store 3×log2(10) = 3×3.322 = 9.966 bits of information in 10 bits of space. That's 99.66% efficient! So you can clearly do better than base64url.
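A minimal Python sketch of the digit-mode packing just described (the QR standard calls it numeric mode). It returns only the data bits as a string of 0s and 1s; the mode indicator and character-count header are left out, and the function name is illustrative rather than taken from any QR library.

```python
# Bit width used for a group of 1, 2, or 3 digits.
WIDTH_FOR_GROUP = {3: 10, 2: 7, 1: 4}


def numeric_mode_bits(digits: str) -> str:
    """Pack decimal digits as in QR numeric mode: each triple becomes a
    10-bit number, a trailing pair 7 bits, a trailing single digit 4 bits."""
    if any(c not in "0123456789" for c in digits):
        raise ValueError("numeric mode only encodes 0-9")
    out = []
    for i in range(0, len(digits), 3):
        group = digits[i:i + 3]
        out.append(format(int(group), f"0{WIDTH_FOR_GROUP[len(group)]}b"))
    return "".join(out)
```

For example, "01234567" packs as 012, 345, and 67: 10 + 10 + 7 = 27 data bits.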
The last segment type for these purposes is alphanumeric mode. These segments can encode A–Z, 0–9, and nine special characters: $, %, *, +, -, ., /, :, and space. Pairs of these characters are encoded in 11 bits. (If there's an odd number then the last takes 6 bits.) If you consider this to be “base-45” encoding then it stores 2×log2(45) = 10.98 bits in 11 bits of space, for 99.85% efficiency. Even better than digit mode, although only just.
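The same kind of sketch for alphanumeric mode. The character values follow the table in the QR standard (digits are 0–9, A–Z are 10–35, then space, $, %, *, +, -, ., /, : are 36–44), and each pair is packed as 45 × first + second, which is at most 2024 and so always fits in 11 bits. As before, headers are omitted and the names are illustrative.

```python
# Index in this string is the character's value in alphanumeric mode.
ALNUM = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def alnum_mode_bits(text: str) -> str:
    """Pack text as in QR alphanumeric mode: pairs into 11 bits, and a
    trailing odd character into 6 bits."""
    values = [ALNUM.index(c) for c in text]  # ValueError if unsupported
    out = []
    for i in range(0, len(values) - 1, 2):
        out.append(format(45 * values[i] + values[i + 1], "011b"))
    if len(values) % 2:
        out.append(format(values[-1], "06b"))
    return "".join(out)
```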
So maybe you should base-45 encode your data using that alphabet. But, of the special characters supported in alphanumeric mode, only two (minus and period) are unreserved (i.e. safe) in URLs. So you might be reduced to base-38, which cuts the efficiency to 95.42%. But having textually smaller QR contents might be valuable and worth a few percent efficiency in your context.
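The efficiency figures above are just log2(alphabet size) divided by the bits each character costs in its segment. A quick way to check them in Python; the helper and the `BASE38` name are illustrative, and the alphabet is the digits, A–Z, and the two URL-unreserved characters minus and period:

```python
import math
import string

# Digits, uppercase letters, and the two alphanumeric-mode characters that
# are unreserved in URLs.
BASE38 = string.digits + string.ascii_uppercase + "-."


def efficiency(alphabet_size: int, bits_per_char: float) -> float:
    """Fraction of the QR bits spent that actually carry information."""
    return math.log2(alphabet_size) / bits_per_char


for name, size, cost in [
    ("base64url in byte mode", 64, 8),
    ("digits in numeric mode", 10, 10 / 3),
    ("base-45 in alphanumeric mode", 45, 11 / 2),
    ("base-38 in alphanumeric mode", len(BASE38), 11 / 2),
]:
    print(f"{name}: {efficiency(size, cost):.2%}")
```

This prints 75.00%, 99.66%, 99.85%, and 95.42%, matching the text.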
If you've picked base-10 (digits), base-38, or even base-45 for your data then you need to get it into that form. Base-64 is easy because that's exactly 6 bits per character; you work on 3 bytes of input at a time and produce exactly 4 characters of output. But 10, 38, and 45 aren't powers of two. You've got three options here. The obvious conversion would be to treat the input as a bigint and repeatedly divmod by 10 (or 38, etc) to generate the encoding. If you have a bigint library to hand then it almost certainly has the functions for that, but it's a rather obnoxious (and quadratic) amount of computation and a significant dependency. So you might be willing to waste a few bits to make things easier.
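For comparison, the obvious bigint approach is short in Python because its integers are arbitrary precision. This sketch fixes the output length from the input length, so leading zero bytes survive a round trip. Each divmod is linear in the size of the number, which is where the quadratic cost comes from. Function names are illustrative.

```python
from typing import List


def digits_needed(nbytes: int, base: int) -> int:
    """Smallest k such that base**k >= 256**nbytes."""
    k = 0
    while base ** k < 256 ** nbytes:
        k += 1
    return k


def bigint_encode(data: bytes, base: int) -> List[int]:
    """Treat `data` as one big-endian integer and peel off base-`base` digits."""
    n = int.from_bytes(data, "big")
    out = []
    for _ in range(digits_needed(len(data), base)):
        n, d = divmod(n, base)  # O(len(data)) work each time: quadratic overall
        out.append(d)
    return out[::-1]  # most-significant digit first


def bigint_decode(digits: List[int], nbytes: int, base: int) -> bytes:
    n = 0
    for d in digits:
        n = n * base + d
    return n.to_bytes(nbytes, "big")  # OverflowError if the value doesn't fit
```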
The next option is an encoding noted by djb that achieves similar efficiency, but with less computation and no long division. I updated this post to include it, so it's covered in a section below.
The third option is to chunk the input and convert each chunk independently. Ideal input chunks would be 8 bytes or fewer, because nearly all environments will support a uint64 type and nearly all hardware can do a divmod on them. If you're using base-10 then there's going to be a function that can “print” a uint64 to digits for you, so let's take digits as an example. With a chunk size of two bytes you would get five digits. Each digit takes 3⅓ bits of space, so 16 input bits take 16⅔ bits: 96% efficient. Less than the 99.66% we can get with digits for sure. But if you consider all chunk sizes from one to eight bytes, turning 7-byte chunks into 17 digits is 98.82% efficient. That's pretty good for the complexity savings.
For base-38, 7-byte chunks are the best again, producing 11 characters for 92.56% efficiency. For base-45, two-byte chunks produce 3 characters for 96.97% efficiency. (Four-, six-, and eight-byte chunks do the same if you want fewer loop iterations.)
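The chunk-size figures for all three bases can be reproduced with a few lines of Python. This sketch assumes, as the text does, that digits cost 3⅓ bits each and alphanumeric characters 5½ bits each (i.e. that all chunks are concatenated into one long segment), and it uses exact fractions so that ties are visible. The names are illustrative.

```python
from fractions import Fraction

# QR cost per output character, amortised over a long segment.
BITS_PER_CHAR = {10: Fraction(10, 3), 38: Fraction(11, 2), 45: Fraction(11, 2)}


def chars_per_chunk(nbytes: int, base: int) -> int:
    """Smallest k such that base**k >= 256**nbytes."""
    k = 0
    while base ** k < 256 ** nbytes:
        k += 1
    return k


def chunk_efficiency(nbytes: int, base: int) -> Fraction:
    """Input bits divided by the QR bits spent on one encoded chunk."""
    return Fraction(8 * nbytes) / (chars_per_chunk(nbytes, base) * BITS_PER_CHAR[base])


for base in (10, 38, 45):
    for n in range(1, 9):
        eff = float(chunk_efficiency(n, base))
        print(f"base {base}, {n}-byte chunks: {chars_per_chunk(n, base)} chars, {eff:.2%}")
```

For base-45 the table also shows that six-byte chunks tie with two-, four-, and eight-byte chunks at exactly 32/33.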
(And you should use little-endian chunks because little-endian has won, even if the IETF hasn't caught up to that yet.)
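Putting the chunking and the endianness advice together, here is a sketch of a chunked digit encoding: 7-byte little-endian chunks, each printed as exactly 17 decimal digits, with a shorter final chunk printed using just enough digits for its length. The function names and the treatment of the final partial chunk are illustrative choices, not a standard format.

```python
CHUNK_BYTES = 7


def digits_for(nbytes: int) -> int:
    """Decimal digits needed for any nbytes-long value."""
    return len(str(256 ** nbytes - 1))


def encode_digits(data: bytes) -> str:
    out = []
    for i in range(0, len(data), CHUNK_BYTES):
        chunk = data[i:i + CHUNK_BYTES]
        value = int.from_bytes(chunk, "little")
        out.append(str(value).zfill(digits_for(len(chunk))))
    return "".join(out)


def decode_digits(text: str) -> bytes:
    if any(c not in "0123456789" for c in text):
        raise ValueError("expected only decimal digits")
    # Chunk lengths 1..7 need 3, 5, 8, 10, 13, 15, 17 digits: all distinct,
    # so the length of the final piece tells us how many bytes it holds.
    width_to_bytes = {digits_for(n): n for n in range(1, CHUNK_BYTES + 1)}
    full = digits_for(CHUNK_BYTES)
    out = bytearray()
    for i in range(0, len(text), full):
        piece = text[i:i + full]
        nbytes = width_to_bytes[len(piece)]  # KeyError: malformed length
        out += int(piece).to_bytes(nbytes, "little")  # OverflowError: too large
    return bytes(out)
```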
Now you've got your payload encoding sorted … probably. A wrinkle is that it's difficult to know how your QR encoder will segment what you give it. You might have crafted a wonderful base-38 input and it might stuff it all into a byte-mode segment! (Only about 65.6% efficient, worse than base64url.) I'm sadly not aware of a good QR “debugger” that shows all the details of a QR code. ZXing's online service will give a hex-dump of the raw contents, but that assumes that you can decode the segment headers. (QR-Logo promises better debugging output but doesn't work for me.) My best advice is to use ZXing on a sample QR code, ignore the 0xec, 0x11 padding pattern at the end, and calculate whether the number of bytes used roughly makes sense.
You probably want to put a URL-like prefix at the front to make your QR codes distinguishable. One thing to consider is that “https://www.example.com/” is a byte-mode segment that takes 204 bits, but “HTTPS://WWW.EXAMPLE.COM/” is alphanumeric and needs only 145 bits. (That's assuming QR versions 1 through 9; the lengths are slightly different otherwise.) DNS names are case insensitive, and RFC 3986 says that, for the scheme, “an implementation should accept uppercase letters”. Maybe it just looks odd and that's not worth the bits, though?
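When sanity-checking an encoder's output against a ZXing hex-dump, it helps to know what each segment should cost. This sketch uses the header sizes for QR versions 1 through 9 (a 4-bit mode indicator plus a 10-, 9-, or 8-bit character count for numeric, alphanumeric, and byte mode respectively) and reproduces the 204 and 145 figures above. The function name is illustrative.

```python
MODE_INDICATOR_BITS = 4
# Character-count field widths for QR versions 1-9; larger versions use more.
COUNT_BITS_V1_TO_9 = {"numeric": 10, "alphanumeric": 9, "byte": 8}


def segment_bits(mode: str, n_chars: int) -> int:
    """Total bits for one segment: mode indicator + count field + data."""
    if mode == "numeric":
        data = 10 * (n_chars // 3) + (0, 4, 7)[n_chars % 3]
    elif mode == "alphanumeric":
        data = 11 * (n_chars // 2) + 6 * (n_chars % 2)
    elif mode == "byte":
        data = 8 * n_chars
    else:
        raise ValueError(f"unknown mode {mode!r}")
    return MODE_INDICATOR_BITS + COUNT_BITS_V1_TO_9[mode] + data


print(segment_bits("byte", len("https://www.example.com/")))          # 204
print(segment_bits("alphanumeric", len("HTTPS://WWW.EXAMPLE.COM/")))  # 145
```

The same arithmetic gives 52 bits for “shc:/” in byte mode against 41 bits for “SHC:/” in alphanumeric mode, which is relevant to the example below.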
We'll finish up with a quick look at an example, which is the thing that started me on this path in the first place: SMART Health Cards. (Thank you to the person who pointed me at them, who likely wants to remain anonymous.)
SHCs are trying to squeeze a lot of data into a QR code: they minify their JSON structure and compress it but, even then, they sometimes span multiple QR codes and the user has to scan them all. As such their contents are a) a byte-mode segment containing “shc:/” (and maybe sequence numbers if using multiple QR codes), and then b) a digits segment containing the payload. So they didn't use “SHC:/” to save bits, but the difference is small.
One thing to note is that the QR spec (ISO/IEC 18004:2005) has a whole section on “structured append” mode, where multiple QR codes can be combined into one. But trying that with iOS and Android shows that neither supports it, so probably it can be considered dead and that's why SHC is replicating the same feature at a higher level.
Another thing to note is that SHC is using digits for better efficiency, which is great, but the way that they do it is not. They're using JWT, which is bad but not today's topic, so they have three base64-encoded strings. They then take each base64 character, subtract 45, and turn that into two base-10 digits! All that work minifying JSON and compressing it, and then they throw away 10% of their bits on such a trivial thing!
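The SHC digit conversion described above is simple enough to show in full. This sketch follows the description in the text (each character becomes the two-digit value of its code point minus 45); the function name is hypothetical. Every base64url character and the '.' separators fall between '-' (45) and 'z' (122), so the values run from 00 to 77, and spending two digits (6⅔ bits) on a 6-bit character is where the 10% goes.

```python
from fractions import Fraction


def shc_digits(jws: str) -> str:
    """Turn each character into two decimal digits: ord(c) - 45, zero-padded."""
    out = []
    for c in jws:
        v = ord(c) - 45
        if not 0 <= v <= 77:
            raise ValueError(f"unexpected character {c!r}")
        out.append(f"{v:02d}")
    return "".join(out)


# Each base64url character carries 6 bits but costs two digits at 10/3 bits each.
SHC_EFFICIENCY = Fraction(6) / (2 * Fraction(10, 3))  # == 9/10
```

By contrast, the 7-byte chunking sketched earlier spends 17 digits on 56 bits, rather than the roughly 19 digits that the same bits cost as base64 characters run through this conversion.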
So SHC did pretty well, but missed an easy win. Having read this, you can do better.
The NTRU Prime encoding
Above, I referenced an encoding that gets nearly all the space efficiency of the bigint-and-divmod method, but without the computational costs. This section is about that. It's taken from page 18 of the NTRU Prime NIST submission.
Our motivating issue is this: if you have a whole byte then taking it mod 10 to generate a digit works fairly well. The digits 0–5 have probability 26/256 while 6–9 have probability 25/256. That's not uniform, so it doesn't encode the maximum amount of entropy, but it's 99.992% efficient, which is pretty good. But when you have a smaller range of input values the non-uniformity becomes significant and so does the reduction in information transmitted.
The encoding in NTRU Prime takes input values (which can be larger than bytes) and combines pairs of them. It produces some output values from each pair but, once the non-uniformity gets uncomfortable, it pairs up the leftovers to increase the range. This repeats in a binary tree.
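The 99.992% figure is the Shannon entropy of the output digit divided by log2(10), the capacity of a digit. A small Python check (the function name is illustrative) also shows how much worse things get when the input range is small:

```python
import math
from collections import Counter


def mod_efficiency(input_range: int, base: int) -> float:
    """Entropy of (x mod base), for x uniform in [0, input_range), divided by
    log2(base): how much of each output digit's capacity is actually used."""
    counts = Counter(x % base for x in range(input_range))
    entropy = -sum(
        (c / input_range) * math.log2(c / input_range) for c in counts.values()
    )
    return entropy / math.log2(base)


print(f"{mod_efficiency(256, 10):.3%}")  # a whole byte: 99.992%
print(f"{mod_efficiency(16, 10):.3%}")   # a 4-bit input: noticeably worse
```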
As a concrete example we'll use the Python code from page 18 and set M = [256…] (because our inputs are bytes), change the 256 and 255 to 10 and 9 (to extract digits, not bytes), and set limit to 1024. Our example input will be 419dd0ed371f44b7.
The input bytes are written (in base 10) along the top. Pairs of them are combined. Take the top-right box: its value is 157×256 + 65 = 40257. That can be considered to be 40257 mod 65536 and, since there's a reasonable number of bits in there, two digits are extracted. Obviously 40257 mod 10 = 7, and 4025 mod 10 = 5. So the two digits are 7 and 5. That leaves 402 mod 656, and 656 is below the limit of 1024 that we set, so it's passed down to be combined into the box below. This continues down the tree: each time there's too little data left to extract another digit, the leftovers are passed down to be combined. At the bottom there's nothing else to combine with so the final leftover value, 132 mod 186, is converted into the digits 2, 3, and 1. The ultimate output digits are concatenated from left-to-right, top-to-bottom.
This achieves good encoding efficiency without repeated long-division, and can be parallelised.
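Here is a compact Python re-implementation of the procedure described above, written from that description with the same substitutions (base 10 instead of 256, and a limit of 1024). It mirrors the structure of the NTRU Prime code but is not a copy of it, and the parameter names are illustrative. `R` holds the input values and `M` their ranges, with 0 ≤ R[i] < M[i]. Run on 419dd0ed371f44b7, it emits 7 and 5 first and 2, 3, 1 last, as in the walk-through.

```python
from typing import List


def encode(R: List[int], M: List[int], base: int = 10, limit: int = 1024) -> List[int]:
    """Emit base-`base` digits for values R[i] in ranges M[i], pairing up
    leftovers in a binary tree whenever a range drops below `limit`."""
    if not M:
        return []
    out = []
    if len(M) == 1:
        # Bottom of the tree: nothing left to combine with, so drain it.
        r, m = R[0], M[0]
        while m > 1:
            out.append(r % base)
            r, m = r // base, (m + base - 1) // base
        return out
    R2, M2 = [], []
    for i in range(0, len(M) - 1, 2):
        # Combine a pair: the value is R[i] + M[i]*R[i+1], mod M[i]*M[i+1].
        r, m = R[i] + M[i] * R[i + 1], M[i] * M[i + 1]
        while m >= limit:  # enough range left to extract another digit
            out.append(r % base)
            r, m = r // base, (m + base - 1) // base
        R2.append(r)
        M2.append(m)
    if len(M) % 2:  # an odd value out is carried down unchanged
        R2.append(R[-1])
        M2.append(M[-1])
    return out + encode(R2, M2, base, limit)


data = bytes.fromhex("419dd0ed371f44b7")
print(encode(list(data), [256] * len(data)))
```

Decoding needs the same sequence of ranges, but that depends only on `M` (here, the input length), so only the digits themselves have to be transmitted.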
ACVP (23 Dec 2020)
If you do not know what ACVP is then you should read no further. If you think it might be useful then what you're actually looking for is Wycheproof; ACVP is only for those who have no choice.
If you're still reading and you're vaguely aware that your previous CAVP infrastructure isn't applicable any longer, and that you'll need to deal with ACVP next time, then you might be interested in BoringSSL's ACVP infrastructure. We have a number of different FIPS modules to test and wanted something generic rather than repeating the bespoke-per-module way that we handled CAVP. We also need to test not just BoringCrypto (a software module) but also embedded devices.
The result, acvptool, lives within the BoringSSL repo and can translate ACVP JSON into a series of reasonably simple IPC calls that a “module wrapper” speaks over stdin/stdout. BoringSSL's module wrapper is the reference implementation, but there's also a tiny one for testing that could easily be repurposed to forward over a serial link, etc, for embedded devices.
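To give a feel for how small a module wrapper can be, here is a Python sketch of stdin/stdout framing. The wire format is an assumption based on our understanding of the ACVP notes in the BoringSSL repo: each message is a little-endian uint32 count, then a little-endian uint32 length for each byte string, then the byte strings themselves, and in a request the first string names the command. Check it against the current documentation before relying on it. The `SHA2-256` handler is purely illustrative.

```python
import hashlib
import struct


def write_message(stream, parts):
    """Assumed framing: count, then each length, then each payload (all LE u32)."""
    stream.write(struct.pack("<I", len(parts)))
    for p in parts:
        stream.write(struct.pack("<I", len(p)))
    for p in parts:
        stream.write(p)
    stream.flush()


def _read_exact(stream, n):
    buf = stream.read(n)
    if len(buf) != n:
        raise EOFError("truncated message")
    return buf


def read_message(stream):
    (count,) = struct.unpack("<I", _read_exact(stream, 4))
    lengths = [struct.unpack("<I", _read_exact(stream, 4))[0] for _ in range(count)]
    return [_read_exact(stream, n) for n in lengths]


# Hypothetical command table, for illustration only.
HANDLERS = {
    b"SHA2-256": lambda msg: [hashlib.sha256(msg).digest()],
}


def serve_one(inp, out):
    """Read one request, dispatch on the command name, write one response."""
    command, *args = read_message(inp)
    write_message(out, HANDLERS[command](*args))


def serve_forever(inp, out):
    """Loop until the driver closes the pipe."""
    while True:
        try:
            serve_one(inp, out)
        except EOFError:
            return
```

A real wrapper would call `serve_forever(sys.stdin.buffer, sys.stdout.buffer)`; for an embedded device the same two functions could write to a serial port instead.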
It's reasonably likely that you'll find some case that's not handled, but the code is just Go throwing around JSON so you should be able to extend it without too much bother. But, for the cases that are already handled, the weird undocumented quirks that'll otherwise consume hours of your life are taken care of.
Letter to 20 years ago (06 Sep 2020)
I noticed that I have not posted anything here in 2020! There's a bunch of reasons for this: the work I'm doing at the moment does not lend itself so well to blog posts, and life intervenes, leaving less time for personal projects. But in order to head off the risk that I'll post nothing at all this year I pulled something from one of my notebooks. 2020 is a round number so I decided to do some reflection and this was a letter that I imagined writing to myself 20 years ago. It is very much a letter to me! The topics are going to be quite specific and if you weren't paying attention to the computing industry in the year 2000 I’m not sure how much of it will make sense. But these are the points that I think the me of 20 years ago would have wondered about.
You must be thinking that computers will be crazy fast by now. Yes…ish. It's complicated, and that's going to be a theme here. You've been hearing from Intel that the NetBurst chips will hit 10GHz in a few years, so with another few doublings what will we have by now? 50GHz? Actually common values are around 3.5GHz. Some reach 5GHz, but only in bursts. Intel never hit 10GHz and nor did anybody else. It’s better than it sounds: instructions per clock are up a lot, so each cycle is worth more. (Although maybe we'll get to some of the issues that caused!) More importantly, all systems are multiprocessor now. It's physically a single chip, but inside is often 8- to 32-way SMP. Yep, that's cool. And yep, it only helps for certain sorts of workloads. Multithreaded programming is not going away.
Memory? 10s of gigabytes is common. Hard drives? It's nearly all flash now. You can still buy hard drives and they’re huge and cheap, but the speed of flash is pretty sweet. Computers really are quite substantially faster — don't be too put off by the clock speeds.
Your day-to-day life is a bunch of xterms and a web browser. Nothing's changed; you are dramatically underestimating the importance of path dependence. Terminals are still emulating a fancy VT-100 and sometimes they get messed up and need a reset. No fixes there. It's still bash or zsh; nearly unchanged from your time. The kernel has been fixed a little: you can now get a handle to a process, so no more PID races. You can open a file relative to a directory descriptor and you can create an unlinked file in a directory and link it later. Yes, it's good that these things are possible now, but it is not a fundamental change and it took a long time. Actually you know what? Windows grew a much smarter shell, leapfrogging Linux in several respects. They had hardly moved forward since DOS so it was easier there, perversely.
So innovation must have happened higher up where there was new ground and little legacy, right? What about the semantic web? How did that turn out? Not well. We don't have lots of data in machine-readable formats and fancy GUIs so that anyone can create automation. Information is somewhere between impossible and a huge pain to access. You’ve read The Dilbert Future by now and its ‘confusopoly’ concept is much closer to the mark. The Semantic Web stuff failed so badly that nobody even tries any longer. (I'm afraid Scott Adams won’t seem so wholesome in the future either.) The closest you'll get is that your web browser can fill out your name, address, and credit card details. And it has to work really hard to do that because there’s almost no assistance from web pages. Go find Weaving the Web and throw it away.
Something more positive: bandwidth! You are using a dial-up that tops out at 5 KB/s and charges by the minute. You use a local proxy that keeps a copy of everything so that viewed pages are available offline and it lets you mark missing pages for batch fetching to reduce the cost. This problem is now solved. You can assume that any house in a city can get an always-on, several 10s of Mb/s connection. It's not as cheap as it could be but it's a standard household expense now. (Note: past me doesn't live in the US! —agl.) Also, everyone carries an impossibly fancy PDA that has that level of connection wirelessly and everywhere. I don't need to equivocate here, connectivity is solved in the sorts of places you're likely to live. But … there's a second edge to that sword. This connectivity can be a bit … much? There are some advantages to having the internet be stationary, metered, and behind 30 seconds of banshee wailing and static. Imagine your whole social life getting run through IRC, and that you're always connected. It’s tough to explain but there's a problem. But these PDAs? They have GPS and maps. Nobody gets lost anymore. Nobody carries paper street maps in their car. Connectivity can be pretty sweet.
This next bit is going to upset you a little: the whole Palladium / trusted boot stuff never took off on the desktop, but these PDAs are pretty locked down. One type of them is completely locked down and you can’t run non-approved software. The other will make you jump through hoops and, even then, you can't access the data of other programs. On the latter sort you can install a completely custom OS most of the time, but there's attestation and some things won't cooperate. This is still playing out and people are fighting over the details (because of money, of course). It remains a concern, but you underestimate the benefits of this sort of system. Your idea that people should own their own computers because they’re critical tools isn't wrong, but it is elitist. For the vast majority of people, their desktops degrade into a fragile truce with a whole ecosystem of malware and near-malware. Maybe it's their “fault” for having installed it, but these PDAs are so popular, in part, because they're hard to screw up. Bad stuff does get through the approval process, but it cannot mess things up to the wipe-and-reinstall level that desktops reach. The jury is still out about whether we will regret this, but you're wrong about the viability of giving people Windows XP and getting a good result.
Back on a positive note: the music industry switched to a $15 a month stream-whatever-you-want model and it works fine. You were completely right about this. Music still exists and it still pays a few at the top large sums and the rest very little. The music industry itself didn't sort this out though, other companies did it for them. What you're missing is that you’re not taking things far enough: companies also did this for TV and (many) movies. There are still rips of this stuff on BitTorrent, but it's not a live issue because people pay the subscription for the ease, by and large. In fact, access to scientific papers is a hotter issue now!
Basically, rates of change are really uneven.
This is the third in a series of posts about running experiments on post-quantum confidentiality in TLS. The first detailed experiments that measured the estimated network overhead of three families of post-quantum key exchanges. The second detailed the choices behind a specific structured-lattice scheme. This one gives details of a full, end-to-end measurement of that scheme and a supersingular isogeny scheme, SIKE/p434. This was done in collaboration with Cloudflare, who integrated Microsoft's SIKE code into BoringSSL for the tests, and ran the server-side of the experiment.
Google Chrome installs, on Dev and Canary channels, and on all platforms except iOS, were randomly assigned to one of three groups: control (30%), CECPQ2 (30%), or CECPQ2b (30%). (A random ten percent of installs did not take part in the experiment so the numbers only add up to 90.) CECPQ2 is the hybrid X25519+structured-lattice scheme previously described. CECPQ2b is the name that we gave to the combination of X25519 and the SIKE/p434 scheme.
Because optimised assembly implementations are labour-intensive to write, they were only written for AArch64 and x86-64. Because SIKE is computationally expensive, it wasn’t feasible to enable it without an assembly implementation, so only AArch64 and x86-64 clients were included in the experiment, and ARMv7 and x86 clients did not contribute to the results even if they were assigned to one of the experiment groups.
Cloudflare servers were updated to include support for both CECPQ2 and CECPQ2b, and to support an empty TLS extension that indicated that they were part of the experiment. Depending on the experiment group, Chrome would either offer CECPQ2, CECPQ2b, or just non-post-quantum options, in its TLS 1.3 handshake, along with the signaling extension to indicate which clients were part of the control group. Measurements were taken of how long TLS handshakes took to complete using Chrome’s metrics system. Chrome knew which servers were part of the experiment because they echoed the signaling extension, thus all three groups were measuring handshake duration against the same set of servers.
After this phase of the trial was complete, client-side measurements were disabled and Chrome Canary was switched to a mode where it randomly picked one of CECPQ2, CECPQ2b, or neither to offer. This enabled some additional, server-side measurements to ensure that nothing unexpected was occurring.
(Cloudflare has a significantly more detailed write up of this experiment.)
We’re aware of a couple of biases and these need to be kept in mind when looking at the results. Firstly, since ARMv7 and x86 platforms were excluded, the population was significantly biased towards more powerful CPUs. This will make supersingular isogenies look better. Also, we’ve seen from past experiments that Canary and Dev Chrome users tend to have worse networks than the Chrome user population as a whole, and this too will tend to advantage supersingular isogenies since they require less network traffic.
Here are histograms of client-side results, first from Windows (representing desktops/laptops) and then from Android (representing mobile devices):
From the histograms we can see that the CECPQ2b (SIKE) group shifts visibly to the right (i.e. slower) in both cases. (On Android, a similar but smaller shift is seen for CECPQ2.) Despite the advantages of removing the slower clients and experimenting with worse-than-usual networks, the computational demands of SIKE outweigh the reduced network traffic. Only for the slowest 5% of connections are the smaller messages of SIKE a net advantage.
Cloudflare have a much more detailed analysis of the server-side results, which are very similar.
While there may be cases where the smaller messages of SIKE are a decisive advantage, that doesn’t appear to be the case for TLS, where the computational advantages of structured lattices make them a more attractive choice for post-quantum confidentiality.