ImperialViolet

Adam Langley's Weblog

Passkeys (04 Jul 2022)

The presentations are out now (Google I/O, WWDC): we're making a push to take WebAuthn to the masses.

WebAuthn has been working reasonably well for enterprises and technically adept users. But we were not going to see broad adoption while the model was that you had to purchase a pair of security keys, be sure to register the backup on every site, yet also keep it in a fire safe. So the model for consumers replaces security keys with phones, and swaps out having a backup authenticator with backing up the private keys themselves. This could have been a different API, but a second web API for public-key authentication would have been a lot, so WebAuthn it is. Basing things on WebAuthn also means that you can still use your security key if you like*, and hopefully with an expanded range of sites.

(* albeit not with Android this year, because it doesn't support a recent enough version of CTAP yet. Sorry.)

The WebAuthn spec is not a gentle introduction, but you can find several guides on how to make the API calls, and probably there'll be more coming. What I wanted to cover in this post are the foundational semantics of passkeys. These are not new if you're fully versed in WebAuthn, but they are new if you've only used WebAuthn in a 2nd-factor context.

I'll probably build on this in some future posts and maybe amalgamate some past writings into a single document. The next paragraph just drops you into things without a lot of context. Perhaps it'll be useful for someone, but best to understand it as fragments of a better document that I'm accumulating.

Ok:

An authenticator is a map from (RP ID, user ID) pairs to public-key credentials. I'll expand on each of those terms:

An authenticator, traditionally, is the physical thing that holds keys and signs stuff. Security keys are authenticators. Laptops can be too; Windows Hello has been an authenticator for a while now. In the world of passkeys, phones are important authenticators. Now that keys may sync, you might consider the sync account itself to be a distributed authenticator. So, rather than thinking of authenticators as physical things, think of it as whatever maintains this map that contains the user's credentials.

An RP ID identifies a website. It's a string that contains a domain name. (In non-web cases, like SSH, it can be a URL, but I'm not covering them here.) A website can use an RP ID if that RP ID is equal to, or a suffix of, the site's domain, and the RP ID is at least an eTLD + 1. So https://foo.bar.example.com can use RP IDs foo.bar.example.com, bar.example.com, and example.com, but not com (because that's less than an eTLD + 1), nor example.org (because that's an unrelated domain). Because the credential map is keyed by RP ID, one website can't use another's credentials. However, be conscious of subdomains: usercontent.example.com can use an RP ID of example.com, although the client data will let the server know which origin made any given request.
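
As a rough sketch of that rule in Python (illustrative only: it leans on the tldextract package to find the eTLD + 1, whereas browsers do this internally against the Public Suffix List):

    import tldextract

    def can_use_rp_id(origin_host: str, rp_id: str) -> bool:
        etld1 = tldextract.extract(origin_host).registered_domain
        if not etld1:
            return False  # the host is itself an eTLD (or garbage)
        # The RP ID must equal the host, or be a suffix on a label boundary...
        if origin_host != rp_id and not origin_host.endswith("." + rp_id):
            return False
        # ...and must be at least an eTLD + 1.
        return rp_id == etld1 or rp_id.endswith("." + etld1)

    assert can_use_rp_id("foo.bar.example.com", "bar.example.com")
    assert not can_use_rp_id("foo.bar.example.com", "com")
    assert not can_use_rp_id("foo.bar.example.com", "example.org")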

Next, a user ID is an opaque byte string that identifies an account on a website. The spec says that it mustn't contain identifiable information (i.e. don't just make it the user's email address) because security keys don't protect the user ID to the same degree that they protect other information. The recommendation is to add a column to your users table, generate a large random value on demand, and store it there for each user. (The spec says 64 bytes but if you just generated 16 from a secure random source, I think you would be fine.) You could also HMAC an internal user ID, although that concentrates risk in that HMAC key.
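
In Python, that recommendation is nearly a one-liner (a sketch; 16 bytes from a secure source, per the reasoning above):

    import secrets

    # Generated once per user and stored in the users table; it carries
    # no meaning and is never shown to anyone.
    user_id = secrets.token_bytes(16)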

Recall that an authenticator maps (RP ID, user ID) pairs to credentials? The important consequence is that if a site creates a credential it'll overwrite any existing credential with the same user ID. So an authenticator only contains a single credential for a given account. (For those who know WebAuthn already: I'm assuming discoverable credentials throughout.)

A credential is a collection of various fields. Most obviously a private key, but also metadata: the RP ID, for one, and user information. There are three pieces of user information: the user name, display name, and ID. We've covered the user ID already. The other two are free-form strings that get displayed in UI to help a user select a credential. The user name is generally how a user identifies themselves to the site, e.g. an email address. The display name is how the user would want to be addressed, which could be their legal name. Of the two, the user name will likely be more prominent in UIs.
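
To make that concrete, here's roughly what the user information looks like in a creation request (written as a Python dict for illustration; in a browser these are the publicKey.user fields passed to navigator.credentials.create):

    user = {
        "id": user_id,               # the opaque bytes discussed above
        "name": "jane@example.com",  # how the user identifies themselves
        "displayName": "Jane Doe",   # how the user wants to be addressed
    }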

Lastly a passkey is a WebAuthn credential that is safe and available when the user needs it, i.e. backed up. Not all implementations will be backing up credentials right away and passkeys on those platforms can be called “single-device passkeys”, which is a little awkward, but so's the lack of backup.

Another important aspect of the structure of things is that, while an account only has a single password, it can have multiple passkeys. That's because passkeys can't be copy–pasted around like passwords can. Instead, users will register passkeys as needed to cover their set of devices.

The authenticatorAttachment field in the assertion structure hints to the website about when an additional passkey might be needed. If the value of that field is cross-platform then the user had to use another device to sign in, and it might be worth asking them if they want to register the local device.
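
Server-side, consuming that hint might look like this (a sketch with a hypothetical credential structure, not any particular library's API):

    def should_offer_local_registration(credential: dict) -> bool:
        # "cross-platform" means the user reached for a different device,
        # so the device in front of them may not have a passkey yet.
        return credential.get("authenticatorAttachment") == "cross-platform"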

When there are multiple passkeys registered on a site, users will need to manage them. The way that sites have often ended up managing 2nd-factor WebAuthn credentials is via an explicit list in the user's account settings. Usually the user will be prompted to name a credential at registration time to distinguish them. That's still a reasonable way to manage passkeys if you like. (We pondered whether browsers could send a passkey “name” at registration time to avoid prompting the user for one but, in a world of syncing, there doesn't seem to be a good value that isn't either superfluously generic, e.g. “Google devices”, or a privacy problem, e.g. the sync account identifier).

If prompting for names and having an explicit list seems too complex then I think it would also be fine to simply have a reset button that a) registers a new passkey, b) deletes every other passkey on the account, and c) signs out all other sessions. I.e. a button that mirrors a password reset flow. For a great many sites, that would work well.
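
As a sketch, with hypothetical names throughout, that reset flow is very little code:

    def reset_passkeys(user, new_passkey, current_session):
        user.passkeys = [new_passkey]  # a) register a new passkey and
                                       # b) delete every other one, then
        sign_out_sessions_except(user, current_session)  # c) drop other sessions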

We don't want passwords to hang around on an account forever as a neglected weak link. To guide sites in determining when it might be worth prompting the user to remove their password, there's a new backup state bit in the authenticator data that the server gets with each authentication. If it's set then the passkey will survive the loss of the device. (Unless the device is destroyed after creating the passkey but before managing to sync. But that's the same as generating a random password in a password manager.)

There are no firm rules around using that bit, but once the user has a backed-up passkey on a portable authenticator then it's probably time to ask about dropping the password. Sites can know this when they see a creation or assertion operation with the backup state bit set and either the attachment is cross-platform, or it's platform but that platform is a mobile device. That's a conservative set of rules because, for example, a credential created on a MacBook might get synced to the user's iPhone. But maybe that user doesn't have an iPhone.
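
Expressed as code, that conservative rule might look like the following (hypothetical parameter names; the inputs come from the authenticator data and the assertion):

    def may_suggest_dropping_password(backed_up: bool,
                                      attachment: str,
                                      platform_is_mobile: bool) -> bool:
        if not backed_up:
            return False
        # Either a portable device was used, or the platform
        # authenticator is itself a mobile device.
        return (attachment == "cross-platform" or
                (attachment == "platform" and platform_is_mobile))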

As you can see, one of the challenges with passkeys is the complexity! Several teams at Google are still working on fleshing out the things demoed at I/O, but we know that good guidance will be important. In the meantime, I'm happy to answer questions over Twitter and am pondering whether public office hours would be helpful too.

The several canons of CBOR (17 Apr 2022)

There are many encoding formats. CBOR is one of them. Like several others, a subset of it is basically fine—I'm not starting that fight today.

Whatever encoding you use, it's nice to reduce flexibility. If there are multiple ways of encoding the same thing then, for anything with a non-negligible diversity of implementations, you'll find that there is a canonical encoding; it's just not documented. As soon as one implementation tries using a valid, but less common, encoding, it'll find that it doesn't work and something will change to sort it all out. But that process of eventual coordination is expensive—better to do it explicitly and up-front.

For example, TCP options are chunks of tag–length–value and the spec doesn't define any restriction on their ordering. That didn't stop me blowing up a Ubuntu release because a change of mine happened to alter the order that Linux sent them, and some DSL modems dropped packets when options came in that order. That's an expensive discovery of an implicit canonicalisation rule.

If you're using CBOR then RFC 7049 has a section titled “Canonical CBOR”. One of the things it defines is how to order map elements. Great! No more TCP options messes. You can read that section for yourself, but one interpretation of the words there is what I'll call the three-step ordering for map keys:

  1. Lowest valued types come first then, within each type,
  2. Shortest encoded key comes first then, within consecutive keys with equal lengths,
  3. Sort lexicographically.

However, there is another interpretation of the same section which I'll call the two-step ordering. It drops the first step, i.e. doesn't sort by types. When I first read that section, I certainly came away with only one interpretation, but I can somewhat see how the other arises.

CTAP, the protocol for talking to security keys, uses CBOR a lot and explicitly picks the three-step ordering. An erratum was eventually raised against the RFC to “clarify” that the three-step order was correct, but it was rejected as a semantic change. So perhaps that's an official ruling that two-step was the correct understanding?

It only matters if you mix different types of keys in the same map, and sometimes you don't, so maybe it's moot for you. (But keep in mind that negative and non-negative numbers are different types in CBOR.) Anyway, the IETF revised the CBOR RFC to firmly clarify the issue and add a third option:

RFC 7049 is obsoleted by 8949. The old “Canonical CBOR” section is replaced by “Core Deterministic Encoding Requirements” and that specifies what I'll have to call the one-step ordering:

  1. Sort lexicographically on the encoded key.

It has some advantages! Like, why was it ever more complicated than that in the first place? But it was, and now there's two (or three) orderings. The two-step ordering even has a subsection of its own in the new RFC and a note saying that it “could be called Old Canonical CBOR”. If only shipped implementations were as changeable as the X-Men universe.
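
To make the differences concrete, here's a Python sketch of all three orderings, operating on already-encoded map keys (recall that the top three bits of the first byte of an encoded CBOR value are its major type):

    def sort_keys(keys, ordering):
        if ordering == "one-step":    # RFC 8949
            return sorted(keys)
        if ordering == "two-step":    # “Old Canonical CBOR”
            return sorted(keys, key=lambda k: (len(k), k))
        # Three-step, as CTAP explicitly picked.
        return sorted(keys, key=lambda k: (k[0] >> 5, len(k), k))

    # 1000 encodes as 19 03 e8 (major type 0); "a" as 61 61 (major type 3).
    keys = [b"\x61\x61", b"\x19\x03\xe8"]
    sort_keys(keys, "three-step")  # 1000 first: lower type comes first
    sort_keys(keys, "one-step")    # 1000 first too: 0x19 < 0x61
    sort_keys(keys, "two-step")    # "a" first: its encoding is shorter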

So that's why there are two and a half “canonical” CBORs, and now I have something that I can reference when people ask me about this.

Update: Someone on Twitter (Thomas Duboucher, perhaps? Sorry, I lost the tweet!) pointed out that the 1- and 3-step ordering give the same results in many cases. Having thought about it, that's correct! As long as you don't use maps, arrays, or tagged values as map keys, I believe they coincide. For something like CTAP2, where map keys are always ints or strings, they work out the same. So perhaps things aren't so bad. (Or perhaps a really subtle difference is worse! Time will tell.)

Picking parameters (15 Mar 2022)

When taking something from cryptographic theory into practice, it's very important to pick parameters. I don't mean picking the right parameters — although that certainly helps. I mean picking parameters at all.

That might seem obvious, but there are pressures pushing towards abdication: what if you get it wrong? Why not hedge bets and add another option? What about the two groups who already shipped something? We could just add options so that everyone's happy.

There are always exceptions, but the costs of being flexible are considerable. ANSI X9.62, from 1998, specified how elliptic curves were encoded in ASN.1 and included not just 27 possible curves, but allowed the parameters to be inherited from the issuing CA and allowed completely arbitrary elliptic curves to be defined inside a public key. That specifies almost nothing and avoids really standardising on anything. Thankfully I've not seen parameter inheritance implemented for ECC (it would be a security mess) but support for custom elliptic curves does, sadly, exist.

A couple of years ago, Microsoft had a signature bypass because of support for arbitrary curves [details]. Today, OpenSSL had an infinite loop in certificate parsing because of the same. On the flip side, I'm not aware that anyone has ever used the ability to specify arbitrary curves for anything useful.

It's not fair to just blame X9.62: don't get me started about RSA PSS. The cost of having too many options is frequently underestimated. Why does AES-192 exist, for example, other than to inflate test matrices?

As an aside, it's interesting to wonder how many CPU decades have been spent fuzzing OpenSSL's certificate parsing and yet today's issue wasn't reported before. This isn't a 2^-128 fraction of state space that fuzzers couldn't be expected to find, and I'm not sure that Tavis found it from fuzzing at all.

Phones as security keys in Chrome (20 Oct 2021)

With Chrome 94, if you have an Android phone with Chrome on it, and it’s syncing to the same Google account as Chrome on a Chrome OS/Windows/macOS device, then you’ll be able to use that phone as a security key. You should be able to try this out on any WebAuthn-using website, for example here. (But not accounts.google.com, which uses a different system.)

The reason that you are reading about this here and not on an official Google site is that people shouldn’t start registering their phone as a security key unless they have a physical security key as a backup. Just like a regular security key, if you lose the phone then you lose the credentials. So, just like a regular security key, you should have a backup. (You can also lose your credentials if you remove the screen lock, or somehow wipe out Play Services state — e.g. by doing a factory reset.)

We have plans for addressing this and making this suitable for regular people, and to allow use in other profiles, but we’re not there yet. We are interested in whether the communications infrastructure is good enough though. (More below.)

For signing into Google, it has long been possible to use your phone as a security key. This only worked in Chrome and functioned over BLE GATT between the desktop and phone. We have wanted to expand this to the web in general for years, but the success rate that we measured with BLE was poor. After quite a lot of work trying to improve the BLE success rate, we didn’t achieve much.

But the use of BLE is more than just a convenience. The security model demands some proof of physical proximity between the authenticator and the machine that is being authenticated. For a USB security key the authenticator will only respond to something that is making physical contact with it. So when a phone is acting as a security key it needs to prove that the machine it is talking to is physically close by. (Or at least that the attacker is in control of a BLE radio that is physically close.)

We looked at other Bluetooth modes in the hopes that they might work better, but classic Bluetooth RFCOMM isn’t supported on iOS and requires a lot of user interaction on Android. BLE L2CAP is supported on iOS, but isn’t supported (in user space) on Windows. It’s also flaky in the face of MAC address rotation if the devices aren’t paired.

So where we’ve ended up is that all the communication happens over the internet connection, but the phone sends a nonce in a BLE advert and the other end of the channel has to prove receipt. That’s the least amount of Bluetooth we could use while still requiring physical proximity. Needing bilateral internet connectivity is unfortunate though. So you can also connect the phone with a USB cable while the security key operation is running. (But very sadly not on Windows; the USB stack there just isn’t designed in the right way for it.) We might also add L2CAP as an option in the future.

This isn’t enabled on Linux at the moment. Historically trying to do the BLE GATT connection would often fail with BlueZ, and so the phone-as-a-security-key infrastructure was disabled on Linux. Now that the desktop only needs to receive a BLE advert it looks like it could work, but we haven’t flipped that switch yet.

As I mentioned above, we are interested in whether the underlying infrastructure is plausible. Aggregated anonymous statistics are useful for many things but in this case they suggest that BLE isn’t always working as well as it should, but don’t tell us why not. So if you are especially keen about security keys and want to try this out, I’d be interested in your experiences. I can't promise to respond but I will read everything if you send me an email (agl at chromium dot org) or tweet at me (agl__).

Some troubleshooting hints if you're having issues, because this will be much faster than asking me!

If no phones appear as options: you're using Windows, macOS, or Chrome OS, yes? And it's an Android phone? And the machine has working Bluetooth? And Chrome is up to date everywhere? If you navigate to chrome://sync-internals, is the “Transport state” at the top-left reporting “Active”? If not, enable Sync. On the phone, does Settings say that Sync is on at the top? If not, enable it. Is the account listed in Settings on the phone the same as the “Username” in chrome://sync-internals on the desktop? If all that's good then probably you just need to wait because it can take a couple of days for the registration to propagate. Probably in the “Device Info” section of the “Sync Node Browser” in chrome://sync-internals your phone is listed, but there's no paask_fields section yet. If you want to short-circuit that, you can install Chrome Canary on the phone and enable syncing there. That should register quite quickly.

You can select the phone on the desktop, but nothing happens: the phone should be getting a cloud message poke that triggers a notification. Does the phone have internet access? Did you completely disable notifications from Chrome? `adb logcat | grep -i cable` would be interesting if you're set up for that. Otherwise, if this is a common issue, I might have to add some logging and ask for the tunnel URL from chrome://device-log.

You get the notification and tap it, but something else goes wrong: if an error code or error message is displayed then I want to know what it is! If it's hanging then the message at the bottom of the screen changes for each step of the process so that's what'll be useful. Most problems seem to be BLE: is the phone close to the desktop? What other BLE is happening on the devices?

Efficient QR codes (26 Aug 2021)

QR codes seem to have won the battle for 2D barcodes, but they're not just a bag of bits inside. Their payload is a series of segments, each of which can have a different encoding. Segments are bitstrings, concatenated without any byte-alignment, and terminated with an empty segment of type zero. If you want to squeeze the maximum amount of data into a QR code without it turning into a gray square, understanding segmentation helps.

The most basic segment type is byte mode, which is a series of 8-bit bytes. If you control the QR decoder then this is perfectly efficient for encoding binary data. But you probably need to work with a variety of decoders. In that case, beware: the first edition of the QR standard, ISO/IEC 18004, said that byte mode should be interpreted as JIS X 0201. The 2005 edition changed that to be ISO/IEC 8859-1 (i.e. Latin-1). In practice, some QR decoders will attempt to content-sniff the encoding because, while UTF-8 contents should be indicated with an ECI header, they often aren't and UTF-8 is really common.

So if you put binary data into a QR code, you are probably going to hit these edge cases. The contents are likely going to be passed to a general operating system API for handling URLs — do you think the full pipeline will handle NUL bytes in the string, and UTF-8 non-characters and invalid surrogate pairs when interpreted as UTF-8? Also, you probably want your QR contents to be a printable string: bits of it might be shown in the scanner's UI; users might need to copy and paste them.

So let's assume that you want something printable and consider an obvious answer: base64url. It's very common, printable, and doesn't contain any characters that are special in URLs for maximum compatibility. It'll be encoded in a byte-mode segment: each base64url character contains 6 bits of information and takes 8 bits in the QR code for an efficiency of 75%. That's our baseline.

The next segment type to consider is digit mode. This only encodes the digits, 0–9, by packing triples of digits into 10 bits. If there are two digits left over at the end then they take 7 bits, and a singleton takes 4 bits. Ignoring the potential digits at the end, this lets you store 3×log2(10) = 3×3.322 = 9.966 bits of information in 10 bits of space. That's 99.66% efficient! So you can clearly do better than base64url.
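
In code, the packing might look like this (a sketch that returns the packed value and its bit count, ignoring the segment header):

    def pack_digits(digits: str):
        value, nbits, i = 0, 0, 0
        while i + 3 <= len(digits):        # triples of digits: 10 bits
            value = (value << 10) | int(digits[i:i + 3])
            nbits += 10
            i += 3
        if len(digits) - i == 2:           # a trailing pair: 7 bits
            value = (value << 7) | int(digits[i:])
            nbits += 7
        elif len(digits) - i == 1:         # a trailing singleton: 4 bits
            value = (value << 4) | int(digits[i:])
            nbits += 4
        return value, nbits

    pack_digits("8675309")  # two triples plus a singleton: 24 bits for 7 digits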

The last segment type for these purposes is alphanumeric mode. These segments can encode A–Z, 0–9, and nine special characters: $, %, *, +, -, ., /, :, and space. Pairs of these characters are encoded in 11 bits. (If there's an odd number then the last takes 6 bits.) If you consider this to be “base-45” encoding then it stores 2×log2(45) = 10.98 bits in 11 bits of space, for 99.85% efficiency. Even better than digit mode, although only just.

So maybe you should base-45 encode your data using that alphabet. But, of the special characters supported in alphanumeric mode, only two (minus and period) are unreserved (i.e. safe) in URLs. So you might be reduced to base-38, which cuts the efficiency to 95.42%. But having textually smaller QR contents might be valuable and worth a few percent efficiency in your context.

If you've picked base-10 (digits), base-38, or even base-45 for your data then you need to get it into that form. Base-64 is easy because that's exactly 6 bits per character; you work on 3 bytes of input at a time and produce exactly 4 characters of output. But 10, 38, and 45 aren't powers of two. You've got three options here. The obvious conversion would be to treat the input as a bigint and repeatedly divmod by 10 (or 38, etc) to generate the encoding. If you have a bigint library to hand then it almost certainly has the functions for that, but it's a rather obnoxious (and quadratic) amount of computation and a significant dependency. So you might be willing to waste a few bits to make things easier.
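
In Python, where bigints are built in, the divmod approach is short (a sketch; note that it emits least-significant digits first and quietly drops trailing zero bytes, so a real encoder would fix the output length):

    def to_base10(data: bytes) -> str:
        n = int.from_bytes(data, "little")
        digits = []
        while n:
            n, d = divmod(n, 10)  # quadratic overall, as noted above
            digits.append(str(d))
        return "".join(digits) or "0"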

Next option is an encoding noted by djb that achieves similar efficiency but with less computation and no long-division. I updated this post to include it, so it's covered in a section below.

The third option is to chunk the input and convert each chunk independently. Ideal input chunks would be 8 bytes or fewer, because nearly all environments will support a uint64 type and nearly all hardware can do a divmod on them. If you're using base-10 then there's going to be a function that can “print” a uint64 to digits for you, so let's take digits as an example. With a chunk size of two bytes you would get five digits. Each digit takes 3⅓ bits of space, so 16 input bits takes 16⅔ bits: 96% efficient. Less than the 99.66% we can get with digits for sure. But if you consider all chunk sizes from one to eight bytes, turning 7-byte chunks into 17 digits is 98.82% efficient. That's pretty good for the complexity savings.

For base-38, 7-byte chunks are the best again, producing 11 characters for 92.56% efficiency. For base-45, two-byte chunks produce 3 characters for 96.97% efficiency. (Four- and eight-byte chunks do the same if you want fewer loop iterations.)

(And you should use little-endian chunks because little-endian has won, even if the IETF hasn't caught up to that yet.)
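
Putting that together for the base-10 case, here's a sketch of chunked little-endian conversion (assuming the input length is a multiple of seven; a real encoder needs a rule for a short final chunk):

    def encode_chunks_base10(data: bytes) -> str:
        assert len(data) % 7 == 0
        out = []
        for i in range(0, len(data), 7):
            n = int.from_bytes(data[i:i + 7], "little")
            out.append("%017d" % n)  # 2**56 < 10**17, so 17 digits suffice
        return "".join(out)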

Now you've got your payload encoding sorted … probably. A wrinkle is that it's difficult to know how your QR encoder will segment what you give it. You might have crafted a wonderful base-38 input and it might stuff it all into a byte-mode segment! (68.65% efficient, worse than base64url.) I'm sadly not aware of a good QR “debugger” that shows all the details of a QR code. ZXing's online service will give a hex-dump of the raw contents, but that assumes that you can decode the segment headers. (QR-Logo promises better debugging output but doesn't work for me.) My best advice is to use ZXing on a sample QR code, ignore the 0xec, 0x11 padding pattern at the end, and calculate whether the number of bytes used roughly makes sense.

You probably want to put a URL-like prefix at the front to make your QR codes distinguishable. One thing to consider is that “https://www.example.com/” is a byte-mode segment that takes 204 bits, but “HTTPS://WWW.EXAMPLE.COM/” is alphanumeric and needs only 145 bits. (That's assuming QR versions 1 through 9, the lengths are slightly different otherwise.) DNS names are case insensitive and “an implementation should accept uppercase letters” for the scheme says RFC 3986. Maybe it just looks odd and that's not worth the bits, though?

We'll finish up with a quick look at an example, which is the thing that started me on this path in the first place: SMART Health Cards. (Thank you to the person who pointed me at them, who likely wants to remain anonymous.)

SHCs are trying to squeeze a lot of data into a QR code: they minify their JSON structure and compress it but, even then, they sometimes span multiple QR codes and the user has to scan them all. As such their contents are a) a binary segment containing “shc:/” (and maybe sequence numbers if using multiple QR codes), and then b) a digits segment containing the payload. So they didn't use “SHC:/” to save bits, but the difference is small.

One thing to note is that the QR spec (ISO/IEC 18004:2005) has a whole section on “structured append” mode, where multiple QR codes can be combined into one. But trying that with iOS and Android shows that neither support it, so probably it can be considered dead and that's why SHC is replicating the same feature at a higher level.

Another thing to note is that SHC is using digits for better efficiency, which is great, but the way that they do it is not. They're using JWT, which is bad but not today's topic, so they have three base64-encoded strings. They then take each base64 character, subtract 45, and turn that into two base-10 digits! All that work minifying JSON and compressing it, and then they throw away 10% of their bits on such a trivial thing!

So SHC did pretty well, but missed an easy win. Having read this, you can do better.

The NTRU Prime encoding

Above, I referenced an encoding that gets nearly all the space efficiency of the bigint-and-divmod method, but without the computational costs. This section is about that. It's taken from page 18 of the NTRU Prime NIST submission.

Our motivating issue is this: if you have a whole byte then taking it mod 10 to generate a digit works fairly well. The digits 0–5 have probability 26/256 while 6–9 have probability 25/256. That's not uniform, so it doesn't encode the maximum amount of entropy, but it's 99.992% efficient, which is pretty good. But when you have a smaller range of input values the non-uniformity becomes significant, and so does the reduction in information transmitted.

The encoding in NTRU Prime takes input values (which can be larger than bytes) and combines pairs of them. It produces some output values from each pair but, once the non-uniformity gets uncomfortable, it pairs up the leftovers to increase the range. This repeats in a binary tree.

As a concrete example we'll use the Python code from page 18 and set M = [256…] (because our inputs are bytes), change the 256 and 255 to 10 and 9 (to extract digits, not bytes), and set limit to 1024. Our example input will be 419dd0ed371f44b7.

  Input bytes:   65      157     208     237     55      31      68      183

  Pairs:         40257 → 7,5     60880 → 0,8     7991 → 1,9      46916 → 6,1
  Leftovers:       402 mod 656     608 mod 656     79 mod 656      469 mod 656

  Combined:      399250 → 0,5,2                  307743 → 3,4,7
  Leftovers:        399 mod 431                     307 mod 431

  Combined:      132716 (range 185761) → 6,1,7
  Leftover:         132 mod 186 → 2,3,1

The input bytes are written (in base 10) along the top. Pairs of them are combined. Take the top-left box: its value is 157×256 + 65 = 40257. That can be considered to be 40257 mod 65536 and, since there's a reasonable number of bits in there, two digits are extracted. Obviously 40257 mod 10 = 7, and 4025 mod 10 = 5. So the two digits are 7 and 5. That leaves 402 mod 656, and 656 is below the limit of 1024 that we set, so it's passed down to be combined into the box below. This continues down the tree: each time there's too little data left to extract another digit, the leftovers are passed down to be combined. At the bottom there's nothing else to combine with so the final leftover value, 132 mod 186, is converted into the digits 2, 3, and 1. The ultimate output digits are concatenated from left-to-right, top-to-bottom.

This achieves good encoding efficiency without repeated long-division, and can be parallelised.
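
Here's a compact Python sketch of the scheme, written from the description above rather than taken from the submission's code; running it on the example input reproduces the digits in the diagram:

    LIMIT = 1024

    def encode(values, moduli):
        # values[i] is in the range [0, moduli[i]).
        if len(values) == 1:                    # bottom of the tree
            v, m, out = values[0], moduli[0], []
            while m > 1:
                out.append(v % 10)
                v //= 10
                m = (m + 9) // 10
            return out
        out, next_values, next_moduli = [], [], []
        for i in range(0, len(values) - 1, 2):  # combine pairs
            v = values[i] + values[i + 1] * moduli[i]
            m = moduli[i] * moduli[i + 1]
            while m >= LIMIT:                   # extract digits while ample
                out.append(v % 10)
                v //= 10
                m = (m + 9) // 10
            next_values.append(v)
            next_moduli.append(m)
        if len(values) % 2:                     # an odd one out passes down
            next_values.append(values[-1])
            next_moduli.append(moduli[-1])
        return out + encode(next_values, next_moduli)

    encode([65, 157, 208, 237, 55, 31, 68, 183], [256] * 8)
    # → [7,5, 0,8, 1,9, 6,1, 0,5,2, 3,4,7, 6,1,7, 2,3,1]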

