ImperialViolet

(This post is nearing 8 000 words. If you want to throw it onto an ereader there's an EPUB version too.)

Introduction

Over more than a decade, a handful of standards have developed into passkeys, a plausible replacement for passwords. They picked up a lot of complexity on the way, and this post tries to give a chronological account of the development of the core of these technologies. Nothing here is secret; it’s all described in various published standards. However, it can be challenging to read these standards and understand how they’re meant to fit together.

The beginning: U2F

U2F stands for “Universal Second Factor”. It was a pair of standards, one for computers to talk to small removable devices called security keys, and the second a JavaScript API for websites to use them. The first standard of the pair is also called the Client to Authenticator Protocol (CTAP1), and when the term “U2F” is used in isolation, it usually refers to that. The JavaScript API, now obsolete, was generally referred to as the “U2F API”.

The goal of U2F was to eliminate “bearer tokens” in user authentication. A “bearer token” is a term of art in authentication that refers to any secret that is passed around to prove identity. A password is the most common example of such a secret. It’s a bearer token because you prove who you are by disclosing it, on the assumption that nobody else knows the secret. Passwords are not the only bearer tokens involved in computer security by a long way—the infamous cookies that all web users are constantly bothered about are another example. But U2F was focused on user authentication, while cookies identify computers, so U2F was primarily trying to augment passwords.

The problem with bearer tokens is that to use them, you have to disclose them. And knowledge of the token is how you prove your identity. So every time you prove your identity, you are handing another entity the power to impersonate you. Hopefully, the other entity is the intended counterparty and so would gain nothing from impersonating you to itself. But websites are very complicated counterparties, made up of many different parts, any one of which could be compromised to leak these tokens.

Digital signatures are preferable to bearer tokens because it’s possible to prove possession of a private key, using a signature, without disclosing that private key. So U2F allowed signatures to be used for authentication on the web.

While U2F is generally obsolete these days, it defined the core concepts that shaped how everything that came after it worked. (And there remain plenty of U2F security keys in use.) It’s also the clearest demonstration of those concepts, before things got more complex, so we’ll cover it in some detail. The following sections use modern terminology where things have been renamed, though, so you’ll see different names if you look at the U2F specs.

Creating a credential

CTAP1 only includes two different commands: one to create a credential and one to get a signature from a credential. Websites make requests using the U2F JavaScript API and the browser translates them into CTAP1 commands.

Here’s the structure of the CTAP1 request for creating a credential:

Offset	Size	Meaning
0	1	Command code, 0x01 to register
1	2	Flags, always zero
3	2	Length of following data, always 64
5	32	SHA-256 hash of client data
37	32	SHA-256 hash of AppID

There are two important inputs here: the two hashes. The first is the hash of the “client data”, a JSON structure built by the browser. The security key includes this hash in its signed output and it’s what allows the browser (or operating system) to put data into the signed message. The JSON is provided to the website by the browser and can include a variety of things, but there are two that are worth highlighting:

Firstly, the origin that made the JavaScript call. (An origin is the protocol, hostname, and port number of a URL.) This allows the website’s server to know what origin the user was interacting with when they were using their security key, and that allows it to stop phishing attacks by rejecting unknown origins. For example, if all sign-in and account actions are done on https://accounts.example.com, then the server needs to permit that as a valid origin. But, by rejecting all other origins, phishing attacks are easily defeated.

When used outside of a web context, for example by an Android app, the “origin” will be a special URL scheme that includes the hash of the public key that signed the app. If a backend server expects users to be signing in with an app, then it must recognize that app as a valid origin value too. (You might see in some documentation that there’s an iOS scheme similarly defined but, in fact, iOS incorrectly puts a web origin into the JSON string even when the request comes from an app.)

The second value from the client data worth highlighting is called the “challenge”. This value is provided by the website and it’s a large random number. Large enough that the website that created it can be sure that any value derived from it must have been created afterwards. This ensures that any reply derived from it is “fresh” and this prevents replay attacks, where an old response is repeated and presented as being new.

There are other values in the JSON string too (e.g. the type of the message, to provide domain separation), but they’re peripheral to this discussion.
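
To make those two checks concrete, here is a minimal sketch of what a backend might do with the client data JSON. The allow-list, the function names, and the unpadded base64url encoding of the challenge are assumptions for illustration; a real deployment would also check the message type and verify the signature that covers the hash of this JSON.

```python
import base64
import hmac
import json
import secrets

# Hypothetical allow-list: the only origin this site signs users in from.
ALLOWED_ORIGINS = {"https://accounts.example.com"}


def new_challenge() -> bytes:
    """Generate a large random challenge; the server remembers it per request."""
    return secrets.token_bytes(32)


def b64url(data: bytes) -> str:
    """Unpadded base64url, the encoding assumed for the challenge in the JSON."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def check_client_data(client_data_json: bytes, expected_challenge: bytes) -> bool:
    """Return True only if the origin is allowed and the challenge is ours."""
    try:
        data = json.loads(client_data_json)
    except ValueError:
        return False
    if not isinstance(data, dict):
        return False
    origin = data.get("origin")
    if not isinstance(origin, str) or origin not in ALLOWED_ORIGINS:
        return False  # Unknown origin: possibly a phishing page.
    encoded = data.get("challenge")
    if not isinstance(encoded, str):
        return False
    try:
        received = base64.urlsafe_b64decode(encoded + "=" * (-len(encoded) % 4))
    except ValueError:
        return False
    # A mismatch means the response is stale or was made for another request.
    return hmac.compare_digest(received, expected_challenge)
```

The challenge has to be stored server-side between issuing it and checking the response, otherwise a replayed response cannot be told apart from a fresh one.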

Now we’ll discuss the second hash in the request: the AppID hash. The AppID is specified by the website and its hash is forever associated with the newly created credential. The same value must be presented every time the credential is used.

A privacy goal of U2F and the protocols that followed it was to prevent the creation of credentials that span websites, and thus could be a form of “super cookie”. So the AppID hash identifies the site that created a credential and, if some other site tries to use it, it prevents them from doing so. Clearly, to be effective, the browser has to limit what AppIDs a website is allowed to use—otherwise all websites could just decide to use the same AppID and share credentials!

U2F envisioned a process where browsers could fetch the AppID (which is a URL) and parse a JSON document from it that would list other sorts of entities, like apps, that would be allowed to use an AppID. But in practice, I don’t believe any of the browsers ever implemented that. Instead, a website was allowed to use an AppID if the host part of the AppID could be formed by removing labels from the website’s origin without hitting an eTLD. That was a complicated sentence, but don’t worry about it for now. AppIDs are defunct, and we will cover this logic in more detail when we discuss their replacement in a later section.

What you should take away is that credentials have access controls, so that websites can only use their own credentials. This happens to stop most phishing attacks, but that’s incidental: the hash of the JSON from the browser is what is supposed to stop phishing attacks. Rather, the AppID should be seen as a constraint on websites.

Given those inputs, the security key generates a new credential, consisting of an ID and public–private key pair.
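
Pulling the request layout from this section together, the following sketch builds the 69 bytes of a registration request. It assumes the two-byte fields are big-endian and follows the simplified layout in the table rather than any particular transport framing; the function name is made up for the example.

```python
import hashlib
import struct


def build_register_request(client_data_json: bytes, app_id: str) -> bytes:
    """Lay out a CTAP1 registration request following the table above."""
    client_data_hash = hashlib.sha256(client_data_json).digest()
    app_id_hash = hashlib.sha256(app_id.encode("utf-8")).digest()
    body = client_data_hash + app_id_hash  # Always 64 bytes.
    # Command code 0x01, flags of zero, then the length of the body.
    # Big-endian byte order is an assumption of this sketch.
    header = struct.pack(">BHH", 0x01, 0x0000, len(body))
    return header + body


request = build_register_request(
    b'{"challenge":"placeholder","origin":"https://accounts.example.com"}',
    "https://accounts.example.com",
)
```

Note that the security key only ever sees hashes: it has no idea what the origin or the AppID actually were, which is why the browser is the one responsible for enforcing them.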

Registration errors

Assuming that the request is well-formed, there is only one plausible error that the security key can return, but it happens a lot! The error is called “test of user presence required”. It means that a human needs to touch a sensor on the security key. U2F was designed so that security keys could be implemented in a Java-based framework that did not allow requests to block, so the computer is expected to repeatedly send requests and, if they result in this error, to wait a short amount of time and to send them again. Security keys will generally blink an LED while the stream of requests is ongoing, and that’s a signal to the user to physically touch the security key. If a touch was registered within a short time before the request was received, then the request will be processed successfully.
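
The polling behaviour is easy to sketch. The security key below is a fake that starts succeeding after a fixed number of requests, standing in for a user who eventually touches the sensor; the status strings, class, and function names are all invented for the example.

```python
import time

# Stand-in for the "test of user presence required" status.
USER_PRESENCE_REQUIRED = "user-presence-required"


class FakeSecurityKey:
    """Reports that a touch is needed for the first few requests."""

    def __init__(self, requests_before_touch: int):
        self.remaining = requests_before_touch

    def send(self, request: bytes):
        if self.remaining > 0:
            self.remaining -= 1
            return USER_PRESENCE_REQUIRED, b""
        return "ok", b"response-bytes"


def poll_until_touched(key, request: bytes, interval: float = 0.1,
                       max_attempts: int = 100):
    """Resend the same request until the key stops asking for a touch."""
    for attempt in range(1, max_attempts + 1):
        status, response = key.send(request)
        if status != USER_PRESENCE_REQUIRED:
            return attempt, response
        time.sleep(interval)  # Wait a little, then ask again.
    raise TimeoutError("no touch registered")
```

With `FakeSecurityKey(3)`, the fourth request is the one that succeeds.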

This shows “user presence”, i.e. that some human actually authorised the operation. Security keys don’t (generally) have a trusted display that says what operation is being performed, but this check does stop malware from getting the security key to perform operations silently.

The registration response

Here’s what comes back from a U2F security key after creating a credential:

Offset	Size	Meaning
0	1	Reserved, always has value 0x05
1	65	Public key (uncompressed X9.62)
66	1	Length of credential ID (“L”)
67	variable	Credential ID
67 + L	variable	X.509 attestation certificate
variable	variable	ECDSA signature

The public key field is hopefully pretty obvious: it’s the public key of the newly created credential. U2F always uses ECDSA with P-256 and SHA-256, and a P-256 point in uncompressed X9.62 format is 65 bytes long.

Next, the credential ID is an opaque identifier for the credential (although we will have more to say about it later).

Then comes the attestation certificate. Every U2F security key has an X.509 certificate that (usually) identifies the make and model of the security key. The private key corresponding to the certificate is embedded within the security key and, hopefully, is hard to extract. Every new credential is signed by this attestation certificate to attest that it was created within a specific make and model of security key.

But a unique attestation certificate would obviously become a tracking vector that identifies a given security key every time it creates a credential! Since we don’t want that, the same attestation certificate is used in many security keys and manufacturers are supposed to use the same certificate for batches of at least 100,000 security keys.

Finally, the response contains the signature, from that attestation certificate, over several fields of the request and response.
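
As a rough sketch of how a parser might split that response apart: the only subtle part is that the certificate has no length field of its own in the table, so its end has to be found by reading the DER header at its start. The function names are hypothetical, and bounds checking and signature verification are left out.

```python
def der_total_length(buf: bytes, start: int) -> int:
    """Full encoded length (header plus body) of the DER element at start."""
    first = buf[start + 1]
    if first < 0x80:
        return 2 + first  # Short form: the length fits in one byte.
    num_len_bytes = first & 0x7F  # Long form: this many length bytes follow.
    body_len = int.from_bytes(buf[start + 2:start + 2 + num_len_bytes], "big")
    return 2 + num_len_bytes + body_len


def parse_registration_response(resp: bytes) -> dict:
    """Split a U2F registration response into its fields."""
    if resp[0] != 0x05:
        raise ValueError("unexpected reserved byte")
    public_key = resp[1:66]  # 65-byte uncompressed P-256 point.
    id_len = resp[66]
    credential_id = resp[67:67 + id_len]
    cert_start = 67 + id_len
    if resp[cert_start] != 0x30:
        raise ValueError("certificate is not a DER SEQUENCE")
    cert_len = der_total_length(resp, cert_start)
    return {
        "public_key": public_key,
        "credential_id": credential_id,
        "attestation_cert": resp[cert_start:cert_start + cert_len],
        # Everything after the certificate is the ECDSA signature.
        "signature": resp[cert_start + cert_len:],
    }
```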

Note that there’s no self-signature from the credential. That was probably a mistake in the design, but it’s a mistake that is still with us today. In fact, if you don’t check the attestation signature then nothing is signed and you needn’t have bothered with the challenge parameter at all! That’s why you might see a challenge during registration being set to a single zero byte or other such placeholder value.

Statelessness

The vast majority of (probably all?) U2F security keys don’t actually store anything when they create a credential. The credential ID that they return is actually an encrypted seed that allows the security key to regenerate the private key as needed. So the security key has a single root key that it uses to encrypt generated seeds, and those encrypted seeds are the credential IDs. Since you always need to send the credential ID to a U2F security key when getting a signature from it, no per-credential storage is necessary.

The credential ID (called a “key handle” in the U2F specs) won’t just be an encryption of the seed because you want the security key to be able to ignore credential IDs that it didn’t generate. Also, the AppID hash needs to be mixed into the ciphertext somehow so that the security key can check it. But any authenticated encryption scheme can manage these needs.
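
Here is a minimal sketch of the stateless idea. Python’s standard library has no authenticated encryption, so this swaps in a related construction: rather than encrypting a seed, it derives the seed from a random nonce with HMAC and uses a second HMAC as a tag that binds the nonce to the AppID hash. It is not what any particular security key does, and the names are invented, but it has the same properties: nothing is stored per credential, foreign credential IDs are rejected, and the AppID hash is checked.

```python
import hashlib
import hmac
import secrets


class StatelessKey:
    """Toy model of a security key that stores only a single root key."""

    def __init__(self):
        self.root_key = secrets.token_bytes(32)

    def _mac(self, label: bytes, nonce: bytes, app_id_hash: bytes) -> bytes:
        return hmac.new(self.root_key, label + nonce + app_id_hash,
                        hashlib.sha256).digest()

    def create_credential(self, app_id_hash: bytes):
        """Return (credential ID, seed for the private key)."""
        nonce = secrets.token_bytes(32)
        credential_id = nonce + self._mac(b"tag", nonce, app_id_hash)
        return credential_id, self._mac(b"seed", nonce, app_id_hash)

    def recover_seed(self, credential_id: bytes, app_id_hash: bytes):
        """Regenerate the seed, or return None if the ID isn't ours."""
        if len(credential_id) != 64:
            return None
        nonce, tag = credential_id[:32], credential_id[32:]
        expected = self._mac(b"tag", nonce, app_id_hash)
        if not hmac.compare_digest(tag, expected):
            return None  # Wrong key, wrong AppID, or corrupted ID.
        return self._mac(b"seed", nonce, app_id_hash)

    def reset(self):
        """Replacing the root key invalidates every earlier credential."""
        self.root_key = secrets.token_bytes(32)
```

A real device would turn the seed into a P-256 private key; that step needs an elliptic-curve library and is omitted here.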

Whenever you reset a stateless security key, it just regenerates its root key, thus invalidating all previous credentials.

Getting assertions

An “assertion” is a signature from a credential. Like we did when we covered credential creation, let’s look at the structure of a CTAP1 assertion request because it still carries the core concepts that we see in passkeys today:

Offset	Size	Meaning
0	1	Command code, 0x02 to get an assertion
1	2	Flags: 0x0700 for “check-only”, 0x0300 otherwise
3	2	Length of following data
5	32	SHA-256 hash of Client Data
37	32	SHA-256 hash of AppID
69	1	Length of credential ID (“L”)
70	variable	Credential ID

We already know what the client data and AppID hashes are. (Although this time you definitely need a random challenge in the client data!)

The security key will attempt to decrypt the credential ID and authenticate the AppID hash. If unsuccessful, perhaps because the credential ID is from a different security key, it will return an error. Otherwise, it will check to see whether its touch sensor has been touched recently and, if so, it will return the requested assertion. (If the touch sensor hasn’t been triggered then the platform does the same polling as when creating a credential, as detailed above.)
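
The assertion request can be sketched in the same way as the table lays it out. As before, big-endian two-byte fields and the function name are assumptions of the example, not something taken from the spec text.

```python
import hashlib
import struct

FLAGS_CHECK_ONLY = 0x0700  # Only report whether the credential ID is recognised.
FLAGS_SIGN = 0x0300        # Require user presence and return a signature.


def build_assertion_request(client_data_json: bytes, app_id: str,
                            credential_id: bytes,
                            check_only: bool = False) -> bytes:
    """Lay out a CTAP1 assertion request following the table above."""
    if len(credential_id) > 255:
        raise ValueError("credential ID length must fit in one byte")
    body = (
        hashlib.sha256(client_data_json).digest()
        + hashlib.sha256(app_id.encode("utf-8")).digest()
        + bytes([len(credential_id)])
        + credential_id
    )
    flags = FLAGS_CHECK_ONLY if check_only else FLAGS_SIGN
    return struct.pack(">BHH", 0x02, flags, len(body)) + body
```

For a 16-byte credential ID the body is 81 bytes (two hashes, one length byte, and the ID), so the whole request comes to 86 bytes.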

The bytes signed by an assertion look like this:

Offset	Size	Meaning
0	32	SHA-256 hash of the AppID
32	1	0x1 if user-presence was confirmed, zero otherwise
33	4	Signature counter
37	32	SHA-256 hash of the Client Data

The signature covers the client data hash, and thus it covers the challenge from the website. So the website can be convinced that it is a fresh signature from the security key. Since the client data also includes the origin, the website can check that the user hasn’t been phished.
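
On the server side, the signed bytes have to be reconstructed before the signature can be checked. A small sketch of that reconstruction follows; the function name is made up, and the ECDSA verification itself needs a cryptography library outside the standard library, so it is left out.

```python
import hashlib
import struct


def assertion_signed_bytes(app_id: str, user_present: bool, counter: int,
                           client_data_json: bytes) -> bytes:
    """Rebuild the 69 bytes that a U2F assertion signature covers."""
    return (
        hashlib.sha256(app_id.encode("utf-8")).digest()
        + bytes([0x01 if user_present else 0x00])
        + struct.pack(">I", counter)  # Four-byte big-endian counter.
        + hashlib.sha256(client_data_json).digest()
    )
```

Because the client data hash sits inside these bytes, changing a single character of the origin or the challenge in the JSON produces different signed bytes and the signature no longer verifies.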

There’s also a “signature counter” field. All you need to know is that you should ignore it—the field will generally be zero these days anyway.

Transports

Most security keys are USB devices. They appear on the USB bus as a Human Interface Device (HID) and they have a special usage-page number to identify themselves.

NFC capable security keys are also quite common and frequently offer a USB connection too. When using the security key via NFC, the touch sensor isn’t used. Merely having the security key in the NFC field is considered to satisfy user presence.

There are also Bluetooth security keys. They work over the GATT protocol and their major downside is that they need a battery. For a long time, Bluetooth security keys were the only way to get a security key to work with iOS, but since iOS added native support, they’ve become much less common. (And Yubico now makes a security key with a Lightning connector.)

Connecting U2F to the web

FIDO defined a web API for U2F. I’m not going to go into the details because it’s obsolete now (and Chromium never actually implemented it, instead shipping an internal extension that sites could communicate with via postMessage), but it’s important to understand how browsers translated requests from websites into U2F commands because it’s still the core of how things work now.

When registering a security key, a website could provide a list of already registered credential IDs. The idea was that the user should not mistakenly register the same security key twice, so any security key that recognised one of the already known credential IDs should not be used to complete the registration request.

Browsers implement this by sending a series of assertion requests to each security key to see whether any of the credential IDs are valid for them. That’s why there’s a “check only” mode in the assertion request: it causes the security key to report whether the credential ID was recognised without requiring a touch.

When Chrome first implemented U2F support, any security keys excluded by this check were ignored. But this meant that they never flashed and users found that confusing—they assumed that the security key was broken. So Chrome started sending dummy registration requests to those security keys, which made them flash. If the user touched them, the created credential would be discarded. (That was presumably a strong incentive for U2F security keys to be stateless!)

When signing in, a site sends a list of known credential IDs for the current user. The browser sends a series of “check only” requests to each security key until it finds a credential ID that the key recognises. Then it repeatedly sends a normal request for that credential ID until the user touches a security key. The security key that the user touches first “wins” and that assertion is returned to the website.
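
The sign-in flow can be modelled in a few lines. The security keys here are fakes with a fixed set of recognised credential IDs and a flag for whether they have been touched; everything about the classes and names is invented for the sketch, and a real browser would poll the keys concurrently with delays rather than in a tight loop.

```python
class FakeKey:
    """A pretend security key that recognises a fixed set of credential IDs."""

    def __init__(self, name: str, known_ids, touched: bool = False):
        self.name = name
        self.known_ids = set(known_ids)
        self.touched = touched

    def check_only(self, credential_id: bytes) -> bool:
        # Answers without requiring a touch.
        return credential_id in self.known_ids

    def sign(self, credential_id: bytes):
        # None stands in for the "user presence required" error.
        if not self.touched:
            return None
        return b"assertion from " + self.name.encode("ascii")


def sign_in(keys, allowed_ids, max_rounds: int = 50):
    """Find one usable credential per key, then poll until a key is touched."""
    candidates = []
    for key in keys:
        match = next((c for c in allowed_ids if key.check_only(c)), None)
        if match is not None:
            candidates.append((key, match))
    for _ in range(max_rounds):
        for key, credential_id in candidates:
            assertion = key.sign(credential_id)
            if assertion is not None:
                return key.name, assertion  # First touched key wins.
    return None
```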

The need for the website to send a list of credential IDs determines the standard U2F sign-in experience: the user enters their username and password and, if recognised, then the site asks them to tap their security key. A desire to move away from this model motivated the development of the next iteration of the standards.

FIDO2

The U2F ecosystem described above satisfied the needs of second-factor authentication. But that doesn’t get rid of passwords: you still have to enter your password first and then use your security key. If passwords were to be eliminated, more was needed. So an effort to develop a new security key protocol, CTAP2, was started.

Concurrent with the development of CTAP2, an updated web API was also started. That ended up moving to the W3C (the usual venue for web standards) and became the “Web Authentication” spec, or WebAuthn for short.

Together, CTAP2 and WebAuthn constituted the FIDO2 effort.

Discoverable credentials

U2F credentials are called “non-discoverable”. This means that, in order to use them, you have to know their credential ID. “Discoverable” credentials are ones that a security key can find by itself, and thus they can also replace usernames.

A security key with discoverable credentials must dedicate storage for each of them. Because of this, you sometimes see discoverable credentials called “resident credentials”, but there is a distinction between whether the security key keeps state for a credential vs whether it’s discoverable. A U2F security key doesn’t have to be stateless; it could keep state for every credential, and its credential IDs could simply be identifiers. But those credentials are still non-discoverable if they can only be used when their credential ID is presented.

With discoverable credentials comes the need for credential metadata: if the user is going to select their account entirely client-side, then the client needs to know something like a username. So in the FIDO2 model, each credential gets three new pieces of metadata: a username, a user display name, and a user ID. The username is a human-readable string that uniquely identifies an account on a website (it often has the form of an email address). The user display name can be a more friendly name and might not be unique (it often has the form of a legal name). The user ID is an opaque binary identifier for an account.

The user ID is different from the other two pieces of metadata. Firstly, it is returned to the website when signing in, while the other metadata is purely client-side once it has been set. Also, the user ID is semantically important because a given security key will only store a single discoverable credential per website for a given user ID. Attempting to create a second discoverable credential for a website with a user ID that matches an existing one will cause the existing one to be overwritten.
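
The overwriting behaviour falls out naturally if you picture the security key’s storage as a map keyed by website and user ID. This is only a toy model with invented names, not how any real security key lays out its storage.

```python
class DiscoverableCredentialStore:
    """Toy model: at most one discoverable credential per (site, user ID)."""

    def __init__(self):
        self._by_account = {}

    def create(self, site: str, user_id: bytes, username: str,
               display_name: str, credential_id: bytes) -> None:
        # Same site and user ID: the earlier credential is silently replaced.
        self._by_account[(site, user_id)] = {
            "username": username,
            "display_name": display_name,
            "credential_id": credential_id,
        }

    def accounts_for(self, site: str):
        """What an account picker could show for a site."""
        return [dict(entry) for (s, _), entry in self._by_account.items()
                if s == site]
```

One practical consequence is that a website should give each account a stable user ID: reusing one across accounts would make them clobber each other on the same security key.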

Storing all this takes space on the security key, of course. And, if your security key needs to be able to run within the tight power budget of an NFC device, space might be limited. Also, the interface to manage discoverable credentials didn’t make it into CTAP 2.0 and had to wait for CTAP 2.1, so some early CTAP2 security keys only let you erase discoverable credentials by resetting the whole key!

User verification

You probably don’t want somebody to be able to find your lost security key and sign in as you. So, to replace passwords, security keys are going to have to verify that the correct user is present, not just that any user is present.

So, FIDO2 has an upgraded form of user presence called “user verification”. Different security keys can verify users in different ways. The most basic method is a PIN entered on the computer and sent to the security key. The PIN doesn’t have to be numeric—it can include letters and other symbols too—one might even call it a password if the aim of FIDO wasn’t to replace passwords. But, whatever you call it, it is stronger than typical password authentication because the secret is only sent to the security key, so it can’t leak from some far away password database, and the security key can enforce a limited number of attempts to guess it.

Some security keys do user verification in other ways. They can incorporate a fingerprint reader, or they can have an integrated PIN pad for more secure PIN entry.

RP IDs

FIDO2 replaces AppIDs with “relying party IDs” (RP IDs). AppIDs were URLs, but RP IDs are bare domain names. But otherwise, RP IDs serve the same purpose as AppIDs did in CTAP1.

We only briefly covered the rules for which websites can set which AppIDs before because AppIDs are obsolete, but it’s worth covering the rules for RP IDs in detail because of how important they are in deployments.

A site may use any RP ID formed by discarding zero or more labels from the left of its domain name until it hits an eTLD. So say that you’re https://www.foo.co.uk: you can specify an RP ID of www.foo.co.uk (discarding zero labels), foo.co.uk (discarding one label), but not co.uk because that’s an eTLD. If you don’t set an RP ID in a request then the default is the site’s full domain.

Our www.foo.co.uk example might happily be creating credentials with its default RP ID but later decide that it wants to move all sign-in activity to an isolated origin, https://accounts.foo.co.uk. But none of the passkeys could be used from that origin! The site would have needed to create them with an RP ID of foo.co.uk from the beginning to allow that.
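
The label-discarding rule is short enough to write down. In this sketch the set of eTLDs is a tiny hard-coded stand-in; real browsers consult the Public Suffix List and handle cases (such as IP addresses) that are ignored here. The function names are made up.

```python
# Tiny stand-in for the Public Suffix List.
ETLDS = {"uk", "co.uk", "com"}


def permitted_rp_ids(host: str):
    """RP IDs a host may claim: drop labels from the left, stop at an eTLD."""
    labels = host.lower().split(".")
    permitted = []
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in ETLDS:
            break
        permitted.append(candidate)
    return permitted


def may_use_rp_id(host: str, rp_id: str) -> bool:
    return rp_id.lower() in permitted_rp_ids(host)
```

For `www.foo.co.uk` this yields `www.foo.co.uk` and `foo.co.uk`. It also shows the migration problem: `accounts.foo.co.uk` may claim `foo.co.uk` but not `www.foo.co.uk`.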

So it’s important to carefully consider your RP ID from the outset. But the rule is not to always use the most general RP ID possible. Going back to our example, if usercontent.foo.co.uk existed, then any credentials with an RP ID of foo.co.uk could be overwritten by pages on usercontent.foo.co.uk. We can assume that foo.co.uk is checking the origin of any assertions, so usercontent.foo.co.uk can’t use its ability to set an RP ID of foo.co.uk to generate valid assertions, but it can still try to get the user to create new credentials which could overwrite the legitimate ones.

CTAP protocol changes

In addition to the high-level semantic changes outlined above, the syntax of CTAP2 is thoroughly different from U2F. Rather than being a binary protocol with fixed or ad-hoc field lengths, it uses CBOR. CBOR, when reasonably subset, is a MessagePack-like encoding that can represent the JSON data model in a compact binary format, but it also supports a bytestring type to avoid having to base64-encode binary values.

CTAP2 replaces the polling-based model of U2F with one where a security key would wait to process a request until it was able. It also tried to create a model where the entire request would be sent by the platform in a single message, rather than having the platform iterate through credential IDs to find ones that a security key recognised. However, due to limited buffer sizes of security keys, this did not work out: the messages could end up too large, especially when dealing with large lists of credential IDs, so many requests will still involve multiple round trips between the computer and the security key to process.

While I’m not going to cover CTAP2 in any detail, let’s have a look at a couple of examples. Here’s a credential creation request:

{
  # SHA-256 hash of client data
  1: h'60EACC608F20422888C8E363FE35C9544A58B8920989D060021BC30F7323A423',
  # RP ID and friendly name of website
  2: {
    "id": "webauthn.io",
    "name": "webauthn.io"
  },
  3: {
    # User ID
    "id": h'526E4A6C5A41',
    # Username
    "name": "Fred",
    # User Display Name
    "displayName": "Fred"
  },
  4: [
    # ECDSA with P-256 is acceptable to the website
    {"alg": -7, "type": "public-key"},
    # And so is RSA.
    {"alg": -257, "type": "public-key"}
  ],
  # Create a discoverable credential.
  7: {"rk": true},
  # A MAC showing that the user has entered the correct PIN and thus
  # this request has verified the user with "PIN protocol" v1.
  8: h'4153542771C1BF6586718BCD0ECA8E96', 9: 1
}

CBOR is a binary format, but it defines a diagnostic notation for debugging, and that’s how we’ll present CBOR messages here. If you scan down the fields in the message, you’ll see similarities and differences with U2F:

The hash of the client data is still there.
The AppID is replaced by an RP ID, but the RP ID is included verbatim rather than hashed.
There’s metadata for the user because the request is creating a discoverable credential.
The website can list the public key formats that it recognises so that there’s some algorithm agility.
User verification was done by entering a PIN on the computer and there’s some communication about that (which we won’t go into).

Likewise, here’s an assertion request:

{
  # RP ID of the requesting website.
  1: "webauthn.io",
  # Hash of the client data
  2: h'E7870DBBA212581A536D29D38831B2B8192076BAAEC76A4B34918B4222B79616',
  # List of credential IDs
  3: [
    {"id": h'D64875A5A7C642667745245E118FCD6A', "type": "public-key"}
  ],
  # A MAC showing that the user has entered the correct PIN and thus
  # this request has verified the user with "PIN protocol" v1.
  6: h'6459AF24BBDA323231CF42AECABA51CF', 7: 1
}

Again, it’s structurally similar to the U2F request, except that the list of credential IDs is included in the request rather than having the computer poll for each in turn. Since the credential that we created was discoverable, critically that list could also be empty and the request would still work! That’s why discoverable credentials can be used before a username has been entered.
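
To show what the diagnostic notation turns into on the wire, here is a deliberately small CBOR encoder covering only the types used in the assertion request above: integers, byte strings, text strings, arrays, maps, and booleans. It is a sketch rather than a complete CBOR implementation, and the map-key ordering (by major type, then encoded length, then bytes) is an assumption modelled on CTAP2’s canonical form.

```python
def _head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 1 << 8:
        return bytes([(major << 5) | 24, value])
    if value < 1 << 16:
        return bytes([(major << 5) | 25]) + value.to_bytes(2, "big")
    if value < 1 << 32:
        return bytes([(major << 5) | 26]) + value.to_bytes(4, "big")
    return bytes([(major << 5) | 27]) + value.to_bytes(8, "big")


def encode(obj) -> bytes:
    if isinstance(obj, bool):  # Must come before int: bool is an int subclass.
        return b"\xf5" if obj else b"\xf4"
    if isinstance(obj, int):
        return _head(0, obj) if obj >= 0 else _head(1, -1 - obj)
    if isinstance(obj, bytes):
        return _head(2, len(obj)) + obj
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(obj, list):
        return _head(4, len(obj)) + b"".join(encode(x) for x in obj)
    if isinstance(obj, dict):
        pairs = [(encode(k), encode(v)) for k, v in obj.items()]
        pairs.sort(key=lambda kv: (kv[0][0] >> 5, len(kv[0]), kv[0]))
        return _head(5, len(obj)) + b"".join(k + v for k, v in pairs)
    raise TypeError(f"unsupported type: {type(obj).__name__}")


assertion_request = {
    1: "webauthn.io",
    2: bytes.fromhex(
        "E7870DBBA212581A536D29D38831B2B8192076BAAEC76A4B34918B4222B79616"),
    3: [{"id": bytes.fromhex("D64875A5A7C642667745245E118FCD6A"),
         "type": "public-key"}],
    6: bytes.fromhex("6459AF24BBDA323231CF42AECABA51CF"),
    7: 1,
}
encoded = encode(assertion_request)
```

The encoding starts with `a5` (a map of five entries), then `01` for the first key, then `6b` (a text string of eleven bytes) followed by `webauthn.io`. Byte strings go in raw, which is where the space saving over base64 in JSON comes from.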

With management of discoverable credentials, fingerprint enrollment, enterprise attestation support, and more, CTAP2 is quite complex. But it’s a very capable authentication ecosystem for enterprises and experts.

WebAuthn

As part of the FIDO2 effort, the U2F web API was completely replaced. If you recall, the U2F web API was not a W3C standard, and it was only ever implemented in Chromium as a hidden extension. The replacement, called WebAuthn, is a real W3C spec and is now implemented in all browsers.

It is substantially more complicated than the old API!

WebAuthn is integrated into the W3C credential management specification and so it is invoked in JavaScript via navigator.credentials.create and navigator.credentials.get. This document is about understanding the deeper structures that underpin WebAuthn rather than being a guide to its details. So we’ll leave them to the numerous tutorials that already exist on the web and instead focus on how structures from U2F were carried over into WebAuthn and updated.

Firstly, we’ll look at the structure of a signed assertion in WebAuthn.

Offset	Size	Meaning
0	32	SHA-256 hash of the RP ID
32	1	Flags
33	4	Signature counter
37	varies	CBOR-encoded extension outputs
varies	32	SHA-256 hash of the client data

It should look familiar because it’s a superset of the CTAP1 signed message format. This was chosen deliberately so that U2F security keys would function with WebAuthn. This wasn’t a given (there were discussions about whether it should be a fresh start), but ultimately there were lots of perfectly functional U2F security keys out in the world, and it seemed too much of a shame to leave them behind.

But there are changes in the details. Firstly, what was the AppID hash is now the RP ID hash. We discussed RP IDs above and, importantly, the space of AppIDs and the space of RP IDs are distinct. So since U2F security keys compare the hashes of these strings, no credential registered with the old U2F API could function with WebAuthn. From the security keys’ perspective, the hash is incorrect and so the credential can’t be used. Some complicated workarounds were needed for this, which we will touch on later.

The other changes in the assertion format come from defining additional flag bits and adding an extensions block. The most important new flag bit is the one that indicates that user verification was performed in an assertion. (WebAuthn and CTAP2 were co-developed, and so the new concept of user verification from the latter was exposed in the former.)
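
A sketch of picking apart the signed message, following the table above, might look like this. The bit positions for the flags (0x01 for user presence, 0x04 for user verification, 0x80 for extension data) come from the WebAuthn spec rather than from this post, and the function name is invented. Decoding the CBOR extension block is left out.

```python
import hashlib
import struct

FLAG_USER_PRESENT = 0x01
FLAG_USER_VERIFIED = 0x04
FLAG_EXTENSION_DATA = 0x80


def parse_signed_assertion(signed: bytes, expected_rp_id: str) -> dict:
    """Split the signed bytes of a WebAuthn assertion into their fields."""
    if len(signed) < 69:
        raise ValueError("too short to be a signed assertion")
    expected_hash = hashlib.sha256(expected_rp_id.encode("utf-8")).digest()
    if signed[:32] != expected_hash:
        raise ValueError("RP ID hash mismatch")
    flags = signed[32]
    (counter,) = struct.unpack(">I", signed[33:37])
    return {
        "user_present": bool(flags & FLAG_USER_PRESENT),
        "user_verified": bool(flags & FLAG_USER_VERIFIED),
        "has_extensions": bool(flags & FLAG_EXTENSION_DATA),
        "counter": counter,
        # Whatever sits between the counter and the final hash.
        "extensions": signed[37:-32],
        "client_data_hash": signed[-32:],
    }
```

With no extensions the message is exactly 69 bytes, the same length as a U2F one, which is what lets an old U2F security key produce a valid WebAuthn assertion.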

The extensions block was added to make the assertion format more flexible. While U2F’s binary format was pleasantly simple, it was difficult to change. Since CTAP2 was embracing CBOR throughout, it made sense that security keys be able to return any future fields that needed to be added to the assertion in CBOR format.

Correspondingly, an extension block was added into the WebAuthn requests too (although those are JavaScript objects rather than CBOR). The initial intent was that browsers would transcode extensions into CBOR, send them to the authenticator, and the authenticator could return the result in its output. However, exposing arbitrary and unknown functionality from whatever USB devices were plugged into the computer to the open web was too much for browsers, and no browser ever allowed arbitrary extensions to be passed through like that. Nonetheless, several important pieces of functionality have been implemented via extensions in the subsequent years.

The first major extension was a workaround for the transition to RP IDs mentioned above. The appid extension to WebAuthn allowed a website to assert a U2F AppID when requesting an assertion, so that credentials registered with the old U2F API could still be used. Similarly, the appidExclude extension could specify an AppID in a WebAuthn registration request so that a security key registered under the old API couldn’t be accidentally registered twice.

Overall, the transition to RP IDs probably wasn’t worth it, but we’ve done it now so it’s only a question of learning for the future.

Extensions in the signed response allow the authenticator to add extra data into the response, but the last field in the signed message, the client data hash, is carried over directly from U2F and remains the way that the browser/platform adds extra data. It gained some more fields in WebAuthn:

dictionary CollectedClientData {
    required DOMString           type;
    required DOMString           challenge;
    required DOMString           origin;
    DOMString                    topOrigin;
    boolean                      crossOrigin;
};

The centrally-important origin and challenge are still there, and type for domain separation, but the modern web is complex and often involves layers of iframes and so some more fields have been added to ensure that backends have a clear and correct picture of where the purported sign-in is happening.

Other types of authenticator

Until now, we have been dealing only with security keys as authenticators. But WebAuthn does not require that all authenticators be security keys. Although aspects of CTAP2 poke through in the WebAuthn data structures, anything that formats messages correctly can be an authenticator, and so laptops and desktops themselves can be authenticators.

These devices are known as “platform authenticators”. At this point in our evolution, they are aimed at a different use case than security keys. Security keys are called “cross-platform authenticators” because they can be moved between devices, and so they can be used to authenticate on a brand-new device. A platform authenticator is for when you need to re-authenticate a user, that is, to establish that the correct human is still behind the keyboard. Since we want to validate a specific human, platform authenticators must support user verification to be useful for this.

And so there is a specific feature detection function called isUserVerifyingPlatformAuthenticatorAvailable (usually shortened to “isUVPAA” for obvious reasons). Any website can call this and it will return true if there is a platform authenticator on the current device that can do user verification.

The majority of WebAuthn credentials are created on platform authenticators now because they’re so readily available and easy to use.

caBLE / hybrid

While platform authenticators were great for reauthenticating on the same computer, they could never work for signing in on a different computer. And the set of people who were going to go out and buy security keys was always going to be rather small. So, to broaden the reach of WebAuthn, allowing people to use their phones as authenticators was an obvious step.

CTAP over BLE was already defined, but Bluetooth pairing was an awkward and error-prone process. Could we make phones usable as authenticators without it?

The first attempt was called cloud-assisted BLE (caBLE) and it involved the website and the phone having a shared key. A WebAuthn extension allowed the website to request that a computer start broadcasting a byte string over BLE. The idea was that the phone would be listening for these BLE adverts, would trial decrypt their contents against the set of shared keys it knew about, and (if it found a match) it would start advertising in response. When the computer saw a matching reply, it would make a Generic Attribute Profile (GATT) connection to that phone, do encryption at the application level, and then CTAP could continue as normal, all without having to do Bluetooth pairing.

This was launched as a feature specific to accounts.google.com and Chrome. For several years you could enable “Phone as a Security Key” for your Google account and it did something like that. But, despite a bunch of effort, there were persistent problems:

First, listening for Bluetooth adverts in the background was difficult in the Android ecosystem. To work around this, accounts.google.com would send a notification to the phone over the network to tell it when to start listening. This was fine for accounts.google.com, but most websites can’t do that.

Second, the quality of Bluetooth hardware in desktops varies considerably, and getting a desktop to send more than one BLE advert never worked well. So you could only have one phone enrolled for this service, per account.

Lastly, but most critically, BLE GATT connections were just too unreliable. Even after a considerable amount of work to try to debug issues, the most reliable combination of phone and desktop achieved only 95% connection success, and that’s after the two devices had managed to exchange BLE adverts. In common configurations, the success rate was closer to 80% and it would randomly fail even for the people developing it. So despite trying for years to make this design work, it had to be abandoned.

The next attempt was called caBLEv2. Given all the issues with BLE in the previous iteration, caBLEv2 was designed to use the least amount of Bluetooth possible: a single advert sent from the phone to the desktop. This means that the rest of the communication goes over the internet, which requires that both phone and desktop have an internet connection. This is unfortunate, but there were no other viable options: Bluetooth Classic presents a host of problems, and BLE L2CAP does not work from user space on Windows.

Still, using Bluetooth somewhere in the protocol is critical because it proves proximity between the two devices. If all communication were done over the internet, then the phone would have no proof that the computer it is sending the assertion to is nearby. It could be an attacker’s computer on the other side of the world. But if we can send one Bluetooth message from the phone and make the computer prove that it has received it, then all other communication can be routed over the internet. And that is what caBLEv2 does.

It also changed the relationship between the parties. While caBLEv1 required that a key be shared between the website and the phone, caBLEv2 was a relationship between a computer and a phone. This made some user flows less smooth, but it made it much easier for smaller websites to take advantage of the capability.

In practice, caBLEv2 has worked far better, although Bluetooth problems still occur. (And not every desktop has Bluetooth.)

A caBLEv2 transaction is often triggered by a browser showing a QR code. That QR code contains a public key for the browser and a shared secret. When a phone scans it, it starts sending a BLE advert that is encrypted with the shared secret and which contains a nonce and the location of an internet server that communication can be routed through. The desktop decrypts this advert, connects to that server (which forwards messages to the phone and back), and starts a cryptographic handshake to prove that it holds the keys from the QR code and that it received the BLE advert. Once that communication channel is established, CTAP2 is run over it so that the phone can be used as an authenticator.
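The proximity argument can be illustrated with a toy sketch. This is not the real caBLEv2 handshake, which is a full cryptographic handshake over an encrypted advert. It swaps in plain HMAC to show the one idea that matters: only a party that both knows the QR secret and heard the BLE nonce can produce a proof the phone will accept. Every name here is hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_proof_key(qr_secret: bytes, advert_nonce: bytes) -> bytes:
    """Mix the secret from the QR code with the nonce heard over BLE."""
    return hmac.new(qr_secret, b"proximity-demo" + advert_nonce,
                    hashlib.sha256).digest()


def desktop_proof(qr_secret: bytes, advert_nonce: bytes,
                  transcript: bytes) -> bytes:
    """What the desktop sends through the tunnel to show it heard the advert."""
    key = derive_proof_key(qr_secret, advert_nonce)
    return hmac.new(key, transcript, hashlib.sha256).digest()


def phone_accepts(qr_secret: bytes, advert_nonce: bytes,
                  transcript: bytes, proof: bytes) -> bool:
    """The phone recomputes the proof from the nonce it actually broadcast."""
    expected = desktop_proof(qr_secret, advert_nonce, transcript)
    return hmac.compare_digest(expected, proof)


# Illustrative values only; sizes are arbitrary in this toy.
qr_secret = secrets.token_bytes(16)    # shown in the QR code
advert_nonce = secrets.token_bytes(10) # broadcast by the phone over BLE
transcript = b"tunnel-session-1"       # stands in for the handshake messages

nearby = desktop_proof(qr_secret, advert_nonce, transcript)
remote = desktop_proof(qr_secret, secrets.token_bytes(10), transcript)
```

Here the nearby desktop, which heard the nonce, passes phone_accepts, while a remote attacker who obtained the QR contents but never received the advert does not.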

caBLEv2 also lets the phone send the desktop information that allows the desktop to contact it in the future without scanning a QR code. Rather than relying on constant BLE listening, this depends on that same internet service, which must be able to send a notification to the phone. (Although a BLE advert is still sent for every transaction to prove proximity.)

But ultimately, while the name caBLE was cute, it was also confusing. And so FIDO renamed it to “hybrid” when it was included in CTAP 2.2. So you’ll now see this called “hybrid CTAP” and the transport name in WebAuthn is hybrid.

The WebAuthn family of APIs

WebAuthn is a web API, but people also use their computers and phones outside of a web browser sometimes. So while these contexts can’t use WebAuthn itself, a number of APIs for native apps that are similar to WebAuthn have popped up. These APIs aren’t WebAuthn, but if they produce signed messages in the same format as WebAuthn, a backend server needn’t know the difference. It’s a term that I’ve made up, but I call them “WebAuthn-family” APIs.

On Windows, webauthn.dll is a system service that reproduces most of WebAuthn for apps. (Browsers on Windows use this to implement WebAuthn, so it has to be pretty complete.) On iOS and macOS, Authentication Services does much the same. On Android, Credential Manager allows apps to pass in JSON-encoded WebAuthn requests and get JSON responses back. WebAuthn Level Three also includes support for the same JSON encoding so that backends should seamlessly be able to handle sign-ins from the web and Android apps. (WebAuthn should never have used ArrayBuffers.)
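In the JSON form, the binary fields travel as base64url text without padding. A rough sketch of how a backend might turn a JSON sign-in response back into bytes follows. The helper names are made up, and the sketch assumes the response carries rawId plus a response object with clientDataJSON, authenticatorData, and signature.

```python
import base64
import json

# Binary members of the "response" object that this sketch decodes.
BINARY_FIELDS = ("clientDataJSON", "authenticatorData", "signature")


def b64url_encode(raw: bytes) -> str:
    """Encode bytes as base64url text with the '=' padding removed."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the padding that was stripped."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def decode_assertion(body: str) -> dict:
    """Turn a JSON-encoded sign-in response into raw bytes for verification."""
    message = json.loads(body)
    response = message["response"]
    decoded = {"credential_id": b64url_decode(message["rawId"])}
    for name in BINARY_FIELDS:
        decoded[name] = b64url_decode(response[name])
    return decoded
```

Because the same decoding works whether the JSON came from a browser or from an Android app, the verification code that follows it need not care which one produced the message.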

Passkeys

With hybrid and platform authenticators, people had lots of access to WebAuthn authenticators. But if you reset or lost your phone/laptop you still lost all of your credentials, same as if you reset or lost a security key. In an enterprise situation, losing a security key is resolved by going to the helpdesk. In a personal context, the advice had long been to register at least two security keys and to keep one of them locked away in a safe. But it’s awfully inconvenient to register a security key everywhere when it’s locked in a safe. So while this advice worked for protecting a tiny number of high-value accounts, if WebAuthn credentials were ever going to make a serious dent in the regular authentication ecosystem, they had to do better.

“Better” has to mean “recoverable”. People do lose and reset their phones, and so a heretofore sacred property of FIDO would have to be relaxed so that it could expand its scope beyond enterprises and experts: private keys would have to be backed up.

In 2021, with iOS 15, Apple included the ability to save WebAuthn private keys into iCloud Keychain, and Android Play Services got support for hybrid. At the end of 2022, iOS 16 added support for hybrid and, on Android, Google Password Manager added support for backing up and syncing private keys.

People now had common access to authenticators, the ability to assert credentials across devices with them, and fair assurance that they could recover those credentials. To bundle that together and give it a more friendly name, Apple introduced better branding: passkeys.

With passkeys, the world now has a widely available authentication mechanism that isn’t subject to phishing, isn’t subject to password reuse or credential stuffing, can’t be sniffed and replayed by malicious third-party JavaScript on the website, and doesn’t cause a mess when the server-side password database leaks.

There is some ambiguity about the definition of passkeys. Passkeys are synced, discoverable WebAuthn credentials. But we don’t want to exclude people who really want to use a security key, so if you would like to create a credential on a security key, we assume you know what you’re doing and the UI will refer to them as passkeys even though they aren’t synced. Also, we’re still building the ecosystem of syncing, which is quite fragmented presently: Windows Hello doesn’t sync at all, Google Password Manager can only sync between Android devices, and iCloud Keychain only works on Apple devices. So there is a fair chance that if you create a credential that gets called a passkey, it might not actually be backed up anywhere. So the definition is a little bit aspirational for the moment, but we’re working on it.
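A backend doesn't have to guess whether a given credential is backed up: the flags byte in WebAuthn's authenticator data includes backup eligibility (BE) and backup state (BS) bits alongside the user presence and user verification bits. A minimal sketch of reading them, with hypothetical names for the class and function, and assuming the standard layout of a 32-byte RP ID hash, one flags byte, and a 4-byte signature counter:

```python
from dataclasses import dataclass

FLAG_UP = 0x01  # user present
FLAG_UV = 0x04  # user verified
FLAG_BE = 0x08  # backup eligible: the credential may be synced
FLAG_BS = 0x10  # backup state: the credential is currently backed up


@dataclass(frozen=True)
class Flags:
    user_present: bool
    user_verified: bool
    backup_eligible: bool
    backed_up: bool


def parse_flags(authenticator_data: bytes) -> Flags:
    """Read the flags byte that follows the 32-byte RP ID hash."""
    if len(authenticator_data) < 37:
        raise ValueError("authenticator data is too short")
    flags = authenticator_data[32]
    return Flags(
        user_present=bool(flags & FLAG_UP),
        user_verified=bool(flags & FLAG_UV),
        backup_eligible=bool(flags & FLAG_BE),
        backed_up=bool(flags & FLAG_BS),
    )
```

A site could use this to decide, for example, whether to nudge a user towards registering a second credential when the one they just created isn't backed up.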

Another feature that came with the introduction of passkeys was integration into browser autofill. (This is also called “conditional UI” because of the name of a value in the W3C credential management spec.) So websites can now opt to have passkeys listed in autofill, as passwords are. This is not a long-term design! It would be weird if in 20 years websites had to have a pair of text boxes on their front page for signing in, in the same way that we use an icon of a floppy disk to denote saving. But conditional UI hopefully makes it much easier for websites to adopt passkeys, given that they are starting with a user base that is 100% password users.

If you want to understand how passkey support works on a website, see here. But remember that the core concepts stretch back to U2F: passkeys are still partitioned by an RP ID, they still have credential IDs, and there’s still the client data containing a server-provided challenge.
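As one small illustration of that continuity, the signed authenticator data still begins with the SHA-256 hash of the RP ID, so a backend can confirm that a credential was exercised for its own RP ID. This is a sketch with a made-up function name, and it covers only this one check, not full assertion verification.

```python
import hashlib
import hmac


def rp_id_matches(authenticator_data: bytes, rp_id: str) -> bool:
    """Compare the first 32 bytes of authenticator data with SHA-256(rp_id)."""
    if len(authenticator_data) < 37:
        raise ValueError("authenticator data is too short")
    expected = hashlib.sha256(rp_id.encode("utf-8")).digest()
    return hmac.compare_digest(authenticator_data[:32], expected)
```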

The future

The initial launch of passkeys didn’t have any provision for third-party password managers. On iOS and macOS, you had to use iCloud Keychain, and on Android you had to use Google Password Manager. That was expedient but never the intended end state, and with iOS 17 and Android 14, third-party password managers can save and provide passkeys.

At the time of writing, in 2023, most of the work is in building out the ecosystem that we have sketched. Passkeys need to sync to more places, and third-party password manager support needs to get fleshed out.

There are a number of topics on the horizon, however. With FIDO2, CTAP, and WebAuthn, we are asking websites to trust password managers a lot more. While password managers have long existed, usage is far from universal. But with FIDO2, by design, users have to use a password manager. We are also suggesting that with passkeys, websites might not need to use a second authentication factor. Two-factor authentication has become commonplace, but that’s because the first factor (the password) was such rubbish. With passkeys, that’s no longer the case. That brings many benefits! But it means that websites are outsourcing their authentication to password managers, and some would like some attestation that they’re doing a good job.

Next, the concept of an RP ID is central to passkeys, but it’s a very web-centric concept. Some services are mobile-only and don’t have a strong brand in the form of a domain name. But passkeys are forever associated with an RP ID, which forces apps to commit to the domain name that might well appear in the UI.

The purpose of the RP ID was to stop credentials from being shared across websites and thus becoming a tracking vector. But now that we have a more elaborate UI, perhaps we could show the user the places where credentials are being used and let the RP ID be a hash of a public key, or something else not tied to DNS.

We also need to think about the problem of users transitioning between ecosystems. People switch from Android to iOS and vice versa, and they should be able to bring their passkeys along with them.

There is a big pile of corpses labeled “tried to replace passwords”. Passkeys are the best attempt so far. Here's hoping that, in five years’ time, they’re not a cautionary tale.

From U2F to passkeys (23 Jul 2023)