The several canons of CBOR (17 Apr 2022)
There are many encoding formats. CBOR is one of them. Like several others, a subset of it basically fine—I'm not starting that fight today.
Whatever encoding you use, it's nice to reduce flexibility. If there are multiple ways of encoding the same thing then, for anything with a non-negligible diversity of implementations, you'll find that there is a canonical encoding, it's just not documented. As soon as one implementation tries using a valid, but less common encoding, it'll find that it doesn't work and something will change to sort it all out. But that process of eventual coordination is expensive—better to do it explicitly and up-front.
For example, TCP options are chunks of tag–length–value and the spec doesn't define any restriction on their ordering. That didn't stop me blowing up a Ubuntu release because a change of mine happened to alter the order that Linux sent them, and some DSL modems dropped packets when options came in that order. That's an expensive discovery of an implicit canonicalisation rule.
If you're using CBOR then RFC 7049 has a section titled “Canonical CBOR”. One of the things it defines is how to order map elements. Great! No more TCP options messes. You can read that section for yourself, but one interpretation of the words there is what I'll call the three-step ordering for map keys:
- Lowest valued types come first then, within each type,
- Shortest encoded key comes first then, within consecutive keys with equal lengths,
- Sort lexicographically.
However, there is another interpretation of the same section which I'll call the two-step ordering. It drops the first step, i.e. doesn't sort by types. When I first read that section, I certainly came away with only one interpretation, but I can somewhat see how the other arises.
CTAP, the protocol for talking to security keys, uses CBOR a lot and explicitly picks the three-step ordering. An errata was eventually raised against the RFC to “clarify” that the three-step order was correct, but it was rejected as a semantic change. So perhaps that's an official ruling that two-step was the correct understanding?
It only matters if you mix different types of keys in the same map, and sometimes you don't, so maybe it's moot for you. (But keep in mind that negative and non-negative numbers are different types in CBOR.) Anyway, the IETF revised the CBOR RFC to
firmly clarify the issue add a third option:
RFC 7049 is obsoleted by 8949. The old “Canonical CBOR” section is replaced by “Core Deterministic Encoding Requirements” and that specifies what I'll have to call the one-step ordering:
- Sort lexicographically on the encoded key.
It has some advantages! Like, why was it ever more complicated than that in the first place? But it was, and now there's two (or three) orderings. The two-step ordering even has a subsection of its own in the new RFC and a note saying that it “could be called Old Canonical CBOR”. If only shipped implementations were as changeable as the X-Men universe.
So that's why there's two and half “canonical” CBORs and now I have something that I can reference when people ask me about this.
Update: Someone on Twitter (Thomas Duboucher, perhaps? Sorry, I lost the tweet!) pointed out that the 1- and 3-step ordering give the same results in many cases. Having thought about it, that's correct! As long as you don't use maps, arrays, or tagged values as map keys, I believe they coincide. For something like CTAP2, where map keys are always ints or strings, they work out the same. So perhaps things aren't so bad. (Or perhaps a really subtle difference is worse! Time will tell.)
Picking parameters (15 Mar 2022)
When taking something from cryptographic theory into practice, it's very important to pick parameters. I don't mean picking the right parameters — although that certainly helps. I mean picking parameters at all.
That might seem obvious, but there are pressures pushing towards abdication: what if you get it wrong? Why not hedge bets and add another option? What about the two groups who already shipped something? We could just add options so that everyone's happy.
There are always exceptions, but the costs of being flexible are considerable. ANSI X9.62, from 1998, specified how elliptic curves were encoded in ASN.1 and included not just 27 possible curves, but allowed the parameters to be inherited from the issuing CA and allowed completely arbitrary elliptic curves to be defined inside a public key. That specifies almost nothing and avoids really standardising on anything. Thankfully I've not seen parameter inheritance implemented for ECC (it would be a security mess) but support for custom elliptic curves does, sadly, exist.
A couple of years ago, Microsoft had a signature bypass because of support for arbitrary curves [details]. Today, OpenSSL had an infinite loop in certificate parsing because of the same. On the flip side, I'm not aware that anyone has ever used the ability to specify arbitrary curves for anything useful.
It's not fair to just blame X9.62: don't get me started about RSA PSS. The cost of having too many options is frequently underestimated. Why does AES-192 exist, for example, other than to inflate test matrices?
As an aside, it's interesting to wonder how many CPU decades have been spent fuzzing OpenSSL's certificate parsing and yet today's issue wasn't reported before. This isn't a 2-128 fraction of state space that fuzzers couldn't be expected to find, and I'm not sure that Tavis found it from fuzzing at all.
Phones as security keys in Chrome (20 Oct 2021)
With Chrome 94, if you have an Android phone with Chrome on it, and it’s syncing to the same Google account as Chrome on a Chrome OS/Windows/macOS device, then you’ll be able to use that phone as a security key. You should be able to try this out on any WebAuthn using website, for example here. (But not accounts.google.com, which uses a different system.)
The reason that you are reading about this here and not on an official Google site is that people shouldn’t start registering their phone as a security key unless they have a physical security key as a back up. Just like a regular security key, if you lose the phone then you lose the credentials. So, just like a regular security key, you should have a back up. (You can also lose your credentials if you remove the screen lock, or somehow wipeout Play Services state — e.g. by doing a factory reset.)
We have plans for addressing this and making this suitable for regular people, and to allow use in other profiles, but we’re not there yet. We are interested in whether the communications infrastructure is good enough though. (More below.)
For signing into Google, it has long been possible to use your phone as a security key. This only worked in Chrome and functioned over BLE GATT between the desktop and phone. We have wanted to expand this to the web in general for years, but the success rate that we measured with BLE was poor. After quite a lot of work trying to improve the BLE success rate, we didn’t achieve much.
But the use of BLE is more than just a convenience. The security model demands some proof of physical proximity between the authenticator and the machine that is being authenticated. For a USB security key the authenticator will only respond to something that is making physical contact with it. So when a phone is acting as a security key it needs to prove that the machine it is talking to is physically close by. (Or at least that the attacker is in control of a BLE radio that is physically close.)
We looked at other Bluetooth modes in the hopes that they might work better, but classic Bluetooth RFCOMM isn’t supported on iOS and requires a lot of user interaction on android. BLE L2CAP is supported on iOS, but isn’t supported (in user space) on Windows. It’s also flaky in the face of MAC address rotation if the devices aren’t paired.
So where we’ve ended up is that all the communication happens over the internet connection, but the phone sends a nonce in a BLE advert and the other end of the channel has to prove receipt. That’s the least amount of Bluetooth we could use while still requiring physical proximity. Needing bilateral internet connectivity is unfortunate though. So you can also connect the phone with a USB cable while the security key operation is running. (But very sadly not on Windows; The USB stack there just isn’t designed in the right way for it.) We might also add L2CAP as an option in the future.
This isn’t enabled on Linux at the moment. Historically trying to do the BLE GATT connection would often fail with bluez, and so the phone as a security key infrastructure was disabled on Linux. Now that the desktop only needs to receive a BLE advert it looks like it could work, but we haven’t flipped that switch yet.
As I mentioned above, we are interested in whether the underlying infrastructure is plausible. Aggregated anonymous statistics are useful for many things but in this case they suggest that BLE isn’t always working as well as it should, but don’t tell us why not. So if you are especially keen about security keys and want to try this out, I’d be interested in your experiences. I can't promise to respond but I will read everything if you send me an email (agl at chromium dot org) or tweet at me (agl__).
Some troubleshooting hints if you're having issues because this will be much faster than asking me!
If no phones appear as options: you're using Windows, macOS, or Chrome OS, yes? And it's an Android phone? And the machine has working Bluetooth? And Chrome is up to date everywhere? If you navigate to chrome://sync-internals, is the “Transport state” at the top-left reporting “Active”? If not, enable Sync. On the phone, does Settings say that Sync is on at the top? If not, enable it. Is the account listed in Settings on the phone the same as the “Username” in chrome://sync-internals on the desktop? If all that's good then probably you just need to wait because it can take a couple of days for the registration to propagate. Probably in the “Device Info” section of the “Sync Node Browser” in chrome://sync-internals your phone is listed, but there's no paask_fields section yet. If you want to short-circuit that, you can install Chrome Canary on the phone and enable syncing there. That should register quite quickly.
You can select the phone on the desktop, but nothing happens: the phone should be getting a cloud message poke that triggers a notification. Does the phone have internet access? Did you completely disable notifications from Chrome? `adb logcat | grep -i cable` would be interesting if you're setup for that. Otherwise, if this is a common issue, I might have to add some logging and ask for the tunnel URL from chrome://device-log.
You get the notification and tap it, but something else goes wrong: if an error code or error message is displayed then I want to know what it is! If it's hanging then the message at the bottom of the screen changes for each step of the process so that's what'll be useful. Most problems seem to be BLE: is the phone close to the desktop? What other BLE is happening on the devices?
Efficient QR codes (26 Aug 2021)
QR codes seem to have won the battle for 2D barcodes, but they're not just a bag of bits inside. Their payload is a series of segments, each of which can have a different encoding. Segments are bitstrings, concatenated without any byte-alignment, and terminated with an empty segment of type zero. If you want to squeeze the maximum amount of data into a QR code without it turning into a gray square, understanding segmentation helps.
The most basic segment type is byte mode, which is a series of 8-bit bytes. If you control the QR decoder then this is perfectly efficient for encoding binary data. But you probably need to work with a variety of decoders. In that case, beware: the first edition of the QR standard, ISO/IEC 18004, said that byte mode should be interpreted as JIS X 0201. The 2005 edition changed that to be ISO/IEC 8859-1 (i.e. Latin-1). In practice, some QR decoders will attempt to content-sniff the encoding because, while UTF-8 contents should be indicated with an ECI header, they often aren't and UTF-8 is really common.
So if you put binary data into a QR code, you are probably going to hit these edge cases. The contents are likely going to be passed to a general operating system API for handling URLs — do you think the full pipeline will handle NUL bytes in the string, and UTF-8 non-characters and invalid surrogate pairs when interpreted as UTF-8? Also, you probably want your QR contents to be a printable string: bits of it might be shown in the scanner's UI; users might need to copy and paste them.
So let's assume that you want something printable and consider an obvious answer: base64url. It's very common, printable, and doesn't contain any characters that are special in URLs for maximum compatibility. It'll be encoded in a byte-mode segment: each base64url character contains 6 bits of information and takes 8 bits in the QR code for an efficiency of 75%. That's our baseline.
The next segment type to consider is digit mode. This only encodes the digits, 0–9, by packing triples of digits into 10 bits. If there are two digits left over at the end then they take 7 bits, and a singleton takes 4 bits. Ignoring the potential digits at the end, this lets you store 3×log2(10) = 3×3.322 = 9.966 bits of information in 10 bits of space. That's 99.66% efficient! So you can clearly do better than base64url.
The last segment type for these purposes is alphanumeric mode. These segments can encode A–Z, 0–9, and nine special characters: $, %, *, +, -, ., /, :, and space. Pairs of these characters are encoded in 11 bits. (If there's an odd number then the last takes 6 bits.) If you consider this to be “base-45” encoding then it stores 2×log2(45) = 10.98 bits in 11 bits of space, for 99.85% efficiency. Even better than digit mode, although only just.
So maybe you should base-45 encode your data using that alphabet. But, of the special characters supported in alphanumeric mode, only two (minus and period) are unreserved (i.e. safe) in URLs. So you might be reduced to base-38, which cuts the efficiency to 95.42%. But having textually smaller QR contents might be valuable and worth a few percent efficiency in your context.
If you've picked base-10 (digits), base-38, or even base-45 for your data then you need to get it into that form. Base-64 is easy because that's exactly 6 bits per character; you work on 3 bytes of input at a time and produce exactly 4 characters of output. But 10, 38, and 45 aren't powers of two. You've got three options here. The obvious conversion would be to treat the input as a bigint and repeatedly divmod by 10 (or 38, etc) to generate the encoding. If you have a bigint library to hand then it almost certainly has the functions for that, but it's a rather obnoxious (and quadratic) amount of computation and a significant dependency. So you might be willing to waste a few bits to make things easier.
Next option is an encoding noted by djb that achieves similar efficiency but with less computation and no long-division. I updated this post to include it, so it's covered in a section below.
The third option is to chunk the input and convert each chunk independently. Ideal input chunks would be 8 bytes or fewer, because nearly all environments will support a uint64 type and nearly all hardware can do a divmod on them. If you're using base-10 then there's going to be a function that can “print” a uint64 to digits for you, so let's take digits as an example. With a chunk size of two bytes you would get five digits. Each digit takes 3⅓ bits of space, so 16 input bits takes 16⅔ bits: 96% efficient. Less than the 99.66% we can get with digits for sure. But if you consider all chunk sizes from one to eight bytes, turning 7-byte chunks into 17 digits is 98.82% efficient. That's pretty good for the complexity savings.
For base-38, 7-byte chunks are the best again, producing 11 characters for 92.56% efficiency. For base-45, two-byte chunks produce 3 characters for 96.97% efficiency. (Four- and eight-byte chunks do the same if you want fewer loop iterations.)
(And you should use little-endian chunks because little-endian has won, even if the IETF hasn't caught up to that yet.)
Now you've got your payload encoding sorted … probably. A wrinkle is that it's difficult to know how your QR encoder will segment what you give it. You might have crafted a wonderful base-38 input and it might stuff it all into a byte-mode segment! (68.65% efficient, worse than base64url.) I'm sadly not aware of a good QR “debugger” that shows all the details of a QR code. ZXing's online service will give a hex-dump of the raw contents, but that assumes that you can decode the segment headers. (QR-Logo promises better debugging output but doesn't work for me.) My best advice is to use ZXing on a sample QR code, ignore the 0xec, 0x11 padding pattern at the end, and calculate whether the number of bytes used roughly makes sense.
You probably want to put a URL-like prefix at the front to make your QR codes distinguishable. One thing to consider is that “https://www.example.com/” is a byte-mode segment that takes 204 bits, but “HTTPS://WWW.EXAMPLE.COM/” is alphanumeric and needs only 145 bits. (That's assuming QR versions 1 through 9, the lengths are slightly different otherwise.) DNS names are case insensitive and “an implementation should accept uppercase letters” for the scheme says RFC 3986. Maybe it just looks odd and that's not worth the bits, though?
We'll finish up with a quick look at an example, which is the thing that started me on this path in the first place: SMART Health Cards. (Thank you to the person who pointed me at them, who likely wants to remain anonymous.)
SHC's are trying to squeeze a lot of data into a QR code: they minify their JSON structure and compress it but, even then, they sometimes span multiple QR codes and the user has to scan them all. As such their contents are a) a binary segment containing “shc:/” (and maybe sequence numbers if using multiple QR codes), and then b) a digits segment containing the payload. So they didn't use “SHC:/” to save bits, but the difference is small.
One thing to note is that the QR spec (ISO/IEC 18004:2005) has a whole section on “structured append” mode, where multiple QR codes can be combined into one. But trying that with iOS and Android shows that neither support it, so probably it can be considered dead and that's why SHC is replicating the same feature at a higher level.
Another thing to note is that SHC is using digits for better efficiency, which is great, but the way that they do it is not. They're using JWT, which is bad but not today's topic, so they have three base64-encoded strings. They then take each base64 character, subtract 45, and turn that into two base-10 digits! All that work minifying JSON and compressing it, and then they throw away 10% of their bits on such a trivial thing!
So SHC did pretty well, but missed an easy win. Having read this, you can do better.
The NTRU Prime encoding
Above, I referenced an encoding that gets nearly all the space efficiency of the bigint-and-divmod method, but without the computational costs. This section is about that. It's taken from page 18 of the NTRU Prime NIST submission.
Our motivating issue is thus: if you have a whole byte then taking it mod 10 to generate a digit works fairly well. The digits 0–5 have probability 26/256 while 6–9 have probability 25/256. That's not uniform therefore it doesn't encode the maximum amount of entropy, but it's 99.992% efficient, which is pretty good. But when you have a smaller range of input values the non-uniformity becomes significant and so does the reduction in information transmitted.
The encoding in NTRU Prime takes input values (which can be larger than bytes) and combines pairs of them. It produces some output values from each pair but, once the non-uniformity gets unconfortable, it pairs up the leftovers to increase the range. This repeats in a binary-tree.
As a concrete example we'll use the Python code from page 18 and set M = [256…] (because our inputs are bytes), change the 256 and 255 to 10 and 9 (to extract digits, not bytes), and set limit to 1024. Our example input will be 419dd0ed371f44b7.
The input bytes are written (in base 10) along the top. Pairs of them are combined. Take the top-right box: its value is 157×256 + 65 = 40257. That can be considered to be 40257 mod 65536 and, since there's a reasonable number of bits in there, two digits are extracted. Obviously 40257 mod 10 = 7, and 4025 mod 10 = 5. So the two digits are 7 and 5. That leaves 402 mod 656, and 656 is below the limit of 1024 that we set, so it's passed down to be combined into the box below. This continues down the tree: each time there's too little data left to extract another digit, the leftovers are passed down to be combinined. At the bottom there's nothing else to combine with so the final leftover value, 132 mod 186, is converted into the digits 2, 3, and 1. The ultimate output digits are concatenated from left-to-right, top-to-bottom.
This achieves good encoding efficiency without repeated long-division, and can be parallelised.
ACVP (23 Dec 2020)
If you do not know what ACVP is then you should read no further. If you think it might be useful then what you're actually looking for is Wycheproof; ACVP is only for those who have no choice.
If you're still reading and you're vaguely aware that your previous CAVP infrastructure isn't applicable any longer, and that you'll need to deal with ACVP next time, then you might be interested in BoringSSL's ACVP infrastructure. We have a number of different FIPS modules to test and wanted something generic rather than repeating the bespoke-per-module way that we handled CAVP. We also need to test not just BoringCrypto (a software module) but also embedded devices.
The result, acvptool, lives within the BoringSSL repo and can translate ACVP JSON into a series of reasonably simple IPC calls that a “module wrapper” speaks over stdin/stdout. BoringSSL's module wrapper is the reference implementation, but there's also a tiny one for testing that could easily be repurposed to forward over a serial link, etc, for embedded devices.
It's reasonably likely that you'll find some case that's not handled, but the code is just Go throwing around JSON so you should be able to extend it without too much bother. But, for the cases that are already handled, the weird undocumented quirks that'll otherwise consume hours of your life are taken care of.