What a difference a prime makes (21 Dec 2010)

In a previous post I covered some of the issues with implementing ECC operations. I also mentioned the problems with the number of terms in the P256 prime.

Having spent a fair amount of time optimising OpenSSL's elliptic curve code, here are the current speeds:

image/svg+xml Produced by GNUPLOT 4.2 patchlevel 6 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 80 100 120 140 160 180 200 220 240 260 1024-bit DH 2048-bit DH P224 curve25519 P256 P521 Security level (bits) Operations / core second

(The graph is inline SVG. Can't see it? Get a better browser.)

P224 and P256 are not, fundamentally, very different. However, P256 has a nasty prime formation, as I explained previously, which kills the speed. Sadly, if you want to support the existing fleet of browsers, P256 is your fastest option.

(The multiplicative DH is measured using OpenSSL's code. curve25519 is my own code, based on djb's. P224 is Emilia Kasper's, inspired by curve25519. P521 is my own code based on Emilia's. P256 is my own code, also based on Emilia's, with a very different form from in an attempt to get it running faster.)

(The measurements are taken on a single core of my 2.3GHz Xeon E5520. They are a scalar multiplication of the base point followed by a scalar multiplication of the result. They don't include point validation. The OpenSSL code involves additional overhead from the BIGNUM conversions, however I consider that fair game because you have to do them if you want to use the code.)