In a previous post I covered some of the issues with implementing ECC operations. I also mentioned the problems with the number of terms in the P256 prime.
Having spent a fair amount of time optimising OpenSSL's elliptic curve code, here are the current speeds:
P224 and P256 are not, fundamentally, very different. However, P256 has a nasty prime formation, as I explained previously, which kills the speed. Sadly, if you want to support the existing fleet of browsers, P256 is your fastest option.
(The multiplicative DH is measured using OpenSSL's code. curve25519 is my own code, based on djb's. P224 is Emilia Kasper's, inspired by curve25519. P521 is my own code based on Emilia's. P256 is my own code, also based on Emilia's, with a very different form from in an attempt to get it running faster.)
(The measurements are taken on a single core of my 2.3GHz Xeon E5520. They are a scalar multiplication of the base point followed by a scalar multiplication of the result. They don't include point validation. The OpenSSL code involves additional overhead from the BIGNUM conversions, however I consider that fair game because you have to do them if you want to use the code.)