I've updated the patches linked to in the last post with today's work. Both sides now end up with the same shared key (and not just because they got the same private key from lack of entropy like before). That took some fun tracking down of bugs.
Also, packets are now HMAC-MD5'ed with the shared key, and invalid packets are dropped. That also took far longer than expected. I ended up using the MD5 implementation from the CIFS filesystem because the kernel's crypto library is just plain terrible. It's also totally undocumented but, from what I can see, you can't look up an algorithm without taking a semaphore, and that requires that you be able to sleep. I almost think I must be missing something because that's dumber than the bastard offspring of Randy Hickey and Jade Goody.
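For illustration only, here is a userland sketch of the "MAC each packet and drop the invalid ones" idea. It is not the kernel patch: the function names and the choice to append the tag to the packet bytes are assumptions made for the example (the real thing carries the MAC in a TCP option).

```python
import hashlib
import hmac

TAG_LEN = 16  # MD5 digest size in bytes


def seal(shared_key: bytes, packet: bytes) -> bytes:
    """Return the packet with an HMAC-MD5 tag appended (hypothetical layout)."""
    tag = hmac.new(shared_key, packet, hashlib.md5).digest()
    return packet + tag


def open_packet(shared_key: bytes, sealed: bytes):
    """Return the packet bytes if the tag verifies, or None to signal a drop."""
    if len(sealed) < TAG_LEN:
        return None
    packet, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(shared_key, packet, hashlib.md5).digest()
    # compare_digest avoids leaking how many tag bytes matched
    return packet if hmac.compare_digest(tag, expected) else None
```

A receiver would call `open_packet` on every incoming packet and silently discard anything that comes back as `None`.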
But there we go. Encryption (with Salsa20) to come next Wednesday.
First Obfuscated TCP patches
After a day of kernel hacking, I have a few patches which, together, make a start towards implementing ObsTCP.
- Add support for Jumbo TCP options, as documented here: tcp-jumbo-options.patch
- Add curve25519: curve25519.patch
- Some ObsTCP work: tcp-obsfucated-tcp.patch
At the moment, it will advertise ObsTCP on all connections and, if you have two kernels which support it, you'll get a shared key setup. At the moment, the private key is generated at boot time and since the host doesn't have any entropy then, it's always the same. So I'll have to do something special there. Also, I've a problem where the ACK with the connecting host's public key can get lost. Since ACKs aren't ACKed, this can be a real pain. I think I need to include it in every transmitted packet until (yet another) option signifies that it's been received.
After the last post explained why small curves aren't good enough for obfuscated TCP, I decided that, since I'm going to have to do some damage to the TCP header to get a bigger public key in there anyway, I might as well go the whole way and use curve25519, by djb. Now, djb has forgotten more about elliptic curves than I'll ever know and I feel much happier using a curve that's been designed by him. As you can probably guess from the name, it's a curve over 2^255 - 19, a prime. So the public keys are 32 bytes long.
In order to get that much public key material into a TCP header, here's my proposed hack: Jumbo TCP options.
djb's sample implementation of curve25519 is written in a special assembly language called qhasm. Sadly, it's so alpha that he's not actually released it. So the sample implementation is for ia32 only, uses the floating point registers and has 5100 lines of uncommented assembly. It is, however, freaking quick.
However, since I have kernel-space in mind for this I've written a C implementation. It's about 1/3 the speed (and I've not really tried to optimise it yet), doesn't use any floating point (since kernel-space doesn't have easy access to the fp registers in Linux) and fuzz testing seems to indicate that it's correct. (At least, it's giving the same answers as djb's code.)
Next step: hacking up the kernel. (And I thought the elliptic curve maths was hard enough.)
Elliptic curves don't work either
(For context, see my previous post on OTCP)
In any Diffie-Hellman exchange based on elliptic curves, we have Q=aP where P and Q are points on an elliptic curve. The operation of multiplying a point and a scalar is well defined, but unimportant here. The problem facing the attacker is, given Q and P, find a. If they can do that, we're sunk.
If you could find two different pairs of numbers, (c, d) and (e, f), such that cP + dQ = eP + fQ, then you're done, because (c-e)P = (f-d)Q = (f-d)aP, and so a = (c-e)/(f-d) mod n, where n is the order of the point P.
Finding such a point by picking random examples is never going to work because of the storage requirements. However, if you define a step function which takes a pair (c, d) and produces a new pair (c', d') you have defined a cycle through the search space. (It must be a cycle because the search space is finite. At some point you must hit a previous state and loop forever.) Now you can use Floyd's cycle finding algorithm to find a collision with constant space. This is an √n algorithm for breaking this problem and is well known as Pollard's rho method.
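Here is a small sketch of the rho method with Floyd's cycle finding. To keep it short it works in a prime-order subgroup of the integers mod p instead of on a curve, so cP + dQ is written multiplicatively as g^c * h^d; the collision algebra is the same. The toy parameters (p = 1019, g = 4 with order 509), the three-way step function and all the names are assumptions for the example.

```python
def pollard_rho_dlog(g, h, p, n):
    """Find a with g^a = h (mod p), where g has prime order n."""

    def step(x, c, d):
        # Invariant: x == g^c * h^d (mod p). Which branch runs depends only on x.
        r = x % 3
        if r == 0:
            return (x * x) % p, (2 * c) % n, (2 * d) % n
        if r == 1:
            return (x * g) % p, (c + 1) % n, d
        return (x * h) % p, c, (d + 1) % n

    for start in range(1, n):
        tortoise = (pow(g, start, p), start, 0)
        hare = tortoise
        while True:
            tortoise = step(*tortoise)       # one step
            hare = step(*step(*hare))        # two steps
            if tortoise[0] == hare[0]:
                break
        (_, c, d), (_, e, f) = tortoise, hare
        # g^c h^d = g^e h^f  =>  a = (c - e) / (f - d) mod n
        denom = (f - d) % n
        if denom == 0:
            continue  # useless collision; try another starting point
        return ((c - e) * pow(denom, -1, n)) % n
    return None


if __name__ == "__main__":
    p, n, g = 1019, 509, 4
    h = pow(g, 123, p)
    print(pollard_rho_dlog(g, h, p, n))
```

Only the tortoise and the hare are ever held in memory, which is the constant-space property mentioned above. (The `pow(x, -1, n)` form needs Python 3.8 or later.)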
Now, if you have many of these problems you get a big speed up by using some storage. Assume that you do the legwork to solve an instance of the problem and that you record some fraction of the points that you evaluated. (How you choose the points isn't important so long as it's a function of the point; say pick all points where the first m bits are zero.)
Now, future attempts to break the problem can collide with one of the previous points. If you find cP + dQ = eP + fR (note that P is a constant of the elliptic curve system) and you also know that R = bP (because we solved that instance previously), then cP + dQ = cP + adP = (e+fb)P and so a = ((e+fb) - c) / d mod n (and we know all the values on the right-hand side).
Now, 2^112 (14 bytes) is about as big an elliptic curve point as we can fit in a TCP header. The maximum options payload is 40 bytes, of which 20 are already taken up in modern TCP stacks. We need 2 bytes of fluff per option and, unless we want this to be the last TCP header ever, we need to leave at least 4 bytes. That's where the 14 byte limit comes from.
We give the attacker 2^50 bytes of space. I believe that each point will take 3*14 bytes of space for the (c,d,Y) triple, where Y = cP+dQ. Thus they can store 2^44 distinguished points. Thus one in 2^(56-44) = 2^12 points is distinguished. Additionally, generating those 2^44 points isn't that hard, computationally. This suggests that an attacker can find a collision in only 2^12 iterations, or about 2^13 field multiplications.
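The arithmetic behind those figures can be checked with a few lines. This is just a back-of-the-envelope sketch; the function name is made up, and the 2^56 figure is the square root of 2^112, i.e. the rho cost with no stored points.

```python
import math


def distinguished_point_estimate(point_bytes=14, storage_log2=50):
    """Return (log2 of stored points, log2 of the walk length between them)."""
    entry_bytes = 3 * point_bytes                  # one (c, d, Y) triple
    stored_log2 = math.floor(storage_log2 - math.log2(entry_bytes))
    rho_log2 = (point_bytes * 8) // 2              # sqrt(2^112) = 2^56
    return stored_log2, rho_log2 - stored_log2


if __name__ == "__main__":
    stored, gap = distinguished_point_estimate()
    print(f"about 2^{stored} stored points, one in 2^{gap} points distinguished")
```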
So, again, a reasonable attacker can break our crypto in real time.
This scheme becomes much harder to sell if we have to do evil things to the TCP header in order to make it work.
If you've been wondering what I'm up to at work, we now have a public blog for the RechargeIt project.
How sad: from reading the sleepcat documentation on network partitions, it's clear that BDB uses a broken replication system (i.e. not Paxos). That's a shame because I was hoping to use it.
Yahoo now has OpenID for all its accounts, which is great. Wonderful in fact. OpenID is a good thing for many authentication needs on the Internet and will make the world a better place.
However,...
- SHA256 isn't supported, only SHA1. It's true that the standard doesn't require it, but this still gets you lots of crapness points.
- The return_to is filtered. Probably someone here had good intentions, but I can redirect a browser to any URL, so filtering the return_to is pointless and overly restrictive. Specifically, it appears that:
- You can't have a port number in the host
- You can't have an IP address for a host
- You can't have a single element hostname (like localhost)
So, more crapness points for Yahoo.
How good is a 64-bit DH exchange?
In my last post, I suggested that a register based modexp for 64-bit numbers could run at about 500K ops/sec. Well, I wrote one and got 450K ops/sec on an older Core2. (That's with gcc -O3, but no tuning of the code. Plus, I don't know the standard algorithm for 128-bit modulus using 64-bit operations, so I wrote my own, which is almost certainly suboptimal.) Roughly that's 2^20 ops/s, so a brute force solution of 64 bits would take about 2^44 seconds, which is more than enough for us.
However, there are much better solutions to the discrete log problem than that. Here I'm only dealing with groups of prime order. There are very good solutions for groups of order 2^n, but DH uses prime order groups only.
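For reference, the operation being timed is plain square-and-multiply. A minimal sketch follows; Python integers are arbitrary precision, so the 128-bit intermediate product that needs care in C is handled automatically here. The modulus 2^64 - 59 in the usage line is only an example prime, not one taken from the post.

```python
def modexp(base, exponent, modulus):
    """Right-to-left square-and-multiply: base^exponent mod modulus."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:
            result = (result * base) % modulus   # product can be up to 128 bits
        base = (base * base) % modulus
        exponent >>= 1
    return result


if __name__ == "__main__":
    p = 2**64 - 59   # example 64-bit prime (an assumption for this sketch)
    print(modexp(2, 0x0123456789ABCDEF, p) == pow(2, 0x0123456789ABCDEF, p))
```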
The best information I found on this is a set of slides by djb. However, they are a little sparse (since they are slides after all). Quick summary:
- Brute force parallelises perfectly. An FPGA chip could do 2^30 modexps per second. An array of really good ones could push that upwards of 2^40 modexps/sec.
- Breaking n Diffie-Hellmans isn't much harder than breaking one of them when using brute force, since you can look for collisions against all n public keys at once. If you were a sniffer trying to sniff hundreds of connections per second, that's actually a big advantage. That could give an amortised benefit equal to 2^10 or more.
- You can use "random self reduction" to "split" a problem into many problems, and solving any of them breaks the original problem. Combine this with the previous point and you can speed up the breaking of a single problem.
- If you figure out the optimal number of subproblems to "split" the original problem into, you have the "giant step, baby step" algorithm, which takes only about 2√n modexps to break (where n is 2^64 in our case).
- Now things are getting complex, so I'm just going to include the results: Pollard's rho method lets us break 64 bits in 2^32 modexps.
- The Pohlig-Hellman method is even better, but you can choose a safe prime as your group order to stop it. (A safe prime, p, is such that (p-1)/2 is also prime.)
- The "index calculus" method uses lots of precomputation against the group order to find specific solutions in that group very quickly. I must admit that I'm a little shaky on how index calculus works, but I've found one empirical result where a Matlab solution was breaking 64-bit discrete logs in < 1 minute, including the precomputation.
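To make the "giant step, baby step" item concrete, here is a toy sketch. It uses a tiny group (p = 1019, g = 4 of order 509) so that it runs instantly; those parameters and the function name are assumptions for the example. The point is that both the table and the search loop are about √n long.

```python
import math


def bsgs(g, h, p, n):
    """Find a in [0, n) with g^a = h (mod p), where g has order n."""
    m = math.isqrt(n) + 1            # m*m >= n, so a = i*m + j with i, j < m

    # Baby steps: remember g^j for every j < m.
    table = {}
    x = 1
    for j in range(m):
        table.setdefault(x, j)
        x = (x * g) % p

    # Giant steps: walk h * g^(-i*m) until it lands in the table.
    factor = pow(g, -m, p)           # needs Python 3.8+
    gamma = h % p
    for i in range(m):
        if gamma in table:
            return (i * m + table[gamma]) % n
        gamma = (gamma * factor) % p
    return None


if __name__ == "__main__":
    p, n, g = 1019, 509, 4
    print(bsgs(g, pow(g, 300, p), p, n))
```

The catch compared with Pollard's rho is the table: at 64 bits it would hold about 2^32 entries, whereas rho gets the same running time in constant space.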
In short, attacks against discrete log in prime order groups are a lot stronger than I suspected. The index calculus method, especially, seems to be a killer against 64-bit DH exchanges providing any sort of security. Since we don't have the time (on the server) or the space (in the TCP options) to include a unique group for each exchange, the precomputation advantage means that it's very possible for a sniffer to be breaking these handshakes in real time.
Damn.
So it would appear that we need larger key sizes and, possibly elliptic curve based systems (the EC systems, in general, can't be attacked with index calculus based methods). RFC 2385 suggests that 16 bytes in a TCP header is about as much as we would want to add (they are talking about SYN packets, which we don't need to put public values in, but the absolute max is 36 bytes.), which gives us 128-bit public values. Looks like I need to read up on EC systems.
OTCP - Obfuscated TCP
Like open SMTP relays, TCP was developed in a kinder, gentler time. With Comcast forging RST packets to disrupt connections and UK ISPs looking to trawl the clickstreams of a nation and sell them (not to mention AT&T copying their backbone to the NSA) it's time that TCP got a little more paranoid.
The 'correct' solutions are something along the lines of IPSec, but there's no reason to suspect that anyone is going to start using that in droves any time soon. Application level crypto (TLS, SSH etc) is the correct solution for protecting the contents of packets (which would stop the clickstream harvesting style of attacks), but cannot protect the TCP layer (and HTTPS is still not the default for websites).
An opportunistic obfuscation layer, on by default, would start to address this. By making it transparent to use, it stands a chance of getting some small fraction of traffic to use it. If it were included in Linux distribution kernels we might hope to see it in the wild after a year or so. In certain sectors (BitTorrent users and trackers) we might see it much sooner.
Our attacker has a couple of weaknesses:
- Their sniffers are in parallel with their backbone for good reason: if the sniffers fail or cannot keep up with the traffic it's not a big deal. This means that they are limited to observing and injecting traffic. Moving inline (to alter traffic) would be very expensive.
- Legally, altering traffic seems to be much more sensitive than filtering it. Much of Comcast's statements about their RST injection have been stressing that it's limiting, not forging nor intercepting (however technically false that might be).
With that in mind I'm going to suggest the following:
SYN packets from OTCP hosts include an empty TCP option advertising their support. OTCP servers, upon seeing the offer in the SYN packet, generate a random 64-bit number (n), less than a globally known prime (p), and return 2^n mod p in another TCP option in the SYN-ACK. The client performs its end of the DH handshake and includes its own public value in a third option in the next packet to the server.
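The exchange itself is ordinary Diffie-Hellman with generator 2. A minimal sketch, ignoring the TCP option encoding entirely, might look like this; the prime 2^64 - 59 and the function names are assumptions for the example, since no particular prime is chosen here.

```python
import secrets

P = 2**64 - 59   # example "globally known" prime (an assumption for this sketch)
G = 2


def keypair():
    """Pick a random private exponent and compute the public value G^n mod P."""
    private = secrets.randbelow(P - 2) + 1      # 1 <= private <= P - 2
    return private, pow(G, private, P)


def shared_key(private, peer_public):
    """Both ends arrive at G^(n*m) mod P."""
    return pow(peer_public, private, P)


if __name__ == "__main__":
    server_private, server_public = keypair()   # sent in the SYN-ACK
    client_private, client_public = keypair()   # sent in the client's next packet
    print(shared_key(server_private, client_public)
          == shared_key(client_private, server_public))
```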
The two hosts now have a shared key which they can use to MAC and encrypt each packet in the subsequent connection (the MAC will be carried in a TCP option). The MAC function includes the TCP header and payload, except the source and destination port numbers. The encryption only covers the TCP payload, not the IP or TCP headers.
The hash function and cipher need to be very fast and just strong enough; the key is only 64 bits. MD4 for the hash function and AES128 for the cipher, say. (benchmarks for different functions from the Crypto++ library). I suspect that the cipher needs to be a block cipher because packets get retransmitted and reordered. A block cipher in CTR mode based on the sequence number seems to be the best way to deal with this.
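To show why counter mode keyed on the sequence number copes with retransmission and reordering, here is a sketch in which the keystream for a byte depends only on the key and that byte's offset in the stream. Note the stand-in: the Python standard library has no AES, so SHA-256 over (key, counter) plays the part of the block cipher here. That substitution, the names, and ignoring sequence-number wraparound are all assumptions of the example.

```python
import hashlib

BLOCK = 32  # keystream bytes produced per counter value (SHA-256 output size)


def keystream_block(key: bytes, counter: int) -> bytes:
    # Stand-in for E_key(counter); a real design would use a block cipher.
    return hashlib.sha256(key + counter.to_bytes(8, "big")).digest()


def xor_segment(key: bytes, seq: int, data: bytes) -> bytes:
    """Encrypt or decrypt a segment whose first byte is at stream offset seq."""
    out = bytearray()
    for i, byte in enumerate(data):
        offset = seq + i
        # Recomputed per byte for clarity; a real version would cache blocks.
        block = keystream_block(key, offset // BLOCK)
        out.append(byte ^ block[offset % BLOCK])
    return bytes(out)
```

Because the keystream is a function of the offset alone, a segment that is retransmitted with different boundaries, or that arrives out of order, still decrypts without any state from earlier packets.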
A getsockopt interface would allow userland to find out if a given connection is OTCP, and to get the shared key.
Q: Can't this be broken by man-in-the-middle attacks?
Yes. However, note that this would require interception of traffic which is much more costly than sniffers in parallel and legally more troublesome for the attacker. Additionally, userland crypto protocols could be extended to include the shared secret in their certified handshakes, thus giving them MITM-proof security which includes the TCP layer.
Q: Isn't the key size very small?
Yes. However, even if the key could be brute forced in 10 seconds; that's still far too much work for a device which is monitoring hundreds or thousands of connections per second.
Q: Doesn't this break NATs?
NATs rewrite the IP addresses and port numbers in the packets, which we don't include in our MAC protection, so everything should work. If the NAT happens to rebuild the whole packet, the OTCP offer in the SYN packet will be removed. In this case we lose OTCP but, most importantly, we don't break any users.
NATs which monitor the application level and try to rewrite IP addresses in there will be broken by this. However, the number of protocols which do this is small and clients may be configured by default not to offer OTCP when the destination port number matches one of these protocols (IRC and FTP spring to mind). This is a hack, but the downside to users of OTCP must be as small as possible.
Q: So can't I break this by filtering the offer from the SYN packet?
Yes. Application level protocols could be extended to sense this downgrade attack and stop working, but mostly see the points above: it's much more expensive to do this since it needs to be done in the router and it's legally more troublesome for the attacker.
Q: Won't this take too much time?
It's additional CPU load, certainly. The Crypto++ and OpenSSL benchmarks suggest that a full core should be able to handle this at 1 Gbps. Most servers don't see anything like that traffic. Maybe more concerning is the DDoS possibility of using OTCP to force a server to do a 64-bit modexp with a single, unauthenticated packet. A very quick knock-up using the OpenSSL BN library suggests that a single Core2@2.33GHz can do about 50,000 random generations and modexps per second. Since the keys are so small, I expect that a tuned implementation (using registers, not bigints) would be about 10x faster. You probably run out of bandwidth from all the SYNs before 500,000 SYNs per second maxes a single core (it's about 37MB/s). So SYN floods shouldn't be any more of a problem.
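The 37MB/s figure works out if each SYN is counted as a 74-byte Ethernet frame. That frame size is an assumption here (14 bytes of Ethernet header, 20 of IPv4 header and a 40-byte TCP header including options); the sketch below just does the multiplication.

```python
ETHERNET_HEADER = 14
IPV4_HEADER = 20
TCP_HEADER_WITH_OPTIONS = 40   # 20-byte base header plus 20 bytes of options


def syn_flood_bandwidth(syns_per_second):
    """Bytes per second needed to deliver this many SYNs (assumed frame size)."""
    frame = ETHERNET_HEADER + IPV4_HEADER + TCP_HEADER_WITH_OPTIONS
    return syns_per_second * frame


if __name__ == "__main__":
    tuned_rate = 50_000 * 10   # measured OpenSSL BN rate times the hoped-for 10x
    print(syn_flood_bandwidth(tuned_rate) / 1e6, "MB/s")
```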
Q: What about my high-performance network?
I suggest that offering OTCP be disabled by default for private address ranges. Also, distributions probably won't turn it on for their "server" releases. If all else fails, it'll be a sysctl.
Q: But then I'm wasting CPU time and packet space whenever I'm running SSH or HTTPS
Right. Userland can turn off OTCP using a sockopt if it wishes, or it could just not enable itself for the default destination ports which these protocols use. (Again, that would be an ugly intrusion of default port numbers into the kernel, but this idea wasn't that beautiful to begin with.)
Q: So, what's the plan?
- Write a patch
- Get it in the mainline
- Badger distributions to compile it in with server support and client side off by default.
- In time, get the client side offers turned on by default for "desktop" distributions
- Save Internet
Keyspan USB serial dongle drivers for amd64 Ubuntu 7.10
Ubuntu doesn't ship with this driver, but it's useful: keyspan.ko
To install, copy to /lib/modules/2.6.22-14-generic/kernel/drivers/usb/serial and depmod -a && modprobe keyspan (as root).