Macs everywhere

The Setup is a neat site. They find (somewhat) famous techie people and interview them about what hardware and software they use. I was browsing through it because I recognised some of the names, and because it's always neat to find out about tools that you don't use.

But it struck me how many people were using OS X as their primary, day to day, operating system. So I went through every one of them and added up the numbers (except Why the Lucky Stiff, because they put an underscore at the start of the domain name; stopping it from resolving).

Windows: 3.5, Linux: 3, Mac: 29.5

These folks aren't all hardcore coders to be sure, but one of the Linux users is RMS and I'm not sure he counts! It would be like asking Steve Jobs what he uses.

But gosh, that's a total OS X domination.

Strict Transport Security

Chrome 4 went stable yesterday. One of the many new things in this release is the addition of Strict Transport Security. STS allows a site to request that it always be contacted over HTTPS. So far, only Chrome supports it. However, the popular NoScript Firefox extension also supports it and hopefully support will appear in Firefox proper at some point.

The issue that STS addresses is that users tend to type http:// at best, and omit the scheme entirely most of the time. In the latter case, browsers will insert http:// for them.

However, HTTP is insecure. An attacker can grab that connection, manipulate it and only the most eagle eyed users might notice that it redirected to https://www.bank0famerica.com or some such. From then on, the user is under the control of the attacker, who can intercept passwords etc at will.

An STS enabled server can include the following header in an HTTPS reply:

Strict-Transport-Security: max-age=16070400; includeSubDomains

When the browser sees this, it will remember, for the given number of seconds, that the current domain should only be contacted over HTTPS. In the future, if the user types http:// or omits the scheme, HTTPS is the default. In fact, all requests for URLs in the current domain will be redirected to HTTPS. (So you have to make sure that you can serve them all!).

For more details, see the specification.

There is still a window where a user who has a fresh install, or who wipes out their local state, is vulnerable. Because of that, we'll be starting a "Preloaded STS" list. These domains will be configured for STS out of the box. In the beginning, this will be hardcoded into the binary. As it (hopefully) grows, it can change into a list this is shared across browsers, like the safe-browsing database is today.

If you own a site that you would like to see included in the preloaded STS list, contact me at .

Setting up Apache with OCSP stapling

OCSP is the Online Certificate Status Protocol. It's a way for TLS clients to check if a certificate is expired. A certificate with OCSP enabled includes a URL to which a client can send a POST request and receive a signed statement that a given certificate is still valid.

This adds quite a bit of latency to the TLS connection setup as the client has to perform a DNS lookup on the OCSP server hostname, create an HTTP connection and perform the request-response transaction.

OCSP stapling allows the TLS server to include a recent OCSP response in the TLS handshake so that the client doesn't have to perform its own check. This also reduces load on the OCSP server.

Apache recently got support for OCSP stapling and this post details how to set it up.

1. Prerequisites

Apache support got added in this revision. At the time of writing, no release of Apache includes this so we get it from SVN below.

OpenSSL support was added in 0.9.8h. The version in Ubuntu Karmic is not recent enough, so I pulled the packages from lucid for this:

cd /tmp
wget 'http://mirrors.kernel.org/ubuntu/pool/main/o/openssl/openssl_0.9.8k-7ubuntu3_amd64.deb'
wget 'http://mirrors.kernel.org/ubuntu/pool/main/o/openssl/libssl-dev_0.9.8k-7ubuntu3_amd64.deb'
wget 'http://mirrors.kernel.org/ubuntu/pool/main/o/openssl/libssl0.9.8_0.9.8k-7ubuntu3_amd64.deb'

sudo dpkg -i openssl_0.9.8k-7ubuntu3_amd64.deb libssl0.9.8_0.9.8k-7ubuntu3_amd64.deb  libssl-dev_0.9.8k-7ubuntu3_amd64.deb

2. Building Apache

As noted, we need the SVN version of Apache at the time of writing. I'll be using some paths in the following that you should change (like /home/agl/local/ocsp):

svn checkout http://svn.apache.org/repos/asf/httpd/httpd/trunk httpd
cd httpd
svn co http://svn.apache.org/repos/asf/apr/apr/trunk srclib/apr
./buildconf
cd srclib/apr
./configure --prefix=/home/agl/local/ocsp

At this point, I had to patch APR in order to get it to build. I suspect this build break will be fixed in short order but, for the sake of completeness, here's the patch that I applied:

--- poll/unix/pollset.c (revision 892677)
+++ poll/unix/pollset.c (working copy)
@@ -129,6 +129,8 @@
 
 static apr_status_t close_wakeup_pipe(apr_pollset_t *pollset)
 {
+    apr_status_t rv0, rv1;
+
     /* Close both sides of the wakeup pipe */
     if (pollset->wakeup_pipe[0]) {
     rv0 = apr_file_close(pollset->wakeup_pipe[0]);

Now we can build and install Apache itself. Since we are giving a prefix option, this doesn't conflict with any system installs.

cd ../..
./configure --prefix=/home/agl/local/ocsp --with-apr=/home/agl/local/ocsp --enable-ssl --enable-socache-dbm

Again, for the sake of completeness, I'll mention that Apache SVN had a bug at the time of writing that will stop OCSP stapling from working:

--- modules/ssl/ssl_util_stapling.c     (revision 892677)
+++ modules/ssl/ssl_util_stapling.c     (working copy)
@@ -414,6 +414,10 @@
        goto done;
    }

+    if (uri.port == 0) {
+        uri.port = APR_URI_HTTP_DEFAULT_PORT;
+    }
+
    *prsp = modssl_dispatch_ocsp_request(&uri, mctx->stapling_responder_timeout,
                                         req, conn, vpool);

Then build and install it...

make -j4
make install

3. Generating certs

For this example, I'll be generating a CA cert, a server cert and an OCSP responder cert for that CA. In the real world you'll probably be getting the certs from a true CA, so you can skip this step.

cd /home/agl/local/ocsp
mkdir certs && cd certs
wget 'https://fedorahosted.org/pkinit-nss/browser/doc/openssl/make-certs.sh?format=txt'
mv make-certs.sh\?format=txt make-certs.sh
/bin/bash ./make-certs.sh europa.sfo.corp.google.com test@example.com all ocsp:http://europa.sfo.corp.google.com/
cat ocsp.crt ocsp.key > ocsp.pem

Now I'm going to add the CA that was just generated to the CA set. Firstly, Chromium uses an NSS database in your home directory:

certutil -d sql:/home/agl/.pki/nssdb -A -n testCA -i ~/local/ocsp/certs/ca.crt -t Cu,,

OpenSSL uses a file of PEM certs:

cd /etc/ssl
cp cert.pem cert.pem.orig
rm cert.pem
cat cert.pem.orig /home/agl/local/ocsp/certs/ca.crt > cert.pem

4. Running the responder

In the real world, the OCSP responder is run by the CA that you got your certificate from. But here I'm going to be running my own since I generated a new CA in section 3.

cd ~/local/ocsp
touch index.txt
sudo openssl ocsp -index index.txt -port 80 -rsigner certs/ca.pem -CA certs/ca.pem

5. Configuring Apache

I won't cover the basics of configuring Apache here. There are plenty of documents on the web about that. I'll just note that I have Apache only listening on port 443 since my OCSP responder is running on 80.

The config that you'll need is roughly this:

SSLStaplingCache dbm:/tmp/staples
SSLCACertificateFile "/etc/ssl/cert.pem"
SSLUseStapling on

(You probably want to choose a better location for the cache.)

Apache will parse its server certificate on startup and extract the OCSP responder URL. It needs to find the CA certificate in order to validate OCSP responces and that's why the SSLCACertificateFile directive is there (and why we added the CA to that file in section 3).

After restarting Apache, look in the error.log. What you don't want to see is the following:

[Sun Dec 20 17:24:28 2009] [error] ssl_stapling_init_cert: Can't retrieve issuer certificate!
[Sun Dec 20 17:24:28 2009] [error] Unable to configure server certificate for stapling

That means that Apache couldn't find the CA cert.

There are other directives, but they are currently undocumented. Your best bet is to look at the original Apache bug and the commit itself.

Chrome Linux Beta

Life goals: get a comic published. Check.

Digital Economy Bill

Cory Doctorow seems to have crafted the lexicon of the opposition to the Digital Economy Bill with his phrase ‘Pirate Finder General’ [1]. In his follow up he claims that this bill will introduce three-strikes, ISP spying and powers for Peter Mandelson to rewrite copyright law at will.

I spent a fun-filled Sunday afternoon reading the bill to see how bad it really is so that you don't have to (although you still should). You're welcome.

Firstly, the changes to copyright law are only a small part of the bill. Other parts of the bill cover: Channel 4 and Channel 3 licensing, removing the requirements for Teletext, digital switch over, radio licenses and classification of video games. I won't be talking about those in this post.

The bill requires that ISPs pass on infringement notices to subscribers, either via email or via the postal system. Copyright owners can also request infringement lists. This allows them to see that their notices A, B, C all went to the same subscriber. They can then take this information to court and proceed with the usual actions. (124A and 124B.)

The bill defers much of the policy in this area to a code that is to be written by OFCOM with the approval of the Secretary of State. This code includes the number of infringement notices that a single subscriber needs in order to be included in a report, the size of fines to be imposed on ISPs for failing to follow the code and the appeals process. The Secretary of State sets the compensation paid from copyright holders to ISPs and from everyone to OFCOM (124L).

What isn't in the bill is any talk of disconnection, ISP spying or three strikes. However, 124H gives the Secretary of State the power to require any “technical obligation”.

(3) A "technical measure" is a measure that (a) limits the speed or other capacity of the service provided to a subscriber; (b) prevents a subscriber from using the service to gain access to particular material, or limits such use; (c) suspends the service provided to a subscriber; or (d) limits the service provided to a subscriber in another way.

A brief interlude about statutory instruments is needed here. SIs are published by the government and cannot be amended by Parliament. There are two types. The first takes effect automatically and Parliament has a short time (usually 40 days depending on holidays etc) to annul it. The second requires positive action by both Houses before it takes effect.

According to Wikipedia, the last time that an SI was annulled was in 2000, and 1979 before that. The last time that an SI wasn't approved was 40 years ago.

The powers to impose technical measures are of the annul variety: they take effect automatically after 40 days. They don't appear to include requiring ISPs to spy.

After that power, 302A gives the Secretary of State the power to change the Copyright Act via a positive action SI “for the purpose of preventing or reducing the infringement of copyright by means of the internet”.

Next up, 124N gives the Secretary of State the power to take over any domain name registrar. By my reading, that is no exaggeration.

The Secretary can take action if they believe that the actions of a registrar affect “(a) the reputation or availability of electronic communications networks or electronic communications services [...] (b) the interests of consumers or members of the public [...].”. The action can either be to appoint a “manager”, who has total power, or to ask a court to change the constitution of the body and enjoin them from changing it back.

There's no requirement that this registrar be based in the UK, or even to allocate in the uk ccTLD.

Understandably, they are quite upset about this.

I'd also like to quickly note that this bill contains provisions for the licensing of orphan works without the copyright holder being involved also for libraries to have the rights to lend out e-books. (Although without giving any powers to do anything about technical limitations on e-books that might prevent it.)

Lastly, and I might be misunderstanding something here, on page 52 the Secretary of State seems to get powers to amend or annul anything relating to this act, in any bill in this session of Parliament, or before, by using a positive action SI.

Hopefully the above provides pointers for people who want to understand and read the bill. Now, my (informed) opining:

This is an abomination. It's an attempt to subvert Parliament by giving the government the power to write copyright law at will. The provisions are extraordinary. The sanctions against domain name registrars are staggering. I don't know of any other case when the government can sequestrate a private entity at will. Given the international nature of the domain name system, this should cause international concern.

I expect that this power is largely intended to be a very large stick with which to force the removal of xyzsucks.com style names that embarrass business or government. No registrar will dare cross their interests if they have this power.

If you vote in the UK, goto TheyWorkForYou, lookup your MP, write a letter (on paper). Sign the petition. Support ORG.

Recent changes to SSL/TLS on the web

Most of the movement around TLS (aka SSL) currently involves people dealing with the renegotiation issues, but I'm going to sound a happier note today. TLS isn't static; things are changing for the better:

Strict transport security

My colleagues, Dr Barth and Collin Jackson proposed ForceHTTPS some time ago. This has picked up Jeff Hodges, from PayPal, and morphed into Strict Transport Security. Dr Barth and I have implemented this in Chromium and Firefox supports it with the NoScript extension.

In short, you can add a header to your HTTPS replies like: Strict-Transport-Security: max-age=86400 and the browser will remember, for the next 86400 seconds (1 day), that the origin host should only be contacted over HTTPS. It also forbids mixed content.

(Update: Dr Barth points out that the limits on mixed content have been removed as the standard has advanced!)

Chrome dev channel releases already support this and it'll be in Chrome 4.0. The hosts are stored in a JSON file in the profile directory:

{
   "+7cOz6FDyMiPEjNtc0haTPwdZPbvbPFP2NyZIA82GTM=": {
      "expiry": 1258514505.715938,
      "include_subdomains": false
   }
}

If you try to navigate to an http:// URL when that host has STS enabled, the browser will internally rewrite it to https://. Suitable sites (banks etc) should start using this as soon as possible.

Compression

Well, this certainly isn't new! OpenSSL has supported deflate compression on TLS connections for ages, but NSS (the SSL/TLS library used in all Mozilla based products for one) hasn't. This means that Firefox never supported compression, nor Thunderbird (and it's a fairly big deal for IMAP connections).

However, Wan Teh Chang and I have added deflate support to NSS and it'll be in next release. Thanks to Nelson Bolyard for the code review.

Cut through

Here's a diagram of a TLS connection from the RFC:

Client                                               Server

      ClientHello                  -------->
                                                      ServerHello
                                                     Certificate*
                                               ServerKeyExchange*
                                              CertificateRequest*
                                   <--------      ServerHelloDone
      Certificate*
      ClientKeyExchange
      CertificateVerify*
      [ChangeCipherSpec]
      Finished                     -------->
                                               [ChangeCipherSpec]
                                   <--------             Finished
      Application Data             <------->     Application Data

This means that an HTTPS connection adds an extra two round trips on top of HTTP.

Nagendra Modadugu and myself (independently) came up with a “cut through” mode for TLS handshakes. Rather than wait for the server's Finished message, the client can send application data after only one round trip. This means than an attacker can perform a downgrade attack on the cipher and force the client to transmit with a weaker cipher than it might have normally used. However, an attacker cannot get the key so, as long as all the supported ciphers are strong enough, it all works out.

This cuts a round-trip time from a normal HTTPS handshake and should be appearing in Chromium and Android soon.

(Nelson Bolyard tells me that this isn't a novel idea, although it doesn't seem to have had much traction up til now.)

Next protocol negotiation

TLS over port 443 is the only clean channel that many hosts have these days. However, this means that the TCP destination port number can no longer be used to select an application level protocol since it's fixed by firewalls, proxies etc.

The specific use case for this would be SDPY, a new transport layer for HTTP. We want to know, before we send the first request, if the server supports SDPY.

draft-agl-tls-nextprotoneg describes an extension to let you do that. It's being tested in Chromium at the moment (although not yet in the public tree).

Go launch

I'm delighted to be a minor part of the Go launch today:

Go is an experimental language from Google that I've been coding in for the past month or so. It sits in a similar niche to Java: performant but garbage collected. However, it's vastly more enjoyable to code in than Java!

Thanks to a suite of compilers, it compiles to machine code very quickly. There's also a frontend to GCC in the works. It's runtime and type safe, concurrent, has a novel (for me, at least) take on object orientation and provides runtime reflections on types.

Personally, I think it gets a place in my list of favoured tools, which currently contains C, C++, Python and Haskell.

The TLS flaw that wasn't

There were many articles yesterday suggesting that a major new flaw in TLS (aka SSL) had been found ([1][2][3]). The last of those is a post by Ben Laurie, an expert in these matters, with a suitably hyperbolic title: “Another Protocol Bites The Dust”

Here's the issue: there's an extremely uncommon configuration of web servers where they're setup to require client side certificates for some URLs and not others. If a user has an HTTPS connection open that didn't handshake with a client side certificate and they try to access such a URL, the webserver will perform another handshake on the same connection. As soon as that handshake completes with the correct certificate, they'll run the request that was received from before the connection was fully authenticated.

It's a bug in the web server. There was a misunderstanding between what the folks writing the webserver thought that TLS was providing and what it actually provides. One might also argue that it's a short coming in the HTTP protocol (there's no way for a server to ask a client to redo a request). One might also argue that TLS should provide the properties that the web servers expected.

But it's not a flaw in TLS. The TLS security properties are exactly what was intended.

Now, it appears that the fix will be to TLS. That's fine, but the place that gets ‘fixed’ isn't always the place that made the mistake.

I don't understand why knowledgeable folks like EKR and Laurie are so eager to attribute this problem to TLS.

Anti aliased clipping, a tale of woe

People have been complaining that rounded rectangles in Chrome aren't anti-aliased. If you're a web developer, it seems that this is a Big Deal.

The issue is that almost anything can have rounded corners in WebKit. There's not a drawRoundedRectangle function, instead, clipping paths are created and then normal drawing proceeds. On Safari (which is also WebKit, but sitting on top of the CoreGraphics library), clipping to a path is anti-aliased and everything looks pretty. However, Chrome's graphics library, Skia, doesn't do anti-aliased clipping for a good reason.

Consider the figure below:

At the top left is an anti-aliased clipping region. The darker the pixel, the more is covered by the path. If we were to fill the region with green, we would get the image at the bottom left. When drawing, we consider how much of the clipping region covers each pixel and convert that to an alpha value. For a pixel which was half covered by the clipping region we would calculate 50% × background_color + 50% × green.

However, consider what happens when we first fill with red (top right) and then with green (bottom right). We would expect that the result would be the same as filling with green - the second fill should cover the first. But for pixels which are fractionally covered by clipping region, this isn't the case.

The first fill, with red, works correctly as detailed above. But when we come to do the second fill, the background_color isn't the original background color, but the slightly red color resulting from the first fill. Both CoreGraphics and Firefox's <canvas> have this bug.

It might seem trivial, but if you end up covering anti-aliased clipping regions multiple times you end up with unsightly borders around the clip paths. This is why Skia only supports 1-bit clip paths.

The correct way to do anti-aliased clipping is to draw to a layer, on top of the original bitmap, and, when the clipping path is popped from the clip stack, erase outside of the path (anti-aliased) and composite the result onto the underlying bitmap.

This works just fine, the problem is that <canvas> users don't always pop the clip stack. They expect to be able to set a clipping path, draw and have it appear without managing a stack. We could collapse the clipping stack for them when we paint to the screen, but then we need to restore it afterwards, which would require major surgery to Skia.

The second problem with anti-aliasing, even when done correctly, is that it makes it impossible to put polygons next to each other. Try this demo in Firefox and note the hairlines caused by anti-aliasing.

I think what Chrome will end up doing is to anti-alias the clipping paths (correctly) for everything except <canvas>. This isn't a great solution, but it's better than what we have now.

Chromium's seccomp Sandbox

I wrote an article for LWN about Chromium's seccomp sandbox. They decided that it wasn't in the right style for LWN, and they rewrote it to fit. Their version has just become available for free. I'm including my version below:

The Chromium seccomp sandbox

As part of the process of porting Chromium to Linux, we had to decide how to implement Chromium's sandbox on Linux.

The Chromium sandbox is an important part of keeping users safe. The web is a very complicated place these days and the code to parse and interpret it is large and on the front-line of security. We try to make sure that this code is free of security bugs, but history suggests that we can't be perfect. So, we plan for the case where someone has an exploit against our rendering code and run it in its own process with limited authority. It's the sandbox's job to limit that authority as much as possible.

Chromium renderers need very little authority. They need access to fontconfig to find fonts on the system and to open those font files. However, these can be handled as IPC requests to the browser process. They do not need access to the X server (which is why we don't have GTK widgets on web pages), nor should they be able to access DBus, which is increasingly powerful these days.

Drawing is handled using SysV shared memory (so that we can share memory directly with X). Everything else is either serialised over a socketpair or passed using a file descriptor to a tmpfs file. This means that we can deny filesystem access completely. The renderer requires no network access: the network stack is entirely within the browser process.

Traditional sandboxing schemes on Linux involve switching UIDs and using chroot. We'll be using some of those techniques too. But this text is about the most experimental part of our sandbox: the seccomp layer which my colleague Markus Gutschke has been writing.

The kernel provides a little known feature where by any process can enter ‘seccomp mode’. Once enabled it cannot be disabled. Any process running in seccomp mode can only make four system calls: read, write, sigreturn and exit. Attempting any other system call will result in the immediate termination of the process.

This is quite desirable for preventing attacks. It removes network access, which is traditionally difficult to limit otherwise (although CLONE_NEWNET is might help here). It also limits access to new, possibly dangerous, system calls that we don't otherwise need like tee and vmsplice. Also, because read and write proceed at full speed, if we limit our use of other system calls, we can hope to have a minimal performance overhead.

But we do need to support some other system calls. Allocating memory is certainly very useful. The traditional way to support this would be to RPC to a trusted helper process which could validate and perform the needed actions. However, a different process cannot allocate memory on our behalf. In order to affect the address space of the sandboxed code, the trusted code would have to be inside the process!

So that's what we do: each untrusted thread has a trusted helper thread running in the same process. This certainly presents a fairly hostile environment for the trusted code to run in. For one, it can only trust its CPU registers - all memory must be assumed to be hostile. Since C code will spill to the stack when needed and may pass arguments on the stack, all the code for the trusted thread has to carefully written in assembly.

The trusted thread can receive requests to make system calls from the untrusted thread over a socket pair, validate the system call number and perform them on its behalf. We can stop the untrusted thread from breaking out by only using CPU registers and by refusing to let the untrusted code manipulate the VM in unsafe ways with mmap, mprotect etc.

That could work, if only the untrusted code would make RPCs rather than system calls. Our renderer code is very large however. We couldn't patch every call site and, even if we could, our upstream libraries don't want those patches. Alternatively, we could try and intercept at dynamic linking time, assuming that all the system calls are via glibc. Even if that were true, glibc's functions make system calls directly, so we would have to patch at the level of functions like printf rather than write.

This would seem to be a very tough problem, but keep in mind that if we miss a call site, it's not a security issue: the kernel will kill us. It's just a crash bug. So we could use a theoretically incorrect solution so long as it actually worked in practice. And this is what we do:

At startup we haven't processed any untrusted input, so we assume that the program is uncompromised. Now we can disassemble our own memory, find sites where we make system calls and patch them. Correctly parsing x86 machine code is very tough. Native Client uses a customised compiler which only generates a subset of x86 in order to do it. But we don't need a perfect disassembler so long as it works in practice for the code that we have. It turns out that a simple disassembler does the job perfectly well with only a very few corner cases.

Now that we have patched all the call sites to call our RPC wrapper, instead of the kernel, we are almost done. We have only to consider system calls which pass arguments in memory. Because the untrusted code can modify any memory that the trusted code can, the trusted code couldn't validate calls like open. It could verify the filename being requested but the untrusted code could change the filename before the kernel copied the string from user-space.

For these cases, we also have a single trusted process. This trusted process shares a couple of pages of memory with each of the trusted threads. When the trusted thread is asked to make a system call which it cannot safely validate, it forwards the call to the trusted process. Since the trusted process has a different address space, it can safely validate the arguments without interference. It then copies the validated arguments into the shared memory pages. These memory pages are writable by the trusted process, but read-only in the sandboxed process. Thus the untrusted code cannot modify them and the trusted code can safely make the system call using the validated, read-only arguments.

We also use this trick for system calls like mmap which don't take arguments in memory, but are complicated to verify. Recall that the trusted thread has to be hand written in assembly so we try to minimise the amount of this code where possible.

Once we have this scheme in place we can intercept, examine and deny any system calls. We start off denying everything and then, slowly, add system calls that we need. For each system call we need to consider the security implications it might have. Calls like getpid are easy, but what damage could one do with mmap/munmap? Well, the untrusted code could replace the code which the trusted threads are running for one! So, when a call might be dangerous we allow only a minimal, and carefully examimed, subset of flags which match the uses that we actually have in our code.

We'll be layering this sandbox with some more traditional UNIX sandboxing techniques in the final design. However, you can get a preview of the code in it's incomplete state already at its Google Code homepage.

There's still much work to be done. A given renderer could load a web page with an iframe to any domain. Those iframes are handled in the same renderer, thus a compromised renderer can ask the browser for any of the user's cookies. Microsoft research developed Gazelle, which has much stricter controls on a renderer, at the expense of web-compatibility. We know that users wont accept browsers that don't work with their favourite websites, but we are also very jealous of Gazelle's security properties so hopefully we can improve Chromium along those lines in the future.

Another weak spot are installed plugins. Plugin support on Linux is very new but on Windows, at least, we don't sandbox plugins. They don't expect to be sandboxed and we hurt web-compatibility (and break their auto-updating) if we limit them. That means that plugins are a vector for more serious attacks against web browsers. As ever, keep up to date with the latest security patches!

DNSCurve Internet Draft

Matthew has posted a Internet draft for DNSCurve. DNSCurve is a way of securing DNS which isn't DNSSEC. See Dan's talk from a couple of weeks ago about why DNSCurve is the better answer.

DEFCON 17

I'll be going to DEFCON this year. Ping me if you'll be around.

SELinux from the inside out

There are some great sources of information for users and sysadmins about SELinux [1] [2] but your author has always preferred to understand a system from the bottom-up and, in this regard, found the information somewhat lacking. This document is a guide to the internals of SELinux by starting at the kernel source and working outwards.

We'll be drawing on three different sources in order to write this document.

Access vectors

SELinux is fundamentally about answering questions of the form “May x do y to z?” and enforcing the result. Although the nature of the subject and object can be complex, they all boil down to security identifiers (SIDs), which are unsigned 32-bit integers.

The action boils down to a class and a permission. Each class can have up to 32 permissions (because they are stored as a bitmask in a 32-bit int). Examples of classes are FILE, TCP_SOCKET and X_EVENT. For the FILE class, some examples of permissions are READ, WRITE, LOCK etc.

At the time of writing there are 73 different classes (selinux/libselinux/include/selinux/flask.h) and 1025 different permissions (.../av_permissions.h).

The security policy of a system can be thought of as a table, with subjects running down the left edge, objects across the top and, in each cell, the set of actions which that subject can perform on that object.

This is reflected in the first part of the SELinux code that we'll look at : the access vector cache (security/selinux/avc.c). The AVC is a hash map from (subject, object, class) to the bitset of permissions allowed:

struct avc_entry {
        u32                     ssid;    // subject SID
        u32                     tsid;    // object SID
        u16                     tclass;  // class
        struct av_decision      avd;     // contains the set of permissions for that class
};

The AVC is queried when the kernel needs to make security decisions. SELinux hooks into the kernel using the LSM hooks and is called whenever the kernel is about to perform an action which needs a security check. Consider the getpgid system call to get the current process group ID. When SELinux is built into a kernel, this ends up calling the following hook function (security/selinux/hooks.c):

static int selinux_task_getpgid(struct task_struct *p)
{
        return current_has_perm(p, PROCESS__GETPGID);
}

static int current_has_perm(const struct task_struct *tsk,
                            u32 perms)
{
        u32 sid, tsid;

        sid = current_sid();
        tsid = task_sid(tsk);
        return avc_has_perm(sid, tsid, SECCLASS_PROCESS, perms, NULL);
}

Referring back to the table concept: in order to check if a process with SID x may call getpgid we find x across and x down and check that SECCLASS_PROCESS:PROCESS__GETPID is in the set of allowed actions.

So now we have to discover what the AVC is actually caching, and where these SIDs are coming from. We'll tackle the latter question first.

SIDs and Security Contexts

SIDs turn out to be much like interned symbols in some languages. Rather than keeping track of complex objects and spending time comparing them during lookups, they are reduced to an identifier via a table. SIDs are the interned identifiers of security contexts. The sidtab maps from one to the other (security/selinux/ss/sidtab.h):

struct sidtab {
        struct sidtab_node **htable;
        unsigned int nel;       /* number of elements */
        unsigned int next_sid;  /* next SID to allocate */
        unsigned char shutdown;
        spinlock_t lock;
};

struct sidtab_node {
        u32 sid;                       /* security identifier */
        struct context context;        /* security context structure */
        struct sidtab_node *next;
};

The SID table is optimised for mapping from SIDs to security contexts. Mapping the other way involves walking the whole hash table.

The structure for the security context is probably familiar to you if you have worked with SELinux before (security/selinux/ss/context.h):

struct context {
        u32 user;
        u32 role;
        u32 type;
        u32 len;        /* length of string in bytes */
        struct mls_range range;
        char *str;        /* string representation if context cannot be mapped. */
};

If you have an SELinux enabled system, you can look at your current security context with id -Z. Running that will produce something like unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023. This string splits into four parts:

  1. The SELinux “user”: unconfined_u
  2. The role: unconfined_r
  3. The type: unconfined_t (we'll mostly be concentrating on types)
  4. The multi-level-security (MLS) sensitivity and compartments: s0-s0:c0.c1023

(You might notice that the parts are broken up with colons, but that the MLS part can contain colons too! Obviously, this is the only part that can contain colons to avoid ambiguity.)

When the system's security policy is compiled, these names are mapped to IDs. It's these IDs which end up in the kernel's context structure. Also notice that, by convention, types end in _t, roles with _r and users with _u. Don't confuse UNIX users with SELinux users; they are separate namespaces. For a sense of scale, on a Fedora 11 box, the default policy includes 8 users, 11 roles and 2727 types.

The Security Server

We now address the question of what it is that the access vector cache is actually caching. When a question is asked of the AVC to which it doesn't have an answer, it falls back on the security server. The security server is responsible for interpreting the policy from userspace. The code lives in context_struct_compute_av (in security/selinux/ss/services.c). We'll walk through its logic (and we'll expand on each of these points below):

  1. The subject and object's type are used to index type_attr_map, which results in a set of types for each of them.
  2. We consider the Cartesian product of the two sets and build up a 32-bit allowed bit-vector based on the union of the permissions in the access vector table for each (subject, object) pair.
  3. For each pair in the product, we also include the union of permissions from a second access vector table: the conditional access vector table.
  4. The target type is used to index an array and from that we get a linked list of “constraints”. Each constraint contains byte code for a stack based virtual machine and can limit the granted permissions.
  5. If the resulting set of permissions includes role transition, then we walk a linked list of allowed role transitions. If the transition isn't whitelisted, those permissions are removed.
  6. If either the subject or object's type is ‘bounded’, then we recurse and check the permissions of the bounded types. We verify that the resulting permissions are a subset of the permissions enjoyed by types that they are bounded by. This should be statically enforced by the tool with produced the policy so, if we find a violation, it's logged and the resulting permissions are clipped.

Now, dealing with each of those steps in more detail:

Type attributes

Type attributes are discussed in the Configuring the SELinux Policy report. They are used for grouping types together: by including a type attribute on a new type, the new type inherits all the permissions granted to the type attribute. As can be seen from the description above, type attributes are implemented as types themselves.

These attributes could have been statically expanded by the tool which generated the policy file. Expanding at generation time is a time/space tradeoff and the SELinux developers opted for the smaller policy file.

It's also worth noting that type_attr_map isn't expanded recursively: one can only have one level of type attributes.

Type attributes conventionally end in _type (as opposed to types, which end in _t). In the Fedora 11 policy, here are the top five type attributes:

Name of type attribute Number of types with that attribute
file_type 1406
non_security_file_type 1401
exec_type 484
entry_type 478
domain 442

The graph of types and type attributes is, as expected, bipartite.

The conditional access vector table

The conditional access vector table contains permissions just like the regular access vector table except that each, optionally, has an extra flag: AV_ENABLED (security/selinux/avtab.h). This flag can be enabled and disabled at run time by changing the value of ‘booleans’. These booleans are quite well covered by the higher-level documentation for the policy language (here and here).

The set of booleans can be found in /selinux/booleans (if you are running SELinux). They can be read without special authority although you should be aware of a bug: trying to read more than a page from one of those files results in -EINVAL and recent coreutils binaries (like cat) use a buffer size of 32K. Instead you can use dd, or just run the friendly tool: semanage boolean -l.

The AV_ENABLED flag is updated when a boolean is changed. The conditional access vector table is populated by a list of cond_node structures (security/selinux/conditional.h). These contain a bytecode for a limited, stack based machine and and two lists of access vectors which should be enabled or disabled in the case that the machine returns true or false.

The stack machine can read any of the configured booleans and combine them with standard boolean algebra, returning a single bit result.

Constraints

One of the parts of the SELinux policy language is the ability to define constraints. Constraints are defined using the neverallow command. Constraints are used to prevent people from writing bad policy, or in the case of MLS, to enforce rules governing information flow. http://danwalsh.livejournal.com/12333.html

As you can see if you read the above linked blog post, constraints are statically enforced by the policy tools where possible and also checked by the kernel. Constraints are evaluated by running a stack-machine bytecode. (This is a different machine than that which is used for the conditional access vector table.) Based on the kernel code for the stack-machine, we can write a simple disassembler and see what constraints are enforced in the kernel.

In the Fedora 11 policy, 32 classes have constraints applied to them. Let's have a look at some of them. Here's the first one:

constraint for class 'process' permissions:800000:
  DYNTRANSITION
subject.user == object.user?
subject.role == object.role?
and

Roughly translated, this means “Whenever operating on an object of class process, the DYNTRANSITION permission is forbidden unless the user and role of the subject and object match”. A DYNTRANSITION (dynamic transition) is when a process switches security contexts without execing a binary. Think of it like a setuid call for security contexts (we'll cover how to perform this later).

Here's another constraint, a longer one this time:

constraint for class 'file' permissions:188:
  CREATE
  RELABELFROM
  RELABELTO
subject.user == object.user?
[bootloader_t, devicekit_power_t, logrotate_t, ldconfig_t, unconfined_cronjob_t, unconfined_sendmail_t, setfiles_mac_t,
initrc_t, sysadm_t, ada_t, fsadm_t, kudzu_t, lvm_t, mdadm_t, mono_t, rpm_t, wine_t, xdm_t, unconfined_mount_t,
oddjob_mkhomedir_t, saslauthd_t, krb5kdc_t, newrole_t, prelink_t, anaconda_t, local_login_t, rpm_script_t,
sysadm_passwd_t, system_cronjob_t, tmpreaper_t, samba_unconfined_net_t, unconfined_notrans_t, unconfined_execmem_t,
devicekit_disk_t, firstboot_t, samba_unconfined_script_t, unconfined_java_t, unconfined_mono_t,
httpd_unconfined_script_t, groupadd_t, depmod_t, insmod_t, kernel_t, kpropd_t, livecd_t, oddjob_t, passwd_t, apmd_t,
chfn_t, clvmd_t, crond_t, ftpd_t, inetd_t, init_t, rshd_t, sshd_t, staff_t, udev_t, virtd_t, xend_t, devicekit_t,
remote_login_t, inetd_child_t, qemu_unconfined_t, restorecond_t, setfiles_t, unconfined_t, kadmind_t,
ricci_modcluster_t, rlogind_t, sulogin_t, yppasswdd_t, telnetd_t, useradd_t, xserver_t] contains subject.type?
or

This means that when you create a file or change its security context, either the SELinux user of the file has to match your current SELinux user, or you have to be one of a list of privileged types.

One last example foreshadows several large subjects: user-land object managers and multi-level security. For now I'll leave it undiscussed to wet your appetite.

constraint for class 'db_database' permissions:7de:
  DROP
  GETATTR
  SETATTR
  RELABELFROM
  ACCESS
  INSTALL_MODULE
  LOAD_MODULE
  GET_PARAM
  SET_PARAM
object.sensitivity[high] dominates type?

Roles and users

In step 5, above, we mention ‘role transitions’, so we should probably discuss SELinux users and roles. Keep in mind that SELinux users are separate from normal UNIX users.

Each type inhabits some set of roles and each role inhabits some set of SELinux users. UNIX users are mapped to SELinux users at login time (run `semanage login -l`) and so each user has some set of roles that they may operate under. Like the standard custom of administrating a system by logging in as a normal user and using sudo only for the tasks which need root privilege, roles are designed for the same purpose. Although a given physical user may need to perform administrative tasks, they probably don't want to have that power all the time. If they did, then there would be a confused deputy problem when they perform what should be an unprivileged task which does far more than intended because they performed it with excess authority.

Here's the graph of users and roles in the Fedora 11 targeted policy:

An SELinux user can move between roles with the newrole command, if such a role transition is permitted. Here's the graph of permitted role transitions in the Fedora policy:

With the targeted policy at least, roles and users play a relatively small part in SELinux and we won't cover them again.

Bounded types

A type in SELinux may be “bounded” to another type. This means that the bounded type's permissions are a strict subset of the parent and here we find the beginnings of a type hierarchy. The code for enforcing this originally existed only in the user-space tools which build the policy, but recently it was directly integrated into the kernel.

In the future, this will make it possible for a lesser privileged process to safely carve out subsets of policy underneath the administratively-defined policy. At the time of writing, this functionality has yet to be integrated in any shipping distribution.

(Thanks to Stephen Smalley for clearing up this section.)

The SELinux filesystem

The kernel mostly communicates with userspace via filesystems. There's both the SELinux filesystem (usually mounted at /selinux) and the standard proc filesystem. Here we'll run down some of the various SELinux specific entries in each.

But first, a quick note. Several of the entries are described as ‘transaction’ files. This means that you must open them, perform a single write and then a single read to get the result. You must use the same file descriptor for both (so, no echo, cat pairs in shell scripts).

/selinux/enforcing

A boolean file which specifies if the system is in ‘enforcing’ mode. If so, SELinux permissions checks are enforced. Otherwise, they only cause audit messages.

(Read: unprivileged. Write: requires root, SECURITY:SETENFORCE and that the kernel be built with CONFIG_SECURITY_SELINUX_DEVELOP.)

/selinux/disable

A write only, boolean file which causes SELinux to be disabled. The LSM looks are reset, the SELinux filesystem is unregistered etc. SELinux can only be disabled once and probably doesn't leave your kernel in the best of states.

(Read: unsupported. Write: requires root, and that the kernel be built with CONFIG_SECURITY_SELINUX_DISABLE.)

/selinux/policyvers

A read only file which contains the version of the current policy. The version of a policy is contained in the binary policy file and the kernel contains logic to deal with older policy versions, should the version number in the file suggest that it's needed.

(Read: unprivileged. Write: unsupported.)

/selinux/load

A write only file which is used to load policies into the kernel. Loading a new policy triggers a global AVC invalidation.

(Read: unsupported. Write: requires root and SECURITY:LOAD_POLICY.)

/selinux/context

A transaction file. One writes a security context string and then reads the resulting, canonicalised context. The context is canonicalised by running it via the sidtab.

(Read/Write: unprivileged.)

/selinux/checkreqprot

A boolean file which determines which permissions are checked for mmap and mprotect calls. In certain cases the kernel can actually grant a process more access than it requests with these calls. (For example, if a shared library is marked as needing an executable stack, then the kernel may add the PROT_EXEC permission if the process didn't request it.)

If the value of this boolean is one, then SELinux checks the permissions requested by the process. If 0, it checks the permissions which the process will actually receive.

(Read: unprivileged. Write: requires root, and SECURITY:SETCHECKREQPROT.)

/selinux/access

A transaction file which allows a user-space process to query the access vector table. This is the basis of user-space object managers.

The write phase consists of a string of the following form: ${subject security context (string)} ${object security context (string)} ${class (uint16_t, base 10)} ${requested permissions (uint32_t bitmap, base 16)}.

The read phase results in a string with this format: ${allowed permissions (uint32_t bitmap, base 16)} 0xffffffff ${audit allow (uint32_t bitmap, base 16)} ${audit deny (uint32_t bitmap, base 16)} ${sequence number (uint32_t, base 10)} ${flags (uint32_t, base 16)}.

This call will be covered in greater detail in the User-space Object Managers section, below.

(Read/Write: SECURITY:COMPUTE_AV.)

Attribute files

SELinux is also responsible for a number of attribute files in /proc. The attribute system is actually a generic LSM hook, although the names of the nodes are current hardcoded into the code for the proc filesystem.

/proc/pid/attr/current

Contains the current security context for the process. Writing to this performs a dynamic transition to the new context. In order to do this:

(Read: PROCESS:GETATTR. Write: only allowed for the current process and requires PROCESS:SETCURRENT)

/proc/pid/attr/exec

Sets the security context for child processes. The permissions checking is done at exec time rather than when writing this file.

(Read: PROCESS:GETATTR. Write: only allowed for the current process and requires PROCESS:SETEXEC)

/proc/pid/attr/fscreate

Sets the security context for files created by the current process. The permissions checking is done at creat/open time rather than when writing this file.

(Read: PROCESS:GETATTR. Write: only allowed for the current process and requires PROCESS:SETFSCREATE)

/proc/pid/attr/keycreate

Sets the security context for keys created by the current process. Keys support in the kernel is documented in Documentation/keys.txt. The permissions checking is done at creation time rather than when writing this file.

(Read: PROCESS:GETATTR. Write: only allowed for the current process and requires PROCESS:SETKEYCREATE)

/proc/pid/attr/sockcreate

Sets the security context for sockets created by the current process. The permissions checking is done at creation time rather than when writing this file.

(Read: PROCESS:GETATTR. Write: only allowed for the current process and requires PROCESS:SETSOCKCREATE)

User-space object managers

Although the kernel is a large source of authority for many process, it's certainly not the only one these days. An increasing amount of ambient authority is being granted via new services like DBus and then there's always the venerable old X server which, by default, allows clients to screenshot other windows, grab the keyboard input etc.

The most common example of a user-space process attempting to enforce a security policy is probably a SQL servers. PostgreSQL and MySQL both have a login system and an internal user namespace, permissions database etc. This leads to administrators having to learn a whole separate security system, use password authentication over local sockets, include passwords embedded in CGI scripts etc.

User-space object managers are designed to solve this issue by allowing a single policy to express the allowed actions for objects which are managed outside the kernel. The NSA has published a number of papers about securing these types of systems: X: [1] [2], DBus: [1]. See also the SE-PostgreSQL project for details on PostgreSQL and Apache.

In order to implement such a design a user-space process needs to be able to label its objects, query the global policy and determine the security context of requests from clients. The libselinux library contains the canonical functions for doing all these things, but this document is about what lies under the hood, so we'll be doing it raw here.

The task of labeling objects is quite specific to each different object manager and this problem is discussed in the above referenced papers. Labels need to be stored (probably persistently) and administrators need some way to query and manipulate them. For example, in the X server, objects are often labeled with a type which derives from its name ("XInput" → "input_ext_t")

When it comes to querying the policy database, a process could either open the policy file from disk (which we'll cover later) or it could query the kernel. Querying the kernel solves a number of issues around locating the policy and invalidating caches when it gets reloaded, so that's the path which the SELinux folks have taken. See the section on /selinux/access for the interface for doing this.

In order to authenticate requests from clients, SELinux allows a process to get the security context of the other end of a local socket. There are efforts underway to extend this beyond the scope of a single computer, but I'm going to omit the details for brevity here.

I'll start with a code example of this authentication first:

  const int afd = socket(PF_UNIX, SOCK_STREAM, 0);
  assert(afd >= 0);

  struct sockaddr_un sun;
  memset(&sun, 0, sizeof(sun));
  sun.sun_family = AF_UNIX;
  strcpy(sun.sun_path, "test-socket");
  assert(bind(afd, (struct sockaddr*) &sun, sizeof(sun)) == 0);
  assert(listen(afd, 1) == 0);
  const int fd = accept(afd, NULL, NULL);
  assert(fd >= 0);

  char buf[256];
  socklen_t bufsize = sizeof(buf);
  assert(getsockopt(fd, SOL_SOCKET, SO_PEERSEC, buf, &bufsize) == 0);
  printf("%s\n", buf);

This code snippet will print the security context of any process which connects to it. Running it without any special configuration on a Fedora 11 system (targeted policy) will result in a context of unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023. Don't try running it on a socket pair however, you end up with system_u:object_r:unlabeled_t:s0.

If you already have code which is using SCM_CREDENTIALS to authenticate peers, you can use getpidcon to get a security context from a PID. Under the hood this just reads /proc/pid/attr/context.

Now that we can label requests, the next part of the puzzle is getting access decisions from the kernel. As hinted at above, the /selinux/access file allows this. See above for the details of the transaction format. As an example, we'll see if the action PROCESS:GETATTR, with a subject and object of unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023, is permitted.

  → unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 2 00010000
  ← f77fffff ffffffff 0 fffafb7f 11

This is telling us that it is permitted (the only bits missing are for EXECHEAP and DYNTRANSITION). It also tells us the permissions which should be logged on allow and deny and the sequence number of the policy state in the kernel. Note that, above, we documented an additional flags field, however it's missing in this example. That's another good reason to use libselinux! The flags field was only recently added and isn't in the kernel which I'm using for these examples.

At this time, the astute reader will be worried about the performance impact of getting this information from the kernel in such a manner. The solution is to use the same access vector cache code that the kernel uses, in user-space to cache the answers from the kernel. This is another benefit which libselinux brings.

However, every cache brings with it problems of consistency and this is no different. All user-space object managers need to know when the administrator updates the system security policy so that they can flush their AVCs. This notification is achieved via a netlink socket, as demonstrated by the following snippet:

  const int fd = socket(PF_NETLINK, SOCK_RAW, NETLINK_SELINUX);
  assert(fd >= 0);

  struct sockaddr_nl addr;
  int len = sizeof(addr);
  memset(&addr, 0, len);
  addr.nl_family = AF_NETLINK;
  addr.nl_groups = SELNL_GRP_AVC;
  assert(bind(fd, (struct sockaddr*) &ddr, len) == 0);

  struct sockaddr_nl nladdr;
  socklen_t nladdrlen;
  char buf[1024];
  struct nlmsghdr *nlh = (struct nlmsghdr *)buf;

  for (;;) {
    nladdrlen = sizeof(nladdr);
    const ssize_t r = recvfrom(fd, buf, sizeof(buf), 0,
                               (struct sockaddr*) &nladdr, &nladdrlen);
    assert(r >= 0);
    assert(nladdrlen == sizeof(nladdr));
    assert(nladdr.nl_pid == 0);
    assert((nlh->nlmsg_flags & MSG_TRUNC) == 0);
    assert(nlh->nlmsg_len <= r);

    if (nlh->nlmsg_type == SELNL_MSG_SETENFORCE) {
      struct selnl_msg_setenforce *msg = NLMSG_DATA(nlh);
      printf("enforcing %s\n", msg->val ? "on" : "off");
    } else if (nlh->nlmsg_type == SELNL_MSG_POLICYLOAD) {
      struct selnl_msg_policyload *msg = NLMSG_DATA(nlh);
      printf("policy loaded, seqno:%d\n", msg->seqno);
    }
  }

If you toggle the enforcing mode off and on, or reload the system policy (with `semodule -R`), a message is delivered to via the netlink socket. A user-space object manager can then flush its AVC etc.

With all the above, hopefully it's now clear how user-space object managers work. If you wish to write your own, remember to read the libselinux man pages first.

Reading binary policy files

The system security policy is written in a text-based language which has been well documented elsewhere. These text files are compiled and checked by user-space tools and converted into a binary blob that can be loaded into the kernel. The binary blob is also saved on disk and can be a useful source for information.

The SELinux user-space tools contain libsepol which is very useful for parsing these files. Here's a snippet of example code which returns the number of users, roles and types defined in a policy file:

#include <sepol/policydb.h>
#include <sepol/policydb/policydb.h>

int main(int argc, char **argv) {
  FILE* file = fopen(argv[1], "r");
  sepol_policy_file_t* input;
  sepol_policy_file_create(&input);
  sepol_policy_file_set_fp(input, file);

  sepol_policydb_t* policy;
  sepol_policydb_create(&policy);
  sepol_policydb_read(policy, input);

  printf("users:%d roles:%d types:%d\n",
         policy->p.p_users.nprim
         policy->p.p_roles.nprim
         policy->p.p_types.nprim);

  return 0;
};

By looking in the sepol/policydb/policydb.h header, you can probably find whatever you are looking for. Pay special heed to the comments about indexing however. Users, roles and types are indexed from 1 in some places and from 0 in others.

With a little C code, much of the useful information can be extracted from the policy files. The numbers and graphs above were generated this way, with a little help from a few Python scripts.

Conclusion

Hopefully we've covered some useful areas of SELinux that some people were unfamiliar with before, or at least shown the inner workings of something which you already knew about.

If you want information about the practical aspects of administering a system with SELinux, you should start with the Fedora documentation on the subject. After reading this document I hope that some of it is clearer now.

General homomorphic encryption

If you've heard of Hal Finney, the following quote should be enough to get you to read: his explanation of the recent homomorphic encryption paper:

This is IMO one of the most remarkable crypto papers ever. Not only does it solve one of the oldest open problems in cryptography, the construction of a fully homomorphic encryption system, it does so by means of a self-embedding technique reminiscent of Godel's theorem.

Linux sandboxing with LSMSB

Chrome Linux got a dev channel release and I'm very happy with it. It's now my primary browser.

However, one of the big selling points for Chrome on Windows is that the renderers (which deal with decoding the HTML, CSS, image files etc) are sandboxed. We've had exploitable issues in the renderers which which have been stopped by the sandbox. It's a Good Thing.

However, we don't have a sandbox on Linux! The Mac team have been talking about how nice their sandbox is (and I expect we'll get some official documentation about it after WWDC this week). We have to hack around with SUID binaries, chrooting, seccomp and one-size-fits all SELinux solutions.

(I don't wish to discount the good work that the SELinux folks have done: we'll probably use something like that sandbox on Fedora, but Chromium was very carefully written to be sandboxed and we should aim higher.)

So, as part of the exploration of what we could do with sandboxing on Linux, longer term, I have a prototype implementation of LSMSB. It's another literate program of mine, so you can usefully read the source too. The README is included below:

This is LSMSB, a sandboxing scheme for Linux based on the ideas of the OS X
sandbox (which, in turn, was inspired by TrustedBSD and FreeBSD).

Imagine that you're working on a university computer and you get a binary which
promises to do some fiendishly complex calculation, reading from a file ./input
and writing to a file ./output. It also talks to a specific server to access a
pre-computed lookup table. You want to run it, but you don't want to have to
trust that it won't do anything malicious (save giving the wrong answer).

This code is incomplete, but currently you can take a sandbox specification
like this:

filter dentry-open {
  constants {
    var etc-prefix bytestring = "/etc/";
  }

  ldc r2,etc-prefix;
  isprefixof r2,r2,r0;
  jc r2,#fail;
  ldi r0,1;
  ret r0;
#fail:
  ldi r0,0;
  ret r0;
}

... and use it to remove access to /etc.

*** This code functions, but is incomplete ***

It's written in a literate programming style, but the derived sources are
included so that you don't have to bother with that in order to build. You'll
need a recent (> 2.6.30-rc1) kernel in order to apply the included patch. Once
you've applied the patch, drop lsmsb.c into security/lsmsb and rebuild.

You can assemble a sandbox file with:
  ./lsmsb-as sandbox-input.sb > sandbox
And then run a shell in the sandbox with:
  ./lsmsb-install sandbox

To read the code, see http://www.imperialviolet.org/binary/lsmsb.html

Chrome for Linux

Myself and the rest of the Chrome Linux team have been working hard over the past few months to get Chrome ported to Linux. It's certainly very rough still, but it runs and the first development release just got released.

I'm very happy with this and we should be pushing new releases frequently from now on. If you're in the San Francisco office tomorrow, feel free to pop by the office as I'll be bringing in champagne.

Just to be clear, here are some of the things which don't work:

W2SP and Seccomp

I gave a talk today at W2SP about opportunistic encryption. You would have to ask someone in the audience how it went to get a real answer, but I feel it went OK.

The talk was based on a paper that I wrote for the conference.

Also, LWN covered some recent work that I've been doing at Google with Linux sandboxing.

Moved to GitHub

I've finally manged to move IV off heeps, a server which it's been ticking along on for the last half decade.

In the process, I've moved to GitHub using their Pages system. We'll see how well it works out!

In the process I've cleaned out a lot of stuff and probably broken lots of links. I trust that the search engines will figure it all out soon enough.

I'll be at CodeCon this y...

I'll be at CodeCon this year.

Thanks to Alexander Sotir...

Thanks to Alexander Sotirov to pushing me to check that the carry chains in donna-c64 were sufficient. I don't know if I realised something when I wrote it which I'm currently missing, or if I just screwed up, but I now believe that they're wrong.

I wrote this Haskell code to check it:

This Haskell code has been written to experiment with the carry chains in
curve25519-donna-c64. It's a literate Haskell program, one can load it into
GHCI and play along.

> module Main where
>
> import Data.Bits (shiftR, (.&.))

There are two constants that we'll need.

Our five limbs are, nominally, 51 bits wide, so this is the maximum value of
their initial values.

> twoFiftyOneMinusOne = (2 ^ 51- 1

2^128 - 1 is the limit of the range of our temporary variables. If we exceed
this at any point, our calculations will be incorrect.

> two128MinusOne = (2 ^ 128- 1

Now we define a type which mimics our 128-bit unsigned type in C. It's a
disjuction of an Integer and the distinguished value 'Overflow'. 'Overflow' is
contagious: if we try to perform any operations where one or both of the
operands is 'Overflow', then the result is also 'Overflow'.

> data U128 = U128 Integer
>           | Overflow
>           deriving (Show, Eq)

We make U128 an instance of Num so that we can perform arithmetic with it.

> instance Num U128 where
>   (U128 a) + (U128 b) = mayOverflow (a + b)
>   _ + _ = Overflow
>   (U128 a) * (U128 b) = mayOverflow (a * b)
>   _ * _ = Overflow
>   (U128 a) - (U128 b) = mayOverflow (a - b)
>   _ - _ = Overflow
>   negate _ = Overflow
>   abs a@(U128 _) = a
>   abs _ = Overflow
>   signum (U128 _) = 1
>   signum _ = 0
>   fromInteger = mayOverflow

> instance Ord U128 where
>   compare (U128 a) (U128 b) = compare a b
>   compare _ _ = EQ

This function lifts an Integer to a U128. If the value is out of range, the
result is 'Overflow'

> mayOverflow :: Integer -> U128
> mayOverflow x
>   | x > two128MinusOne = Overflow
>   | x < 0 = Overflow
>   | otherwise = U128 x

Our field elements consist of five limbs. In the C code, these limbs are
actually uint64_t's, but we keep them as U128's here. We will convince ourselves
that we don't hit any 64-bit overflows later.

> data FieldElement = FieldElement { m0 :: U128, m1 :: U128, m2 :: U128,
>                                    m3 :: U128, m4 :: U128 }
>                                  deriving (Show, Eq)

Now, two helper functions:

This function takes only the bottom 51-bits of a value

> clamp :: U128 -> U128
> clamp (U128 a) = U128 $ a .&. 0x7ffffffffffff
> clamp _ = Overflow

This function drop the bottom 51-bits of a value

> topBits :: U128 -> U128
> topBits (U128 a) = U128 $ a `shiftR` 51
> topBits _ = Overflow

This function simulates the 'fsquare' function in donna-c64, including its carry
chain. If the carry chain is sufficient, then iterating this function for any
valid initial value should never overflow.

> square :: FieldElement -> FieldElement
> square e = result where
>   t0 = m0 e * m0 e
>   t1 = m0 e * m1 e +
>        m1 e * m0 e
>   t2 = m0 e * m2 e +
>        m2 e * m0 e +
>        m1 e * m1 e
>   t3 = m0 e * m3 e +
>        m3 e * m0 e +
>        m1 e * m2 e +
>        m2 e * m1 e
>   t4 = m0 e * m4 e +
>        m4 e * m0 e +
>        m3 e * m1 e +
>        m1 e * m3 e +
>        m2 e * m2 e
>   t5 = m4 e * m1 e +
>        m1 e * m4 e +
>        m2 e * m3 e +
>        m3 e * m2 e
>   t6 = m4 e * m2 e +
>        m2 e * m4 e +
>        m3 e * m3 e
>   t7 = m3 e * m4 e +
>        m4 e * m3 e
>   t8 = m4 e * m4 e
>
>   t0' = t0 + t5 * 19
>   t1' = t1 + t6 * 19
>   t2' = t2 + t7 * 19
>   t3' = t3 + t8 * 19
>
>   t1'' = t1' + topBits t0'
>   t2'' = t2' + topBits t1''
>   t3'' = t3' + topBits t2''
>   t4' = t4 + topBits t3''
>   t0'' = t0' + 19 * topBits t4'
>   t1''' = clamp t1'' + topBits t0''

At this point, we implement two carry chains. If 'currentChain' is true, then we
implement the carry chain as currently written in donna-c64. Otherwise, we
perform an extra step and carry t1 into t2.

>   result = if currentChain
>               then FieldElement (clamp t0'') t1''' (clamp t2'') (clamp t3'')
>                                 (clamp t4')
>               else FieldElement (clamp t0'') (clamp t1''') t2''' (clamp t3'')
>                                 (clamp t4') where
>                    t2''' = clamp t2'' + topBits t1'''

This is the maximum initial element: an element where all limbs are 2^51 - 1.
Inspection of the 'fexpand' function should be sufficient to convince oneself of
this.

> maxInitialElement :: FieldElement
> maxInitialElement = FieldElement twoFiftyOneMinusOne twoFiftyOneMinusOne
>                                  twoFiftyOneMinusOne twoFiftyOneMinusOne
>                                  twoFiftyOneMinusOne

This function takes two field elements and returns the worst case result: one
where the maximum of each limb is chosen.

> elementWiseMax :: FieldElement -> FieldElement -> FieldElement
> elementWiseMax x y = FieldElement (f m0) (f m1) (f m2) (f m3) (f m4) where
>   f :: (FieldElement -> U128) -> U128
>   f accessor = max (accessor x) (accessor y)

We now define a series of values generated by squaring the previous element and
setting any limb that is less than the maximum to the maximum value.

> maxSeries = iterate (elementWiseMax maxInitialElement . square)
>                     maxInitialElement

This value controls which carry chain is used in 'square', the current one or
the one with the extra carry

> currentChain = True

By running this, we can see that the current carry chain is insufficient for
this simulation:

ghci> maxSeries !! 4
FieldElement {m0 = Overflow, m1 = Overflow, m2 = Overflow, m3 = Overflow,
              m4 = Overflow}

The series overflows after only four iterations. However, if we use the
alternative carry chain, the series is stable far beyound the requirements of
the Montgomery ladder used in donna-c64:

ghci> maxSeries !! 100000
FieldElement {m0 = U128 2251799813685247, m1 = U128 2251799813685247,
              m2 = U128 2251799813685247, m3 = U128 2251799813685247,
              m4 = U128 2251799813685247}

Additionally, these values are small enough not to overflow the 64-limb limbs.

When I wrote curve25519-d...

When I wrote curve25519-donna I implemented many of the critical functions in x86-64 assembly. It was a lot of code, even using the C preprocessor! This got a good 20% boost in speed. This was clearly very important because it made donna-x86-64 faster than djb's version .

However, djb just pointed out that the 64-bit C implementation of donna was now as fast as my hand coded version. Turns out that GCC 4.3 greatly improved the quality of the code generation for this sort of code and now equals my hand crafted efforts! Well done to the GCC team because the C code is vastly smaller and easier to understand. Thus, the x86-64 of donna has been removed from the repo.

Packet sizes in DNSSEC

Even when the DNS root hasn't started signing records, one can still use trust-anchors to employ DNSSEC for those TLDs which support it. Follow the links from Ben Laurie's latest blog post on the matter.

The .se ccTLD is one of those TLDs which support DNSSEC. You can test it with: dig +dnssec -t any se @a.ns.se. You'll see lots of NSEC, RRSIG and DNSKEY records. (DNSSEC is very complicated.)

However, the size of that reply is 3974 bytes long! All that from a request packet of 31 bytes. That's a very easy to use 100x DoS amplication. Of course, if you use mirror amplication like that, you cannot forge the source addresses of the flooding packets, making the flood easier to filter. However, DNSSEC may well bring DoS floods into the reach of many more attackers.

When Layers of Abstractio...

When Layers of Abstraction Don't Get Along: The Difficulty of Fixing Cache Side-Channel Vulnerabilities.

Why networked software should expire.

If your business is still writing letters in WordStar and printing them out on an original Apple Laserwriter, good for you. If you're writing your own LISP in vi on a PDP 10, best of luck. However, if you're still using IE6, that's just rude.

Networked software (by which I just mean programs that talk to other programs) on public networks have a different cost model than the first two examples, but our mental models haven't caught up with that fact yet. We're stuck with the idea that what software you run is your own business and that's preventing needed changes. Here's one example:

ECN (Explicit Congestion Notification) is a modification to TCP and IP which allows routers to indicate congestion by altering packets as they pass though. Early routers dropped packets only when their buffers overflowed and this was taken as an indication of congestion. It was soon noticed that a more probabilistic method of indicating congestion performed better. So routers starting using RED (random early drop) where, approximately, if a buffer is 50% full, a packet has a 50% chance of getting dropped. This gives an indication of congestion sooner and prevented cases where TCP timeouts for many different hosts would start to synchronise and resonate.

To indication congestion, RED drops a packet that has already traversed part of the network; throwing away information. So ECN was developed to indicate congestion without dropping the packets. Network simulations and small scale testing showed a small, but significant benefit from it.

But when ECN was enabled for vger.kernel.org, the mailing list server which handles Linux kernel mailing lists, many people suddenly noticed that their mails weren't getting though. It turned out that many buggy routers and firewalls simply dropped all packets which were using ECN. This was clearly against the specifications and, in terms of code, an easy fix.

ECN wasn't enabled by default in order to give time for the routers to get fixed. In a year or so, it was hoped, it could start to be used and the Internet could start to benefit.

That was over eight years ago now. ECN is still not enabled by default in any major OS. The latest numbers I've seen (which I collected) suggest that 0.5% of destinations will still stop working if you enable ECN and the number of hosts supporting ECN appears to be dropping.

The world has payed a price for not having ECN for the past eight years. Not a lot, but downloads have been a little slower and maybe more infrastructure has been build than was really needed. But who actually paid? Every user of the Internet did, a little bit. But that cost was imposed by router manufactures who didn't test their products and network operators who didn't install updates. Those people saved money by doing less and everyone else paid the price.

These problems are multiplying with the increasing amount of network middleware (routers, firewalls etc) getting deployed; often in homes and owned by people who don't know of care about them.

Recently, Linux 2.6.27 was released and broke Internet access for, probably, thousands of people. Ubuntu Intrepid released with it and had to disable TCP timestamps as a work around while the issue was fixed.

But the issue wasn't a bug in 2.6.27. It was a bug in many home routers (WiFi access points and the like) which was triggered by a perfectly innocent change in Linux that caused the order of TCP options to change. (I felt specifically aggrieved about this because I made that change.) The order was soon changed back and everything started working again.

But, for the future, this now means that the order cannot change. It's not written down anywhere, it's a rule written in bugs. This imposes costs on anyone who might write new TCP stacks in the future, by requiring increased testing and reduced sales as some customers find that it wont work with their routers. These are all costs created by router manufactures and paid by others.

Economists calls these sorts of costs externalities and they are seen as a failure which needs to be addressed. Often, in other areas, they are addressed by regulation or privitisation. Neither of those options appeal in this case.

An uncontroversial suggestion that I'm going to make is that we require better test suites. As a router manufacturer, testing involves checking that your equipment works with a couple of flavors of Windows and, if we're lucky, Linux and some BSDs too. This is much too small of a testing surface. There needs to be an open source test suite designed to test every corner of the RFCs. The NFS connectathons are similar in spirit and probably saved millions of man-hours of debugging over their lifetimes. Likewise, the ACID tests for web browsers focused attention on areas where they were poorly implementing the standards.

And, although my two examples above are both IP/TCP related, I don't want to suggest that the problem stops there. Every common RFC should have such a test suite. HTTP may be a simple protocol but I'll bet that most implementations can't cope with continued header lines. It's those corners which a test suite should address.

Testing should help, but I don't think that it'll be enough. Problems will slip through. Testing against specifications will also never catch problems with the specification itself.

DNS requests can carry multiple questions. There's a big counter in the packet to say how many questions you are asking. However, the reply format can only hold one response code. Thus, I don't know of any DNS server which handles multiple questions (most consider the request to be invalid).

The ability to ask multiple questions would be very helpful. Just look at the number of places which suggest that you turn off IPv6 to make your networking faster. That's because software will otherwise ask a single IPv6 question of DNS, wait for the reply and then ask the IPv4 question. This delay, caused by not being able to request both results in a single request, is causing people to report slowdowns and disable IPv6.

We need to fix DNS, but we never can because one cannot afford break the world. We can't even start a backwards compatible transition because of broken implementations.

That's why networked software should have an expiry date. After the expiry date, the code should make it very clear that it's time to upgrade. For a router, print a big banner when an administrator connects. Flash all the error lights. For software, pop up a dialog every time you start. For home routers, beep and flash a big indicator.

We don't need everyone to update and, as manufacturers fold, maybe there won't be any firmware updates or software upgrades. Almost certainly the device shouldn't stop working. But we need to make more of an effort to recognise that large populations of old code hold everyone else back.

If we can know that nearly all the old code is going to be gone by some date, maybe we can make progress.

(Thanks to Evan for first putting this idea in my mind.)

rwb0fuz1024 included in eBATS

rwb0fuz1024 (pronounced 'robo-fuzz') has been included in the eBATS benchmarking suite. Not all of the test systems have been run with it yet, but here's one which has. It's the fastest verification by fa r .

(Full results here)

Sandboxing on Linux

This blog post has been brought about because of the issues of sandboxing Chromium on Linux (no, it's not ready and wont be for months).

Chromium uses a multiprocess model where each tab (roughly) is a separate process which performs IPCs to a UI process. This means that we can do parallel rendering and withstand crashes in the renderer. It also means that we should be able to sandbox the renderers.

Since the renderer are parsing HTML, CSS, Javascript, running plugins etc, sandboxing them would be very desirable. There's a lot of scope for buffer overflows and other issues in a code base that large and a good sandbox would dramatically reduce the scope of any exploits.

Traditional sandboxes: chroot, resource limits

People have been using chroot jails for many years. A chroot call changes the root of the filesystem for the current process. Once that has happened the process cannot interact with any of the filesystem outside the jail. As long as the process cannot gain root access, it's a good security measure.

Resource limits prevent denial of service attacks by, say, trying to use up all the memory on the system. See the getrlimit manpage for details.

These two mechanisms are supported by most UNIX like systems. However, there are some limitations:

Network access, for one, is not mediated by the filesystem on these platforms, so a compromised process could spew spam or launch attacks on an internal network. Also, the chroot call requires root access. Traditionally this has been done with a small SUID helper binary, but then root access is needed to install etc.

ptrace jails

The ptrace call is used by the strace utility which shows a trace of all the system calls that a child makes. It can also be used to mediate those system calls.

It works like this: the untrusted child is traced by a trusted parent and the kernel arranges that all system calls that the child makes cause a SIGTRAP, stopping the child. The parent can then read the registers and memory of the child and decide if the system call is allowed, permitting it or simulating an error if not.

The first issue is that some system calls take pointers to userspace memory which needs to be validated. Take open, which passes a pointer to the filename to be opened. If the parent wishes to validate the filename it has to read the child's memory and check that it's within limits. That's perfectly doable with ptrace.

The issue comes when there are multiple threads in the untrusted address space. In between the parent validating the filename and the kernel reading it, another thread can change its contents. In the case of open that means that the validator in the parent see one (safe) filename but the kernel actually acts on another. Because of this, either multithreaded children need to be prohibited, or the validator must forbid all system calls which take a pointer to a buffer which needs to be validated.

When calls like open have been prohibited, there's another trick which can be used to securely replace it:

UNIX domain sockets are able to transmit file descriptors between processes. Not just the integer value, but a reference to the actual descriptor (which will almost certainly have a different integer value in the other process). For details see the unix and cmsg manpages.

With this ability an untrusted child can securely open a file by making a request, over a UNIX domain socket to a trusted broker. The broker can validate the filename requested in safety: because it's in another address space the filename is safe between validation and use by the kernel. The broker can then return the file descriptor over the socket to the untrusted child.

The major problem with ptrace jails is that they have a high cost at every system call. On my 2.33GHz Core2 a simple getpid call takes 128ns. When a process is ptraced, that rises to 13,800ns (a factor of 100x slower). Additionally, Chromium on Linux is a 32-bit process because of our JIT, so getting the current time is a system call too.

Seccomp

Seccomp has a rather messy past (see the linked Wikipedia page for details). It's a Linux specific mode which a process can request whereby only read, write, exit and sigreturn system calls are allowed. Making any system call not on the permitted list results in immediate termination of the process.

This is a very tight jail, designed for pure computation and is perfect for that. It's enabled by default in kernel builds (although some distributions disable it I believe). It used to be enabled via a file in /proc but, in order to save space, it's now a prctl.

This issue is that the jail is too tight. It's great that read and write calls are enabled without overhead because that's much of what one of our rendering processes will use, but many other system calls would be nice (brk and mmap for memory allocation, gettimeofday etc). We would have to use the broker model for all of them.

For some calls the broker model has to be updated. Allocating memory to an address space isn't something which can be performed outside that address space so, in this case, the broker for these calls has to be in the same address space. This means that there's an untrusted thread running under seccomp and a trusted thread, not running seccomped, in the same process. The untrusted thread can request more memory by making an request over a pipe to the trusted thread. The trusted thread can then perform the allocation in the same address space.

This presents some issues when writing the trusted code. Because untrusted code has access to the memory the only thing the trusted thread can trust are its registers. That means no stack nor heap usage. Basically the trusted code has to be written in assembly and has to be pretty simple. That's not a huge problem for us however.

But we will be making lots of these other system calls, not just the memory allocation ones, but time calls, poll etc. All have to use a broker model.

To recap, a basic system call (getpid) on my 2.33GHz Core2 takes about 128ns. Performing the same operation over a pipe to another thread takes 7,775ns and to another process takes 8,423ns, roughly a factor of 60x slower.

Again, this is a very painful slowdown given the volume of such calls that we expect to make.

SELinux

Fedora, rightfully, makes a lot of noise about the fact that they have SELinux. It's a huge beast and Fedora's work has mostly been a process of taming the complexity and dealing with the fact that very little is written with SELinux in mind.

I don't have Fedora installed anywhere, but this may be a very nice solution to our issues. However, I suspect that root access will be required, again, to configure it. I speak mostly from a position of ignorance here, however. I should install Fedora at some point and have a play.

The Other Man's Grass

Recent releases of OSX have a system call actually called sandbox_init. It's a little half-baked at the moment, but shows great promise.

It's a feature from TrustedBSD and, in the limit, allows for a Scheme like language to give a detailed specification of the shape of the sandbox which is compiled to bytecode and loaded into the kernel. You can see some examples of the profile language in the slides for this USENIX talk. But, for the moment, I believe that just a few preset profiles are provided (see the manpage).

Rolling one's own

SELinux is implemented atop of LSM which is a general framework for hooking security decisions in the Linux kernel. It's conceivable that one could write a sandboxing module using these hooks.

It would require root access to install, but then so do many of the other solutions. It would probably play badly with other LSM users too, but Fedora is the only major distribution to be using them as far as I know. However, it would also be a large distraction.

Summary of data
PlatformSimple system call... via a broker thread... via a broker process... when ptraced
32-bit 136.9ns 8161.4ns 8327.3ns 14087.0ns
64-bit 128.7ns 7775.0ns 8423.3ns 13779.9ns

Obfuscated TCP

It's now in its 3rd iteration, Obfuscated TCP now has an updated site, mostly working code etc. I need people to go to the site, look at the docs, watch the video, build the code, try stuff out etc. Tell me what works and what doesn't. Email address is at the top of the page. Thanks to all who do, and remember that you don't just have to email if you have problems, positive reports are good too!

Google datacenters

There are different levels of secrets at Google. Almost everything unreleased is “confidential” - which means that we don't talk about it to the outside world. Then there is the “top secret” stuff - stuff that you don't even talk about to other Googlers.

Now, top secret stuff is rare because it's a little poisonous. An environment where lots of things are secret between coworkers isn't a pleasant one. How we cool our data centers was one of those items and I was sworn to secrecy when I was lucky enough to be given a guided tour of our Oregon operations.

But, for whatever reasons, this information is now public! Seriously, this is some of the coolest (no pun intended) stuff that Google does: go read about evaporative cooling.

A Rabin-Williams signature scheme: rwb0fuz1024

I wrote a Rabin-Williams signature scheme [source]:

Crit-bit trees

I wrote up djb's implementation of crit-bit trees for strings here [pdf]. Crit-bit trees have several nice properties:

Several groups of Linux k...

Several groups of Linux kernel papers have been published recently. Here's my pick of them:

First we have the Proceedings of the 2008 Linux Symposium (these are in some order of order, favourite first):

Next there's the ACM SIGOPS Operating Systems Review. These papers are about much more experimental developments in the kernel and are thus more fun, even if they are less likely to see the light of day:

I've just releasedtwo new...

I've just released two new curve25519 implementations: one in C and one in x86-64 assembly. The latter is 10% faster than djb's implementation.

curve25519 is an elliptic curve, developed by Dan Bernstein, for fast Diffie-Hellman key agreement. DJB's original implementation was written in a language of his own devising called qhasm. The original qhasm source isn't available, only the x86 32-bit assembly output.

Since many x86 systems are now 64-bit, and portability is important, this project provides alternative implementations for other platforms.

Implementation Platform Author 32-bit speed 64-bit speed
curve25519 x86 32-bit djb 265ยตs N/A
curve25519-donna-x86-64 x86 64-bit agl N/A 240ยตs
curve25591-donna Portable C agl 2179ยตs 628ยตs

(All tests run on a 2.33GHz Intel Core2)

Google has, at last, open...

Google has, at last, open sourced Protocol buffers. My, very minor contribution to this is that I wrote the basis for the encoding documentation.

Protocol buffers pretty much hit the sweet spot of complexity and capability. (See XML and ASN.1 for examples of attempts which missed.) I have the beginnings of a protocol buffer compiler for Haskell that I wrote for internal apps. Now that the C/Java/Python versions are out, I should probably clean that up and put it on Hackage. But every coder should consider protocol buffers for their serialisation needs from now on.

The Black Swan

Firstly, if you're wondering what happened to all the ObsTCP stuff, it didn't disappear, it just moved to a different blog. Things are still moving as fast as I can push them.

(ISBN: 1400063515)

This book has some good, if unoriginal, points about the stupidity of much of the modeling done in today's world, esp the world of finance. Sadly, these are hidden in many pages of self-centered rambling and discourse on adventitious topics. If you're thinking of buying this book, get The (Mis)behaviour of Markets by Mandelbrot instead; you'll thank me.

I've added a bunch of Obs...

I've added a bunch of Obsfucated TCP stuff to the obstcp project page code.google.com include kernel patches, userland tools, specs and friendly introductions.

Also, I posted it to Reddit. If it doesn't get downvoted into /dev/null in 60 seconds, the comments will probably end up there.

OpenID - not actually spawn of Satan

A blog post aggregating complaints about OpenID has been popping up in different places this morning. If you've read it, you might want a little perspective. I'm not going to deal with each point in turn because there's so many, mostly repeating each other.

Phishing

At login time, the site that you're logging into can end up redirecting you to your OpenID provider. Your provider then tells you to go to their site and enter your login information, then click a button to try again. They don't provide a "link" to their site and they don't ask for your password.

Some early providers might not have followed these basic steps, but all the reasonable ones do.

Yes, it's still possible for users to be confused but, by habit they'll be used to doing to right thing.

XSS and CSRF

XSS problems on the providers site are a big deal. This criticism is reasonable.

CSRF may be a bigger deal because you are more likely to be 'logged in' to the target. However, most users already keep persistent cookies to save logging into these sites. The additional attack surface here is dubious; CSRF issues are a problem with or without OpenID.

DNS poisoning

If your OpenID starts with https://, you should be protected from DNS poisoning attacks and the like by the usual TLS PKI. This isn't perfect, but it's pretty good.

However, the OpenID spec says that plain domain names are normalised by prepending http://. This is a technical problem with the spec and should be fixed. Until then, this is a reasonable criticism but not a fundamental issue.

Privacy

The OpenID provider has a lot of information about your activities. This is little different than, say, your email account and many people are happy with Gmail. Likewise, password recovery on most of the sites which could use OpenID is based on email access, so most people already have a single password that suffices for entry to many sites.

If you don't like the idea of Gmail you can run your own email server. Likewise, you can run your own OpenID provider.

Using the same OpenID on many sites does allow them to link your activities. So does giving these sites your email address for password recovery. So does using the same IP (although to a lesser extent).

Some providers will let you have many OpenIDs linked to the same account for this reason. Joe user probably won't use that feature and probably gives the same email address to all those sites already and so looses nothing.

Trust problems

OpenID is not a trust system. Trust systems may be built on top of identity systems. Likewise, apples are not oranges and complaints about their lack of tangyness are moot.

Usability / Adoption

Somewhat valid points here. It's a big job to get widespread adoption and, at the moment, it's a pretty small crowd that uses OpenID. However, OpenID doesn't need a flag day; it can have incremental deployment.

Availability

Valid points. If your provider goes down you're going to have a bad day.

Conclusion

I don't believe that OpenID should be used to login to your bank account. However, for the myriad of sites that I login to (Google Reader, reddit, ...) it would be nice to just be able to type my OpenID in. It's decently suited to that because I'm fed up with all these accounts.

I'm now running a Ubuntu ...

I'm now running a Ubuntu based laptop with a somewhat functions Obsfucated TCP patch in its kernel. (If you have a Neo like view of the Internets you'll be able to see it by the funny options in the SYN packets.)

Hopefully soon I'll be able to post a first draft patch for other people to try. In the mean time, I wrote the start of the mounds of documentation I suspect it'll need: a very non-technical introduction.

I've updated the patches ...

I've updated the patches linked to in the last post with today's work. Both sides now end up with the same shared key (and not just because they got the same private key from lack of entropy like before). That took some fun tracking down of bugs.

Also, packets are now HMAC-MD5'ed with the shared key, and invalid packets are dropped. That also took far longer than expected. I ended up using the MD5 implementation from the CIFS filesystem because the kernel's crypto library is just plain terrible. It's also totally undocumented but, from what I can see, you can't lookup an algorithm without taking a semaphore, and that requires that you be able to sleep. I almost think I must be missing something because that's dumber than the bastard offspring of Randy Hickey and Jade Goodie.

But there we go. Encryption (with Salsa20) to come next Wednesday.

First Obsfucated TCP patches

After a day of kernel hacking, I have a few patches which, together, make a start towards implementing ObsTCP.

At the moment, it will advertise ObsTCP on all connections and, if you have two kernels which support it, you'll get a shared key setup. At the moment, the private key is generated at boot time and since the host doesn't have any entropy then, it's always the same. So I'll have to do something special there. Also, I've a problem where the ACK with the connecting host's public key can get lost. Since ACKs aren't ACKed, this can be a real pain. I think I need to include it in every transmitted packet until (yet another) option signifies that it's been received.

After the last post expla...

After the last post explained why small curves aren't good enough for obsfucated TCP, I decided that, since I'm going to have to do some damage to the TCP header to get a bigger public key in there anyway, I might as well go the whole way and use curve25519, by djb. Now, djb has forgotton more about elliptic curves than I'll ever know and I feel much happier using a curve that's been designed by him. As you can probably guess from the name, it's a curve over 2255-19 - a prime. So the public keys are 32 bytes long.

In order to get that much public key material into a TCP header, here's my proposed hack: Jumbo TCP options.

djb's sample implementation of curve25519 is written in a special assembly language called qhasm. Sadly, it's so alpha that he's not actually released it. So the sample implementation is for ia32 only, uses the floating point registers and has 5100 lines of uncommented assembly. It is, however, freaking quick.

However, since I have kernel-space in mind for this I've written a C implementation. It's about 1/3 the speed (and I've not really tried to optimise it yet), doesn't use any floating point (since kernel-space doesn't have easy access to the fp registers in Linux) and fuzz testing seems to indicate that it's correct. (At least, it's giving the same answers as djb's code.)

Next step: hacking up the kernel. (And I thought the elliptic curve maths was hard enough.)

Elliptic curves don't work either

(For context, see my previous post on OTCP)

In any Diffie-Hellman exchange based on elliptic curves, we have Q=aP where P and Q are points on an elliptic curve. The operation of multiplying a point and a scalar is well defined, but unimportant here. The problem facing the attacker is, given Q and P, find a. If they can do that, we're sunk.

If you could find a pair of numbers such that: cP + dQ = eP + fQ then you're done because: (c-e)P = (f-d)Q = (f-d)aP, then a = (c-e)/(f-d) mod n, where n is the size of the field underlying the curve.

Finding such a point by picking random examples is never going to work because of the storage requirements. However, if you define a step function which takes a pair (c, d) and produces a new pair (c', d') you have defined a cycle through the search space. (It must be a cycle because the search space is finite. At some point you must hit a previous state and loop forever.) Now you can use Floyd's cycle finding algorithm to find a collision with constant space. This is an √n algorithm for breaking this problem and is well known as Pollard's rho method.

Now, if you have many of these problems you get a big speed up by using some storage. Assume that you do the legwork to solve an instance of the problem and that you record some fraction of the points that you evaluated. (How you choose the points isn't important so long as it's a function of the point; say pick all points where the first m bits are zero.)

Now, future attempts to break the problem can collide with one of the previous points. If you find cP + dQ = eP + fR (note that P is a constant of the elliptic curve system) and also that R = bP (because we solved this instance previously) then cP + dQ = cP + adP = (e+fb)P and so (c-(e+fb)) / d = a (and we know all the values on the left-hand side).

Now, 2112 (14 bytes) is about as big an elliptic curve point as we can fit in a TCP header. The maximum options payload is 40 bytes, of which 20 are already taken up in modern TCP stacks. We need 2 bytes of fluff per option and, unless we want this to be the last TCP header ever, we need to leave at least 4 bytes. That's where the 14 byte limit comes from.

We give the attacker 250 bytes of space. I believe that each point will take 3*14 bytes of space for the (c,d,Y) triple, where Y = cP+dQ. Thus they can store 244 distinguished points. Thus one in 256-44=12 points are distinguished. Additionally, generating those 244 points isn't that hard, computationally. This suggests that an attacker can find a collision in only 212 iterations., or about 213 field multiplications.

So, again, a reasonable attacker can break our crypto in real time.

This scheme becomes much harder to sell if we have to do evil things to the TCP header in order to make it work.

If you've been wondering ...

If you've been wondering what I'm up to at work, we now have a public blog for the RechargeIt project.

How sad: from reading the...

How sad: from reading the sleepcat documentation on network partitions, it's clear that BDB uses a broken replication system (i.e. not Paxos). That's a shame because I was hoping to use it.

Yahoo now has OpenID for ...

Yahoo now has OpenID for all its accounts, which is great. Wonderful in fact. OpenID is a good thing for many authentication needs on the Internet and will make the world a better place.

However,...

EROS doesn't exactly build cleanly and I don't know if my fixes break anything critial yet, but they do get the code to compile at least

(the EROS install page is here)

Patch One

--- Makefile	Wed Jul  4 19:15:29 2001
+++ Makefile	Sun Jun 30 18:36:31 2002
@@ -103,7 +103,8 @@
 XENV_LIBXML2 =  libxml2-2.3.13
 XENV_LIBXSLT =  libxslt-0.13.0
 
-XENV_TOOLS = xenv-make xenv-binutils xenv-gcc xenv-libxml2 xenv-libxslt
+#XENV_TOOLS = xenv-make xenv-binutils xenv-gcc xenv-libxml2 xenv-libxslt
+XENV_TOOLS = xenv-binutils xenv-gcc xenv-libxml2 xenv-libxslt
 
 xenv: $(XENV_TOOLS)
 
@@ -120,12 +121,6 @@
 	-rm -rf build
 
 xenv-binutils:
-	-rm -rf build
-	-mkdir build
-	cat pieces/$(XENV_BINUTILS)/tgz-part.* | $(ZCAT) - | (cd build; tar xf -)
-	(cd build/$(XENV_BINUTILS); ./configure \
-			--prefix=$(EROS_XENV) \
-			--target=$(EROS_TARGET)-unknown-linux)
 	$(MAKE) -C build/$(XENV_BINUTILS) all install
 	@echo
 	@echo "BUILD SUCCEEDED... removing build subdir"

Patch Two

--- bfd.h	Sun Jun 30 19:16:33 2002
+++ bfd.h	Sun Jun 30 19:16:21 2002
@@ -98,7 +98,7 @@
 #define TRUE_FALSE_ALREADY_DEFINED
 #endif /* MPW */
 #ifndef TRUE_FALSE_ALREADY_DEFINED
-typedef enum bfd_boolean {false, true} boolean;
+typedef enum bfd_boolean {FALSE, TRUE} boolean;
 #define BFD_TRUE_FALSE
 #else
 /* Use enum names that will appear nowhere else.  */

Final exam finished this ...

Final exam finished this afternoon. Hung up my blazer for the last time. This feels weird.

On the upside looks like I'm a tech reviewer for the new edition of UNIX Power Tools.

Gun crime up 49% and this...

Gun crime up 49% and this is since we banned handguns. I'm sure ESR would have something pro-gun to say about this, I'm not going to.

Those CA results are rubbish past 8 bits. It was a 1 char typo in the code which I'm happy about because they make sense now. Maybe some more analysis soon

Been working on a secure version of GRUB which keeps checksums on a write protected floppy and won't boot a trojan kernel. It will also check any number of other files before booting (say /sbin/init and friends). Working pretty well

Ian Hill gets a submission on /.

Metis (this box) got a nice update over the last couple of days. Switched from apache to publicfile on the internal interface, bind to tinydns for the DNS server and installed gr

A NewScientist story linked to imperialviolet.org . Thanks Will. (Will Knight is one of only 2 clueful journalists I've ever talked to. The other being Andrew from theregister).

Rule 30

I'm troubled

Thinking about the CA results from yesterday I'm pretty convinced that I'm doing something silly. I verified the results for 3 bits by hand and it all came out right. But still, it makes a mess of my thinking and I want those results to be wrong.

I'll go over the code again at some point

OpenSSH

Wes doesn't like the way Theo is handling the latest OpenSSH bug. What would you do different Wes?

If Theo says "here's the fix" it's then a rush between sysadmins and blackhats as to which get to a host first. sshd cannot be chrooted or run as non-root so all cracks are total and you're looking at a reinstall. Privsep isn't really ready for the prime-time but it does make people mostly immune without revealing the bug (thou with the added focus people won't be long in finding it independantly). It's a bad situation, but Theo is handling it well

SSH Remote Exploit. All u...

SSH Remote Exploit. All upgrade to OpenSSH 3.3 (see this for Debian)

Maximum of 44 outputs for CA's upto 27 bits now confirmed. (which means that 25 and 27 have 44 outputs since the number cannot go down). I don't expect to finish processing 29. I don't think the answer would be a shock anyway.

IBM laptops get TCPA chips. Now the chip actually sounds a little useful:

The chips, known as Trusted Platform Modules, generally include a 16-bit microprocessor, a random number generator, hashing capabilities and a significant amount of non-volatile memory. Among the security features TPMs provide are an ability to generate and securely store digital certificates and private keys on-chip, hardware support for multiple authentication schemes and the encryption/decryption of files on demand.

If they weren't fuckware chips it could be a good thing. As it is...

Forgot to upload Gilmore's essay. Fixed.

More on Rule 30

Been playing about with Rule 30 CAs (see ANKOS page 53). You might want to read Sunday's post first (if you have not already).

As I said, Rule 30 CA seems to map very regular numbers onto odd sequence numbers. The lack of type A randomness in the input number and the presence of it in the output data, gives rise to seeming `added complexity'.

Today I'm taking a look at that mapping.

The input to the CA is the starting line expressed as a number. A black block is a 1, white is 0. The input size is always odd. The output is the first n bits down the centerline (not including the bit from the starting line). For example

A Rule 30 CA

That is a size 3 CA with starting value 6 (101 in binary). The centerline is down the middle (the white column) and the result is 1 (001 in binary).

Now we can cycle through all the possible input values for a given size and record the output. Rather than give a huge table I've taken a tally count of the outputs and plotted output value against count

Results from 5 bits Results from a rule 30 run with 5 bits of input

Results from 7 bits Results from a rule 30 run with 7 bits of input

Results from 9 bits Results from a rule 30 run with 9 bits of input

Results from 15 bits Results from a rule 30 run with 15 bits of input

We can certainly see that the mapping is not one way. In fact there are very few possible output values and all the inputs map to a small number of possible outputs. The maximum count in each case is 2^{n-3} + 2^{n-2} (where n is the size of the input in bits).

The number of different outputs is also noteworthy. It has a maximum of 44 upto 23 bits of input. (of course it gets quite time consuming to run tests with greater than 23 bits of input).

Size of input (x) vs number of different outputs Graph of the number of output values for a variable input size

I don't have any conclusions at the moment, but at least I would be worried about using Rule 30 CAs as random number generators given that all the seeds seem to map to (at most) 44 sequences

Cool! I'm on Ryan's LJ fr...

Cool! I'm on Ryan's LJ friends list. Now if only he posted weird and wonderful stuff in friends only posts

A few more scans from the prom. (photos from weezel)

toad.com is dead so I've posted a local copy of John Gilmore's great essay "What's wrong with copy protection" since some people might need the anti-memes for this

Given the price of flights to LV I'm not going to make DefCon 10 this year. I might well go down to the UKUUG exhibition though. Anyone want to meet up? Mail me.

Wolfram and Randomness

Some thoughts on randomness, not structured yet...

DefineCryptographic Randomness(type A)The lack of patterns in a sequence
Chaitin Randomness(type B)The inability to express a sequence in a short program

You may wish to read up on the latter

Now, sequences that are type A are used all the time, most explictly in cryptography. If I run an RC4 cipher with a random key I can generate megabytes of data will pass type A tests. However, you can certainly code RC4 in less than a megabyte of data on a resonable UTM. And even if you couldn't do it in less than a meg you can just generate more RC4 data. Thus my type A random sequence is not very type B.

Now sequences that aren't type A random almost certainly aren't type B. I don't have anything like a proof for this yet, but it seems sensible. However, I don't think you can ever prove that you have the shortest program that will generate a given sequence without brute-forcing all programs upto that length. I think the proof for that is in this paper, but I need to reread that as I haven't looked at it for ages

So, given the set of all sequences of length n some will be type A and some of those will be type B random (both of those lines are fuzzy, but I don't think that right now matters). Assuming you can find the shortest program that generates each of those sequences you can reinterpret each of those programs as numbers and give each member of the set a number (and thus an order). Some sequences will have shorter numbers than others and if we generated everyu possible program in length order we would expect to see outputs in roughly the same order as our ordered set.

Now, this is where Wolfram's sequences come in. Mathamatica's random function actually runs a rule 30 CA and uses the values of the center line as random data. This gives type A randomness (so he says). However, what he's calling complexity in ANKOS is type A randomness and `his' (pompous twit) `complexity generating' CAs are simply low numbered members of the set of sequences.

Like I said, that was just me writing stuff down to see what I'm thinking

Back to our regular schedule

I wonder if we can expect some really fat, high profile, Americans to be defined as enemy combatants and detained without trail now?

Another just links day, t...

Another just links day, though I'm kindof free of exams now

Oh look, linking to this site without permission is prohibited. Oh well - kiss my arse. Wired article (which links to NPR, so is that link banned too?)

Links today today because...

Links today today because I'm feeling a little shot (and it's only going to get worse). I have some ideas about Wolfram/cryptographic randomness/chatin randomness which are starting to come into focus but I'm not sure if there's actually anything there yet

I had to write about Creationism today in GS exam (that page could have been useful yesterday - oh well). My hand hurts now after 3 hours of writing. I'm a maths person, I don't do lots of writing! (On the upside I found that I have pretty funky capital letters)

We Win!

We Win! [and the Guardian]

This looks like it might be interesting (read the press release)

Good text on refuting Creationism crap from SciAm. SciAm was one of the offenders who broke links when I checked, but given the URL of this page I'm hoping it's a little more permanent.

Been checking the old arc...

Been checking the old archive pages with the W3C link checker. Not too bad, thou it seems that Dilbert and SciAm break their links after a while. It also tells you when you've missed a / on the end of a URL which saves a redirect at least.

The RIPX delay made it on the number two slot on PM (the news show on Radio 4, if you're not British don't worry). That's coverage.

Someone emailed me about Xchat code, which I haven't touched in ages. Seems someone was a little `feature optimistic' with the documentation and so I wrote a patch to bring the code up to what the documented behaviour was. It's weird reading code that I wrote years ago, but it actually made a lot of sense. Reminded me of how much C sucks for a lot of stuff, but at least the code wasn't as bad as I thought it was going to be, considering I must have been about 15 then.

Well, there's a Backlink ...

Well, there's a Backlink of the Day bar now. It's not really `of the day' as such - I grep them out of the logs by hand but it will do for the moment. Coincidentally, Keith is talking about implimenting backlinks as well today.

The two Google links above show one of the problems at the moment. In one case Google is linking to the text of Fear and Loathing which is fine because that's on a static page. However, the other is linking to a blog entry on the front page which long ago fell off the bottom.

In the Google case you can view the cached version of the IV frontpage, but in the general case the Referer header generated from people clicking links of the frontpage will be http://imperialviolet.org, not http://imperialviolet.org/page4.html#e72. We need some way to tell the browser what the source of a link should be.

Unfortunately, we cannot just add tags or attributes to existing tags without all the validators going nuts. That either leaves trying to get a change through the W3C (ye gods!), overloading some existing operator or hiding it in comments. None of those solutions are much good - anyone have any better ones?

Guess who forgot up actua...

Guess who forgot up actually upload the Syncthread paper. Opps. (Well done to all the people who emailed me about that - that would, erm, be no one despite all the attempts to get it in the logs!)

Stand have announced that the debate over the RIPX order has been delayed until Monday the 24th (not Tuesday the 18th, like /. said - that's when it was going to be). Stand also have a neat little banner ad which is now floated on the top right of IV.

A New Germ Theory [via ES...

A New Germ Theory [via ESR] about how many conditions (such as cancer, heart disease and even homosexuality) are infectous because, if they were genetic, they would be selected against so strongly as to disappear. Well worth the (quite long) read.

What I said this morning (it's just below this - read it first) is backwards. We don't need a way of marking sections as "this stuff is actually someplace else", we need inlink linking (like <img> tags put content inline). This is nothing new - Ted Nelson told us all this years ago.

Very cool CSS [via Keith]...

Very cool CSS [via Keith]. This is pretty much a mozilla only zone, konqueror doesn't stand up to much here, but there are some mangled versions of the pages for IE users. It would be cool to use some of the this kind of CSS on IV - but it would cripple it for most viewers. Having the tree structure as a CSS menu would be so funky I might just do it anyway.

With an eye to some backlinking on IV I tried to find a simple webserver that would log Referrer headers. Apache is far too big to use unless you really have a need for it (PHP etc), thttpd looks just right, but as soon as you switch to logging to a file it doesn't log referrers! Aggh!

In the end it only took about 5 minutes to hackup the One True Webserver to log referrers anyway, so that's what I'm running now

J for C Programers

Four entries today and counting. Partly I'm catching up from when IV was down this week but mostly it's because if I start coding anything I know that will be the afternoon gone and I really should revise Biology

This was posted to LtU this week. I've read about 1/4 of the way through, which covers all the basic ideas in J and the builtin operators.

It's a very interresting language and quite cleanly based around the idea of multi-dimentional arrays for everything. However, like most pure languages its range of applications it quite limited and it also suffers from being in the Perl family of function naming. In fact it's worse than perl in that reguard, J is more like K.

However, I do recommend that people read the beginning of the book and grok the concepts of the language about implicit loops.

More on ANKOS

Keith links to a discussion on A New Kind of Science. Most of the opinions there seem to follow my own quite closly.

For you Mozilla users the...

For you Mozilla users there is now a CSS tree at the top of the root page (in addition to the old-style tree at the bottom). It's a little buggy, sometimes it doesn't seem to fold away right and sometimes moving down to open a different branch changes the tree so that you select something else. It's a start thou

Later On: Removed it - just looks silly in a browser which doesn't support it

Asynchronous Behaviour

Raph and Dave McCusker have been talking a lot about how to design servers recently.

I played around with these ideas a lot during Whiterose development and my views might be a little clouded because Whiterose isn't a typical network server (the Freenet protocol is far more complex than most). During Whiterose development I settled on syncthreading as a good model for managing state. This text gives a good overview of syncthreading (by my definition). It was written as a technical report for Uprizer, but since it says nothing about Uprizer specifically I doubt they have a problem with my making it public

Now, the SEDA people have a good paper with some neat graphs of throughput with threaded and event based models which certainly suggest that no one should consider writing threaded servers if they want it to scale.

However, threaded systems (certainly syncthreaded ones) are event based. They're fundamentally the same thing.

With an event system you loop over your input queue call (be it select, poll, kqueue etc) and, for each event, feed the event into the finite state machine (FSM) for that transaction. The FSM makes some quick, non-blocking, change in its state and returns. Repeat.

The only difference with a threaded model is that the FSM is a stack and some registers. You still feed the event into the FSM and it still returns quickly. If you have syncthreads you are doing this explicitly, otherwise the kernel is doing it for you. (if you don't get what I'm saying here, read the Syncthreading paper and come back).

In an ideal world the FSMs would be fully self contained and so it wouldn't matter if they were updated at the same time (e.g. preemptive threads). Sometimes, however, they have to interact because the protocol requires it. In this case there is a fundamental difference between preemptive threads and most event-based designs.

So, given that one caveat, if you are saying that threaded systems are slower, or have less throughput etc you are really complaining about the quality of your threading system.

Why I support syncthreading

Given that I've just said that syncthreading and event-models are, basically, the same - how can I say I support syncthreading?

Simply because the FSMs are so much nicer to code as threads when they are anything more than the most basic systems. Threading basically loads and saves your state for you (while event models make you update structures manually) and threading lets you have state encoded in the program counter. A linear progression of states is written as a linear progression of statements and loops of states are written as loops in the code. It's just so much easier to work with.

Whee! Imperialviolet.org ...

Whee! Imperialviolet.org lives! Xia went down and looks dead - and it unfortunately hosted the DNS for imperialviolet.org. Much thanks go to Schvin for DNSage from now on. I still have a slight hope of getting netsol to recognise metis as a nameserver thou.

Time has been all taken up by exams, so not much to tell from looking at my bookmarks (which are spanning both monitors again - sure sign that it's time to organise them).

Oh, and the letter I sent to my MP

<AccordionGuy2> "Man in the Middle" -- protocol porn.
<GabeW> leave it to #infoanarchy to turn crypto kinky

When the RIP act was passed...

Oh fuck. When the RIP act was passed dear Mr Straw bleated repeatedly about how careful the control was going to be and how tight the safeguards would be for spying on people. Now what do we have?

"expanded to include seven Whitehall departments, every local authority in the country, NHS bodies in Scotland and Northern Ireland, and 11 other public bodies ranging from the postal services commission to the food standards agency."

The Food Standards Agency?? And this comes as a government aide has to apologise for investigating a member of the Paddington Survivors group because they might be against government policy.

As a first step, your goal, dear reader, is to try and get TLS installed on your MTA today. It's only a small step, but it's pretty simple (qmail, Debian Exim, Exim, Postfix).

Send an email to me (via TLS/SMTP of course) and get your name on a roll call of fame. Guess who's MTA has been TLS enabled for ages?

E7l3 discusses A New Kind...

E7l3 discusses A New Kind of Science today. As I'm a little further into it it has got a lot better - mostly because Wolfram has stopped talking about how wonderful he is.

Lots of new stuff on LtU. When it rains, it pours (certainly true of the weather today)

Seattle Times article [via Bram] talking about how `public domain' means almost nothing in US legal terms and a little about BitTorrent.

Old Story about Australia abusing spying powers. More reason why people should be very worried about what Europol wants to spy on (mentioned here before).

Like I said, a lot of projects are long term solutions to this and, in thinking about it, I must admit that I keep designing some manic things. (I'm not ready to throw them out yet). As a start I'm looking at overlay networks and the performance of TCP over different connections with a view to creating a Crowds like network. While I'm at it, it would be nice if the overlay network could make NATed and firewalled hosts first-class peers.

In order to play about with TCP I've setup a quick program that uses the tun/tap device to simulate a link with a given bandwidth, latency and packet loss. I might post the results here if there's anything worth noting.

However, exams are taking first place in my scheduling now

On Computers and the Evil...

On Computers and the Evils Of Powersaving

Went into school yesterday to startup everything and to fit a UPS on metis. Unfortunately I didn't compile the kernel with serial support (minimal and all that) so I'll have to reboot metis on Monday with a newer kernel. Good excuse to switch to 2.4.18 thou.

I also took marvin (the laptop) in with me and let it feed off the 2Mbps connection. It's now a fully up to date Debian unstable with 2.4.18 etc. I also finally got both my PCMCIA cards working. The 3Com network card was fine under 2.2.17, but the Aironet didn't work at all.

In comes 2.4 and both of them break. It turns out that the 3c575 driver has been replaced with 3c59x which has hotplug support (meaning it can handle the PCMCIA card without cardmgr or any of the rest of the pcmcia-cs tools). The airo drivers, however, do need pcmcia-cs and only the latest will do. Anyway - it's all working now.

Since I have an Aironet but no other 802.11b devices I set ethereal sniffing wifi0 and walked into town to pickup a book (more on that later). The Aironet is, as I understand, the best card for sniffing since it can do RFMON (multiple frequency sniffing I think). But the laptop went into power saving (which I expected) and powered down the PCMCIA bus too (which I didn't), so I got nil packets. Oh well.

A New Kind of Science

The book I was picking up was A New Kind of Science, which has finally reached this country. And thanks to Anna Gregson who came up to me in town and just gave me 30 pounds, thus neatly paying for nearly all of the book. I wish more stunning ladies would do stuff like that

I've only read the first couple of chapters but so far Wolfram is being a pompous twit. He spends a lot of words talking about how he is the first person to recognise that complexity can come from simple structures and how he is the first person to study it.

Doesn't exactly ring true does it?

I think, Stephen, that most people has realised that long ago. Certainly people who had looked at crypto. I mean, it's spelt out word for word in Cryptonomicon (the chapter about LPW and organ). LFSRs were the mainstay of military crypto for years and were very well studied. Maybe I'll think more highly of this book as I get more into it.

The Beauty of Fonts

This [via Raph] is a good discussion of why so much has been said about unhinted AA recently. Just look at Chimera - that's nice. Unfortunately, since I run Xinerama I can't have XRENDER as well, at least in 4.1.0. Maybe 4.2.0 will fix it, but Debian doesn't have it yet (thou I note that Gentoo does).

(as an aside, Freetype have a cool new, fontish, logo)

CSS

As a rule of thumb - if it isn't broken in konqueror then I don't know about it. Pill pointed out that Mozilla (and other Gecko based browsers) didn't like the CSS because publicfile was returning the MIME type as text/plain. Fixed that in a very DJB way, the name of the CSS file is now iv.css.text=css.

Well, IV is still down as...

Well, IV is still down as I write this, but I'm bringing it up tomorrow and taking my laptop to it's 2Mbps connection to download some stuff. Upgrading Debian for one (so I can drive my Aironet card) and I'll try to download this. It's a 143MB (5 hour) MP3 of someone called Zigmund Void. The first bit sounds weird (I wondered if the file was corrupt) but nym ensures me it's quite good.

Although a perfectly good day, I'm a little down. Firstly Zooko's hard drive died and took many unfinished papers and films of Irby with it. In the future I guess he'll backup, but there's something heart tearing about data loss like that. Maybe it's a sign of geekyness that I get upset by that. He's saving up to send it to DriveSavers but they cost lots. (you can always tell that if they don't have prices on the website then the cost is too high).

And for those unconcerned about the loss of millions of poor bits, then this made things worse. It's a record of what Europol wants to log given their new powers which I noted here before.

We need a short term solution to this problem (other than cutting these people up into lots of little pieces). Freenet/MNet and other such networks are long term and systems like Mixmaster are too difficult for most people. I don't know how much confidence I have given that most people seemingly couldn't care less but maybe something like Crowds (but with IRC and email etc too) would be ok.

I guess I'm more than a little depressed about human nature too. I have some thoughts about event-models and the like, but I don't feel like writing them up now

Imperialviolet will be do...

Imperialviolet will be down from sometime like 3AM BST tomorrow until BST morning Friday

It seems that

When I subscribe to mailings lists I use a lists.something@imperialviolet.org address so that if that email address ever gets used for spam I can subcribe under another address and direct all mail to the old address into /dev/null

Unfortunately, this interacts badly with mailing lists which (resonably) require that the sender be a list subscriber. Good old mutt has send_hook commands in its config file which change the From address quite nicely.

That doesn't, however, change the envelope sender, but a quick look at the qmail-inject manpages shows that setting the Return-Path header will do that fine. Unfortunately, mutt strips Return-Path headers before sending (set one in a message, send it and then look at it in your outbox - it's gone). After grepping the mutt source and sending many test messages, deltab finially sorted it out - put set envelope_from in your .muttrc and all is solved. Thanks to all the #infoanarchy people who helped. (I'm just blogging this to help others and because I know I'll need it again at some point in the future)

Still reading Structure and Interpretation of Computer Programs and have just finished Garden of Rama (AC Clarke and Gentry Lee). It's pretty cool scifi - not hard scifi nor stunning - but a good read.

Software Fault Prevention by Language Choice...

Software Fault Prevention by Language Choice: Why C is Not my Favourite Language [via DNM]

The following is a result of an IRC conversation with Zooko and tvoj and may well only make sense if one was a part of that conversation

This article (from the E homepage) suggests that capability systems cannot stop communicating conspirators. I think that either the author of that (MarkM?) or myself are misunderstanding something here. Capabilities can stop communicating conspirators (I'll show that in a sec) - but part of the situation definition in the article is "Bob may also communicate with Mallet" - which is a different problem. The article is, I think, talking about the following situation:

A has a capability cap and can communicate with B. A and B wish for B to be able to use cap but A cannot transfer cap to B

In that situation A can proxy requests for B and in that way B can use cap

Now, there is a separate discussion about a different situation. I think we were confusing the two last night. The situation this time seems to me to be:

A has access to secret information and wishes to communicate/leak this to B. The capability system seeks to prevent this

Now I think that a capability system can ensure this. Proof by induction: Suppose for a moment that A and B have no capabilities. Thus they cannot do anything at all and so cannot communicate. If you only give them access to objects that do not allow communication then neither can communicate.

The problem is that the number of objects which do not allow communication is small and it's hard to find them. (I should mention at this point that this is a problem that Multics tried to solve - interested parties should Google for some papers from that).

If we give A and B access to integers and simple operations (add, multiply, divide) then they can perform simple tasks and still not communicate - even if we give them capabilities to the same integer objects. Integers are environment free - they have no state. An object which is environment free is also side-effect free and deterministic (the result only depends on the inputs). You can also think as side-effect free as does not write to to environment and deterministic as does not read the environment. In the following I mean deterministic (D) to mean mearly deterministic, the same for side-effect free (SEF)

Of course such properties only hold if the inputs are at least as strong. For example if you pass a deterministic object to an environment free function, then that function may set some global variable through the input.

To stop A and B communicating you must stop their environments from touching. You can give them EF objects with no problem. You can give them EF and SEF objects with no problem, same for EF and D objects. However, as soon as they have SEF and D objects you may have a problem as the D objects could write to something which the SEF objects could read.

Now, if A and B are running on the same computer then they share objects like the CPU. Give B a clock and they can communicate via the lengths of time-slices etc. Give them the ability to malloc and they may be able to communicate via the VM etc. This is why it's such a hard problem.

I got Amphetadesk working...

I got Amphetadesk working by sending the Perl libs it couldn't find to the apt bot on #debian and installing the packages it said. Only worked for 0.91 thou. At least I know that my RSS feed works.

TBL (not, Tim Berners Lee, another one - who is, I think, the smartest person I've ever met) recommended The Structure and Interpretation of Computer Programs a while back. Well, I found it online (legally) today, via this page. Reading it on my laptop now.

This is very sweet

Android Generated for Gratification and Logical Exploration

ESR has a weblog! (and it's pretty good). A bit gun mad, but that's always been a problem with Eric

Suggested reading: Notes ...

Suggested reading: Notes from the BPDG (fuckware) meeting, BBC article about the new EUP rules on data protection and privacy, Another BBC article about the large fall in street crime in Lambeth (following a relaxing of cannabis laws)

A little while ago, it was found that irssi was backdoored at the source level (someone cracked server and altered the source). Now it's been found that fragrouter suffered the same thing (search BUGTRAQ archives). Two points from this: in the short term we need a little more crypto and in the longer term we need to fix the god-awful UNIX security model.

Irssi has started signing releases at least, but personally I still can't believe that Debian doesn't. It's really not rocket-science and the code support is there (deb-sigs). The general argument is that the number of Debian developers means there are too many keys that could be compromised. Guess what? It's still better than nothing. (As an aside, Debian still doesn't have incremental updates to the Packages file - Debian sucks in too many ways).

In the long term we need to do something about the UNIX security model. The research has been done - that isn't a problem. But I'm lazy. I might quite like to play with more secure systems, but they are marked out by being unusable. Of course, I'm not working to create an EROS distribution or anything so I don't really have the right to complain. Maybe something like this [via Wes] will stay us in the short term.

This looks like a cool re...

This looks like a cool resource for optimising PHP. I haven't done any PHP for a while thou

Be sure to checkout the links from Zooko's weblog for 5/27.

Picked up my photos from the school prom today and scanned them in. My scanner is pretty rubbish. People usually look pretty bad in photos, and after I've scanned them, they look even worse . Anyway, they're here

This link was mentioned on infoanarchy on why ATM is a bad idea. I don't really know much about this, but it seems pretty persuasive. Then again, if I had so much money I could even consider an ATM network I would be pretty happy!

Dan Moniz pointed out that...

Dan Moniz pointed out that I've been a total id10t and linked to sweetcode.com, not .org, which is why it's been saying "For Demos". Stupid me.

IV also has an RSS feed now

Also, thanks to Pill for pointing out that mozilla was rendering link borders around the permalink images and for giving the CSS code to fix it. (IMG { border : 0 none; }, BTW)

IV now has permanent URLs...

IV now has permanent URLs for each entry into the archives for anyone who wants to link to a specific entry. Copy the link from the to the left of the time lines

This is getting silly - caffinated soap for a wake up in the shower. About 250 milligrams of caffeine per shower is absorbed through the skin. weird.

Projected deaths in a Kashmir nuclear war. Newscientist is broken with most browsers (it returns a "unavailable at the current time" page). Faking the User-Agent seems to do the trick

This paper has been doing the rounds (thanks to tav on #esp for the link). Kindof scary what a determined idiot could manage to do. So who's going to write a worm which uses stealth-spreading and strikes, encrypting all the hard drives of computers it infected? How much do you think you could sell the key for? (credit to Ian for that idea)

It seems that NASA is set on a manned mission to Mars after the discovery of millions of tonnes of ice just a meter under the surface.

So is NASA going to give up on the ISS and try to capture the nation's imagination (and wallets) with the first manned mission to anywhere since the Moon? The ISS has cost $40 billion so far (it should have cost $8 billion and been finished in 1992) and there's no end in sight. The Russians don't have the money, NASA is practically bankrupt after bad accounting practices were revealed in June 2001 and the other members (European, Canadian and Japanese space agencies) are fed up with the whole project.

Goldin (who ran NASA throughout the 90s) said NASA was going to be "Faster, better, cheaper" after the loss of the $1 billion Mars Observer. Wouldn't the faster, better, cheaper option be another unmanned mission? It doesn't have the glamour, but you have to question the reasons why NASA is doing this. Science or PR?. What does a manned mission offer?

BBC piece about the EU En...

BBC piece about the EU Enforcement Directive (an addition to the copyright directive). What they want this time seems to be that all CD/DVD presses put ID numbers on CDs and more powers to search.

AaronSw is pissed with the W3C

BBC News article about the water on Mars that I would have linked to had I had it around earlier

And an article about NASAs money problems (SciAm).

Essential Blogging by O'R...

Essential Blogging by O'Reilly is in draft form for download (ZIP file of PDFs). Deals with using common blogging tools and a little about blogging community. Worth a skim even if you blog with XHTML and a few custom Python scripts like me

This is quite a neat little utility for modern IDE drives. It uses the SMART interface to read the temperature sensor and spits the value back out. I didn't even know IDE drives *had* temperature sensors built in.

I'm not sure what's going on with Sweetcode. It just reads "For Demos" and has done for quite some time. Noone seems to know what Demos is (other than the plural of demo).

Two of my four SCSI drives seem to fail to startup until they've been on for a few minutes. I'm pretty sure it's not the drives (since it's two of them at exactly the same time). I'm thinking maybe I shouldn't have used a cable from the "SCSI Bucket of Bits" from work.

From Ian:Actually, this i...

From Ian:

Actually, this is my assessment too - I have been saying for a while that Quantum Computers will cause us to revert to a situation where secure practical crypto was the exclusive preserve of the wealthy and powerful (ie. those that can afford Quantum Crypto).

Perhaps you should email Mr Singh and ask his opinion of it.

I will

Another Paul Graham link

School's out for ... ever...

School's out for ... ever

Scary

Ok, I've improved the IV ...

Ok, I've improved the IV generating scripts to break up blog entries into 10 on the front page and a series of archive pages (20 entries each). You can access the archives from links at the bottom of every page.

I went to a talk by Simon Singh (author of The Code Book) on Wednesday (part of the Cheltenham Science Festival). It was a crypto talk (nothing I didn't know already really, but very well done and with a real live demo of an Enigma) during which he went through the solutions some of the 10 challenges he sets at the end of The Code Book. Nothing remarkable here except that he admitted that the toughest code to crack (RSA wrapping 3DES) was done wrong. Rather than Enc1Dec2Enc1 he did Enc1Dec1Enc2 - which is just the same as single DES. Implimentation issues again.

But the part that got me thinking was a little aside when he said that Quantum computers might cripple factoring schemes, but that's ok because we have Quantum crypto. Now I need to go lookup exactly what algorithms exist for QCs - but if we assume that it breaks all pubkey systems we know then Quantum crypto doesn't replace current crypto at all. It requires a direct fibre link in order to preserve the all important quantum states of the photons. This puts us back to the days where very few people have crypto (those who can afford direct fibre links between themselves) - a major step back.

It would be a sorry state is this happened - someone please reassure me that it wont.

From pupok:I realised rec...

From pupok:

I realised recently that people from the UK have a different sense of what things are appropriate to talk about publicly. It is extremely well-defined. I do not possess it at all,

Really? I don't even notice it. I guess IV is very non-personal and I would feel awkward about posting some things here. But there is a little of that good old Bristish stiff-upper-lip around - but very little

I'm not sure about this site yet. Looks very complete and well designed but I haven't really had a chance to read much of the content yet. Doesn't really matter if someone pulls off a singularity

Landscape page updated. Metis was down for a while today since the IP address jumped (again!). Damm Telewest keep on telling us that it's static. They now say, however, that it really will be after maybe one more jump. Wonderful

Followup to Paul Graham's Revenge of the Nerds, here. Thanks to AccordionGuy.

<AccordionGuy> The next plan of course, if for Chancellor Grahamtime to propose creating an army of the Republic to crush the separatists.
<agl> if he does, then at least we get a cool film out of it :)
<agl> does Dennis Ritchie play Yoda then?
<rik> agl: "Yes. The C is strong with this one."
<coderman> i cast my vote for Bjarne as Count Dooku
<rik> "Edit code in vim he does. Mmm."
<AccordionGuy> a.k.a. Darth Templatus.
<AccordionGuy> Count Do-Loop.
<AccordionGuy> Who are Anakin and Padme, then?
<Loki> paul vixie and the chick at the head of RIPE, forget her name.
<AccordionGuy> If either Kemeny or Kurtz (BASIC) were still alive, they could play Jar Jar.
<AccordionGuy> I vote for Larry Wall and Guido van Rossum as C-3P0 and R2-D2.
<coderman> lol
<JDarcy> RMS as J. Random Ewok
<coderman> RMS looks like an ewok. sorta
<AccordionGuy> I was thinking of RMS as Dex, the greasy spook cook. They look like they have the same hygiene habits.
<AccordionGuy> I was thinking James Gosling as Jabba, since we already call Java "Jabba" on this channel.
<AccordionGuy> "Hoh, hoh, hoh, anonymous inner classes are my kind of scum."
<AccordionGuy> Cobol would be poor old Chancellor Valourum, who got ousted in Episode I.

Downtime: Metis (and thus...

Downtime: Metis (and thus IV) will be down from early June 5th for about 2.5 days while the language center is wired up at school. Zooko has very kindly offered to be backup MX during this time

Today was the last semi normal school day. I'm not in tomorrow, I have exams all Thursday and Friday will just be wild. Biology practical went fine this morning (even though the exam board contacted school yesterday and told them that the experiment doesn't actually work and to just give us the results). It feels very weird to be finishing school - I can't remember not going to school. Feeling displaced

I've written up the starting notes on Landscape. It's just a bitch about the importance of IDEs at the moment - not even sure if I want people reading it yet. I guess my main point about IDEs is separate at the moment and not clouded by and sketches of LS proper yet

A new addition to the Common Links: E7L3. My common links are so incomplete at the moment but it's fascinating to see social webs in action. With blogs you can actually see the arcs of similar interrest. Now I want a tool that gathers recent blog entries and cross links them by keywords. Answers on an email to agl@imperialviolet.org. (I know IV is guilty of not having an XML version - I will fix this at some point

New article from Paul Gra...

New article from Paul Graham

This site runs some pretty sweet music in MP3 format (thank god it's not all RealMedia or WinMedia)

Following a link from JWZs weblog to here which has lots of really neat stuff on fractals and the like

Joey on Attack of the Clones...

Joey on Attack of the Clones (which I watched yesterday). The love scenes are, quite frankly, pants and if I had been editing it most of them would have been left where they belong (on the cutting room floor). However there is more then enough action to make up for it and Yoda with a lightsaber is worth the ticket alone!. Go see it.

Aaron has posted an annotated ETCon programme with links to the blogs which covered each session.

Wolfram's book is out (and my order is lodged with Waterstones). Wired have some good, extensive coverage. But, unfortunately, it doesn't look like Wolfram Research Europe are getting any copies until the end of the month, so god knows when I'll get mine.

I've now taken to leaving notes for myself in root.inc (the file which generates this page) and it works pretty well

Don't you just wish you h...

Don't you just wish you had a digital camera which could take pictures like this or this?. That could almost be rendered - ouch. (bigger versions for those on better connection, play with the URL to get 1024x768 and 2272x1704).

Frost looks like a very interesting extension to C++ (it's not general as far as I know, only for G++). It allows multimethod dispatch with syntax like:

void func (virtual wibble & arg) { do_something_to_wibble_objects (arg);}

Just great. Go read it.

Some good procmail rules for filtering spam out (works really well for me)

:0
* Subject: ADV:.*
spam

:0
* Subject: .*\.\.\.\.\.\.\.\.\.\.*
spam

:0
* Content-Type: .*charset="ks_c
spam

:0
* Content-Type: .*EUC-KR
spam

Ah, at last. Maybe something...

Ah, at last. Maybe something which can explain what the hell Aspect Programming is (I expect I'll be disappointed).

GCC 3.1 is out (changelog). And they have a MMIX backend! Woohoo, now all I need is a JIT compiler to translate MMIX into IA32 to run it

An article on arch (the CVS replacement)

About 6 people have put their keys through the verifier and it seems to be working pretty well now that I've hacked the GPG source to stop it asking questions. GPG is a total bitch to interface with - we need 1.1 with the g10 library.

Well, I now have 4 of the...

Well, I now have 4 of the 5 aforementioned SCSI drives in a RAID 0 array (I can't fit all the drives in the case). Nice to see my PSU didn't melt and it's pretty sweet speed-wise. You mke2fs and the count shoots up to about 120 right away. Then all the drive LEDs light up and the room shakes for couple of seconds before the mke2fs finishes

Another Lisp Love story

Five Little Languages and How They Grew

NYTimes story about how a company is buying out a whole town (for 20 million dollars) in order to stop the people sueing them over pollution. (registration required for NYT - try user:strsnyt, Password:strsnyt). You gotta love the American disregard for the enviroment. Five percent of the world's population and 25 percent of the pollution.

New articles on SciAm

New project time! A big w...

New project time! A big welcome to the key verifier. As much as we would like the web of trust of GPG keys to work, most people don't know enough people who use GPG for it to work. So as a little helper (and a pretty small one at that) the key verifier will sign your key once you've proved you control the email address in the UID. Thanks to AaronSw and Ian for testing it out.

Introduction to the Semantic Web. It would be nice if this works. At least the SW people realise that they are never going to create a pretty system if it's going to be worldwide, I only worry that because they're aiming pretty low, they are going to hit far too low. Time will tell.

JWZ has a new weblog. For those who don't know, JWZ is famous for lots of things: writing xscreensaver, working at Netscape/Mozilla for a while, owning the DNA lounge and (most of all) for ranting

USENIX2001 report. Ok, so it's from last year - doesn't mean it's not interesting thou! (from Lambda).

I was looking about for places to take me on for the summer yesterday. (It's stunning how disinterested people suddenly become when you mention internship). On the top of my (remaining) list was a company called Innovative Software which I was going to call today. And guess what the headline on the local paper is?? Microsoft sues Chelt Computer Company. I guess there's a chance they might be looking for a Linux knowing intern now - but more likely they are just going to get swatted.

Matrox's new card is out. Drooooool.

One virus has infected an...

One virus has infected another. Namely CIH and Klez to make a fast spreading, nasty payload virus.

Befunge is a language with a 2D `memory' and the IP advanced in 1 of four directions. Crumbs

Nice is the nicest (no pun) Java based language I've seen in a long time. It uses an ML type system (pretty much the best one around) and compiles to Java byte code. Very, very sweet. Need to play with this a bit.

Debian finally has GNUPG 1.0.7! All apt-get

Details of the new Matrox card are surfacing (also see /.). I still very happy with my G400. In other aggle hardware news, I've not got an SLI MegaRAID SCSI controller and 5 9GB UW3SCSI drives to make a nice RAID array with. I just need a SCSI cable with enough connectors and a power supply which won't melt now.

A video of a Lisp Machine in action. I'll have to download this at school as it's a little big. Lisp machines were waaay cool thou.

Imperialviolet's DNS has ...

Imperialviolet's DNS has been upset for the last couple of days as netsol get off their fat arses and update stuff. DNS sucks. I might design a replacement.

I'll catch up tomorrow evening with posting stuff

GZigZag has renamed itsel...

GZigZag has renamed itself to GZZ and moved to Savannah (from SourceForge). GZZ is a free, java implementation of Tel Nelson's ZigZag. It's a neat idea and something like it plays a large part in my plans for total world domination.

Ian has an article on CNet about Freenet and Sept 11th. It's worth a read, or at least a skim

The Mouse Genome is now online if the like that sort of thing. And (better yet) it looks like it's free (in both speech and beer). BBC article

Maybe I should see a doctor about my nosebleeds. They don't happen often, but when they do - boy does it bleed. At least I didn't pass out this time.

Metis now has a TLS (the protocol formally known as SSL) enabled MTA

Roger Dingledine has rele...

Roger Dingledine has released the first draft of the Mixminion design paper. Remix is pretty much dead as the only reason I started writing it was to try my hand at RSA coding - so go read it and pass comment. Maybe when exams are over I'll help with Mixminion because, let's face it, Mixmaster is getting a little long in the tooth.

IV now validates as XHTML...

IV now validates as XHTML 1.0 Strict! (well, the front page at least). The W3C validator sometimes needs a few refreshes to connect to IV thou.

Wes has a link to this interesting page on XHTML+MathML+SVG. Maybe this could actually be a decent typesetting system.

Just finished Diaspora by...

Just finished Diaspora by Greg Egan - wow. The scope of this book is stunning as is the authors grasp of maths and physics. How many other authors explain all the reasoning behind their ideas?.

After about 4 weeks tryin...

After about 4 weeks trying, Smiths have given up and said that the new True Names book is out of print, despite that fact that it was published in December 2001! This is a sucky country to get books in

Some links:

If anyone knows Aspect Orientated Programming I want them to email me an example of why I would want to use it. (you know, like the example of GUI's which is always used for OOP).

Stackless Python (whichI'...

Stackless Python (which I'm sure I've mentioned here before) is moving towards a Limbo [1, 2, 3] model of microthreads.

Maybe an interesting book: IA-64 Linux Kernel

Wolfram's book of about 10 years is coming out soon (May 14 so says the website - but that might slip).

Kotako is being dehosted ...

Kotako is being dehosted as of June 1st after years of fine service (kotako runs linuxpower.org). So I'm moving my mail to imperialviolet.org - which involves re-subscribing to far too many mailing lists - but I'm getting there. Maybe I can get linuxpower.org's MX pointed to mail.imperialviolet.org so I can run a forwarding service for a while.

This was posted to Slashdot - but I don't care I'm going to post it here too because it's a really great list of major software bugs. It's nearly all `physical effect' bugs thou (for example it doesn't count things like the Ping Of Death). Funny reading - if a little scary

I think I've got a decent grip on Garbage Collectors now and damm, it's a nasty problem. I have a few design ideas which I should write up at some point.

The Bloody Sunday Inquiry has demanded that two Channel 4 journalists reveal anonymous sources, on pain of contempt of count. Quite frankly I don't see any way that you couldn't hold such a court in anything but contempt. If the sources had not been granted anonymity then they would never have said anything - the inquiry should be grateful for what they have. Forcing disclosure will only damage future inquiries and alienate people against this one. At least the two reporters have refused for the moment.( Guardian story with some great quotes from the reporters).

Another good story (and another link to the Guardian). I wonder if you could retrofit a computer to a car (a proper computer) and live your whole life in a car? Drive-thru everything, wireless internet access, a postal box to get your internet ordered stuff delivered (it would have to be drive thou of course). You would be the totally mobile citizen, telecommuting and living out of your car. The only problem I can't work round (reasonably) is toilet stops. But I suppose you could have a toilet fitted and store the waste somewhere in the car until you find a place to, erm, dispose of it.

I'm sure people have seen Jamie Kellner's `views' on PVRs. If not - see the Slashdot story. Comments about the stupidity of this guy aside (and I loved Ian's Tivo) it seems to me this is an example of the Tragedy of the Commons. As long as people watch adverts on TV (actually, for as long as marketing people think they do) - everyone gets free content. However, some people can skip the adverts (either by not watching or by using PVRs) and get free content without suffering the adverts. And let's face it - nearly all adverts are just painful. This works until too many people do it and marketing people realise that ads don't work and so people have to pay for content.

Now, I really don't want to be seen to be agreeing that "PVR Users Are Thieves" (because I'm not) - but it looks like the current system is fundamentally unstable and will fall at some point. Now the thing not to do is to try to pin reality somewhere it doesn't want to be with laws and fuckware which force people to watch adverts (HDTV and the like). One point where the link with the T of the C fails is that in true T of the C situations everyone is worse off afterwards. I don't see that this is true in this case. I wouldn't have a problem paying for what little TV I watch if it were advert free (assuming a decent micropayments system).

But then people will be trading DivX's of shows and they'll still be bleating about users being thieves. And you can bet they'll still be trying to pin reality with laws and fuckware. Sigh.

Paradigms of AI Programming

Woody not going to make release tomorrow. (but who doesn't track unstable anyway??)

A Retrospective on Paradigms of AI Programming has been updated

I'm reading up on Garbage Collection at the moment. Microsoft's .NET GC implements the write barrier my watching for dirty pages - but that gives 4K granularity, which is pretty poor. But the thought of a write barrier in the code at every pointer write isn't exactly nice either.

More on the EUCD (DMCA for the EU). Does anyone else feel political jetlag? I thought you had to be older for that to set in. Maybe we should abolish politicians (I don't think I actually agree with that page thou).

And here's today's (badly spelt and lacking grammar) message from our sponsors:

it's official Linux is pant s cause it doesn't support all of the messenger icons - go out and buy up xp all u linuxites" - will merrett CEO of willmerrett corporation the biggest company worldwide by 2020 and runnin on windows!

Google has launched a new...

Google has launched a new service in beta (requires HTTPS)

Currently nostalgia tripping on New Order - Shellshock and Screenwriter's Blues (finally found MP3s of them, lyrics for SB at the bottom of the page)

CERT working with a new language for doing simulations of complex systems [Link has broken since then]

And today's random link is this

Well, I take it back abou...

Well, I take it back about Kompessor after Zooko pointed me to some of her better works (namely Attack and Release and Never Talk to Strangers).

I read this in the latest issue of New Scientist:

The depression had hit, and the town had thousands out of work and little money in the municipal coffers. So the mayor printed his own. The value of the Worgl "stamp scrip" was set to automatically depreciate: this is, it earned negative interest.Once a month, its holders had to pay a "stamp" fee of 1 per cent of the value of the note. The result was that everyone spend the new money in the town as fast as possible. The streets were re-paved, the water system rebuilt, new houses appeared, then a ski jump, a new bridge. Some 200 other Austrian towns came up with plans to copy it, the central bank panicked, and it became a criminal offence to issue currency.

That's pretty stunning. I always wondered what the basis of the fiscal system was - thinking it must be pretty stunningly complex and well thought out. I'm pretty much of the opinion that it's not thought out at all and just works because of luck and is designed by powerful conservative, neophobic organisations.

Some other links which mention this story:

I'm sure you can all Google for other links as well as the next person

My book of the moment is Advanced Compiler Design and Implementation (1558603204) (Table of Contents). It's a real blood stopper when you open it on your lap, but it's a really good book (from what little I've read) as this stuff is pretty difficult to digup off the net. More when I've finished it.

Looking to build some kiosk boxes (Internet access only) at school so I set a P75 (16MB of memory) building Gentoo. 20 hours later is had only got to building gcc (the first time) so I scrapped that idea and installed Gentoo chrooted on metis (Gentoo is cool in that you can do that sort of thing) and copied it over.

Ouch is it slow loading galeon! I'm talking 5-10 minutes to load and display a web page. And that's after having to run X on the frame buffer because it can't seem to drive the Mach64 card. I'm going to have to look at a lighter weight solution. (maybe netscape is smaller - I'm sure the memory is the major problem).

This is today's really random link. Quite a few broken links on the page - but still lots of (maybe) interesting stuff.

Welcome to the new, NS4 f...

Welcome to the new, NS4 friendly, Imperialviolet. This is where agl stops playing web designer and goes for something simpler. Bitch away if you will. I need to improve the scripts which generate IV to handle these blog entries so I can limit the number of entries per page.

Zooko's blog links to a song called Rappers We Crush by Kompressor. I can't say it's at all to my liking - but the girl on the Kompressor page is really cute .

This link is Kuro5hinated at the moment - but I want a go once it's up again.

Kali is a Scheme which can move closures and continuations across computers. I've yet to read their paper (it's currently top of list thou), but it looks sweet

I'm off to pickup a nice big book on compiler design from Smiths tomorrow

Kotako was down for a cou...

Kotako was down for a couple of days due to a power outage, so I'm guessing lots of email bounced. I'm sure people will resend it.

An apt-get on metis foobared it as upgrading caused /bin/zsh to disappear, thus needing a power cycle and a init=/bin/sh since zsh was root's shell. The IP address also changed on the new boot so it will take a little time for the DNS to shake out (more so since the DNS server for imperialviolet.org just died).

The RIAA (boo! hiss!) has published a paper on file sharing networks. This one is lacking the torrent of crap which is the usual mark of output from these sort of organisations. At 75 pages of information I mostly already know I'm not going to read it all, however it does mention Freenet lots. Some choice quotes (mostly they are pretty nice to us!)

As of this writing,the Freenet community has yet to release a usable Windows client and demonstrate its real-world scalability.

Ok, so that's the only bad quote about Freenet that I could find (and even that's pretty fair)

My Programming Language Crisis (not mine thou). Interesting reading, even if I cannot agree with his placing Ocaml first (if only because of that syntax). But Python comes second

Andy Oram on semantic webs

I'm using SpamAssassin at the moment and it's doing a really good job at filtering spam (with little messages about why a given mail was filtered and things). However, since it's written in Perl, I'm wondering if I could manually delete the spam far faster than it can. In fact I'm pretty sure I could. Hmm

I've promised to stop playing web designer since IV renders really badly in Netscape4. Maybe if I get time I'll fix it

China's home web use soars

New KernelTraffic out (#1...

New Kernel Traffic out (#163)

My Gentoo install at school is going pretty well (certainly a lot faster than the install at home over my 56K dialup link). A few niggles about Gentoo:

In light of the last point I should remember Ctrl-SysRq-K is the one to kill the current terminal's processes. It's called SAK thou, which is why I missed it today (System Attention Key).

IV validates again as HTML and CSS. It's only HTML 4.01 Transitional thou. Maybe I should try and make it XHTML Strict or something.

Lisp kicking arse here and here (second link is NYT)

Another BIO link (OMG I look an idiot in that photo!)

**** Kotako is down (mail to agl@imperialviolet.org should still work thou) ****

Yesterday I must have got...

Yesterday I must have got about 4 hours sleep on the coach to and from London, and about 9 hours sleep last night and I still feel like I could curl up and sleep some more!

Mailed zooko about his backup MXes being broken

Mother's birthday today. Got her a huge box of chocolates and some smelly stuff (I'm awful at buying presents, but the rule if (female) { buy (smell_stuff); } seems to work pretty well)

Been reading the The Qmail Handbook. I've been using qmail for ages on all my boxes, but this is a really useful book. With an animal on the front, this would be an O'Reilly book

Another link for Aspect OP, which I've mentioned before

Guardian article on a ray of light in S.Africa's AIDS policy

If a google search returns 0 results and there's a spelling correction then google now automatically tries the corrected search

Another self link to prin...

Another self link to print off tomorrow: A Case for Automatic Run-Time Code Optimisation

Took the image out-of-lin...

Took the image out-of-line since the load times were too long

Firstly a better link to ...

Firstly a better link to the IOI2002 site which I linked to yesterday (this one's in English at least)

Maths on the Web: a link from IRC might be interesting to some people. Personally I think HTML is a massive pile of crap which is only just rescued by strict HTML 4.01 and CSS. Mozilla is implementing Math-ML in current versions but there are still many browsers for which this site could be useful. Then again you could just do the right thing and use PDF (without Type3 fonts thou!)

Lisp Magazine. Lisp is cool. Nuff said.

An old, but interesting paper on why people are violent.

Goo is a YetAnotherLanguageGoingNowhere, but at least I'm interested in this one. It's an S-expression based language (as all languages should be) which calls gcc live to do incremental compilation. Clean, but not the head-in-the-sand clean like Scheme.

Just a link to myself really since I want to print this off tomorrow (it's a paper linked from the GOO site anyway) Adaptive Optimization For Self: Reconciling High Performance With Exploratory Programming

As an end note I'd just like to say I like trains. Despite the battering that the UK rail network gets in the press I've managed to go from Warwick to Cheltenham, Cheltenham to Cambridge and back without a hitch in the last couple of weeks. For one of those trips I was even travelling on a Sunday (when track repairs and the like are done).

Well, I made the internat...

Well, I made the international team and I'm off to the world finals in South Korea. Maybe I should have a stab at learning Korean. I also got a copy of Introduction to Algorithms as a prize (the one with the hanging red things coauthored by Rivest) so I'm reading that.

I also talked to a really smart guy called TBL (who works at Lionhead) and I'm pondering some really nutty debuggering ideas now.

I should write a review of Bruce Sterling's Distraction, but I can't be bothered to write too much, so here goes with a reviewette: It's a good book, but I didn't ever feel like I cared about the characters enough to really want to read on. I could have dropped this book half way through and not batted an eye. Despite this (I should think being trapped on a train for hours and hours helped) I finished it, and it's still good by the end. Nothing stunning, but good.

Well, I'm off to BIO tomo...

Well, I'm off to BIO tomorrow so no IV updates for another weekend (but then there haven't been any since Monday anyway)

Let us look at my bookmarks over the last couple of days then

First there is Mono, the free .NET implementation. Quite a lot of smart people at the ACCU were saying that .NET has some good ideas in it (a first for Microsoft) and I was thinking about building a neat little garbage collector for it. Unfortunately it needs MS's C# compiler to build - so that idea is fscked. (with a little kernel help the GC could have been good too, oh well).

I've also been looking at Gentoo. This is a build-everything-from-src distrib it's (sortof) reviewed here on /.. Since it downloads lots (the ISO is only 16MB) you need a good internet pipe and my 56K isn't going to cut it. Thus I've been trying to install it at school (cable modem).

The first point is that GNU parted is good for resizing FAT partitions and the second is that my victim box doesn't have a CDROM (and I don't have a burner either). So in the tradition of playing with anything that looks fun (cough! Smile!) I loopback mounted the ISO and netbooted a box to run the ISO via NFS. It actually seemed to work a little. Maybe more when I get back to it Tues.

Next down in my bookmarks is Stackless Python. Now, I've always that that Python's dynamic naming was such a sucky idea (and pychecker is too much of a pain to use), but the rest of Python makes up for it. At the UK Python conference I said that Python was just becoming Lisp (w/o S-expressions) and stackless makes it more so. Basically they add continuations support and the like (Python generators are a crippled version of this).

I've also been looking at Aspect orientated programming, but haven't grokked it enough to say anything really. Looks like it might be pretty cool thou.

I've been looking at the ...

I've been looking at the Lambda library (can't find URL right now, try google). This is a pretty stunning template library for C++, an example:

for_each (l.begin(), l.end(), cout << free1 << endl);

Go read that again if you want, I'll wait. Yes, you do see code inserted in the middle of a function call - and it works! I thought Modern C++ Design was pretty stunning, but this is just wow! Now I know that Lisp does this with its eyes closed, but C++ never has been a functional language and it's a testament to templates that they can be used to do this kind of stuff.

If you don't read Dilbert, why not? Smile! [looks like Dilbert links don't stick around]

I'm back! And damm, that ...

I'm back! And damm, that conference was good. Now, I don't have great experience with conferences (O'Reilly P2P1 last year and ACCU 2002 so far). But during every slot there was something I wanted to go to, and usually 2 or 3 things. The speaker list could be read off a list of the great books on C++ and there were no managers or reporters.

My presentation went great (it was my first time presenting). Everyone said it was very impressive (admittedly, they weren't going to say it was crap to me, but they could have said nothing at all) and I had more than enough to fill the 90 minutes. In fact, if there was one thing wrong with it, it was that it was too long.

School tomorrow, sigh

I'm off to present at the...

I'm off to present at the ACCU conference. Wish me luck about 11am BST on Sat.

Because of that I'm going to be out of contact until at least Sunday

So WeHaveTheWayOut comes up running OpenBSD and after the media storm switches to IIS 5. Now it's down!. Currently returns "No web site is configured at this address.". Smile!

(note the solution to :) at the end of a bracketed sentence! Smile!)

Hmm, seems I did pretty w...

Hmm, seems I did pretty well in the Information Olympiad (however you spell it) and they want me to goto the final in Cambridge. I wonder how I'm getting there.

I found this in No Logo last night:

The most creative response came from students at the University of Toronto. A handful of undergraduates landed part-time jobs with the wash room billboard company and kept conveniently losing the custom-made screwdrivers that opened the 400 plastic frames. Pretty soon, a group calling themselves the Escher Appreciation Society were breaking into the "student-proof" frames and systematically replac