Bloom filters (10 Jul 2004)
In reply to The Register: Archive.org suffers Fahrenheit 911 memory loss:
> But just hours after putting up the movie, Archive.org pulled it down

Although Moore is the creator of the film, that doesn't mean that he holds the copyright. Copyright law is very broken. Archive.org knows this and is doing its best to fix it. However, organisations are still bound by the rule of law: http://www.archive.org/about/dmca.php

> "Then, it called Archive.org to remove any trace of the interview at all"

Given that there's a six-month delay before content hits the Wayback Machine, I very much doubt that.

> "and how a "library" can obey this request defies comprehension"

Welcome, once again, to the law. I'm sorry that archive.org doesn't do the Right Thing irrespective of the law. We would all like several aspects of the law to be changed, but the way to do that is quite well known. Small organisations which break the law don't change it; they cease to exist.

You know, if you want to host all the copyrighted content in the US for free and take on the RIAA, the MPAA, etc., go ahead and fund it. I'm guessing that you're not willing to take that personal risk. You'll just keep attacking others for not doing it for you.

Archive.org isn't perfect. It's struggling to archive all the content that it legally can without the funds or the lawyers to do so. But it's trying.

Next time it's a slow news day, take a walk.

AGL
There doesn't (for some strange reason) seem to be any good Python source for Bloom filters. There's a SourceForge project, but it uses mpz for hashing, which is deprecated. So I've written PyBloom, which implements both counting and standard Bloom filters.
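PyBloom's actual interface isn't reproduced here, so what follows is only a minimal, hypothetical sketch of the two kinds of filter in modern Python, not PyBloom's code or API. A standard Bloom filter sets k bits in an m-bit array for each item; a lookup reports "probably present" only if all k bits are set, so it can give false positives but never false negatives. A counting Bloom filter replaces each bit with a small counter so that items can also be removed. The names (`BloomFilter`, `CountingBloomFilter`, `optimal_params`, `probe_indexes`), the `capacity`/`error_rate` parameters, and the hashing scheme (one SHA-256 digest from `hashlib` split into two values and combined by double hashing, used here in place of mpz) are all assumptions made for illustration.

```python
import hashlib
import math


def optimal_params(capacity, error_rate):
    """Return (m, k): array size and number of probes for the target error rate."""
    # Usual sizing formulas: m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2.
    m = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    k = max(1, round(m / capacity * math.log(2)))
    return m, k


def probe_indexes(item, m, k):
    """Derive k positions from one SHA-256 digest via double hashing (an assumed scheme)."""
    data = item if isinstance(item, bytes) else str(item).encode("utf-8")
    digest = hashlib.sha256(data).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, hence nonzero
    # Position i is (h1 + i * h2) mod m.
    return [(h1 + i * h2) % m for i in range(k)]


class BloomFilter:
    """Standard Bloom filter: m bits, k probes per item, no deletion."""

    def __init__(self, capacity, error_rate=0.01):
        self.m, self.k = optimal_params(capacity, error_rate)
        self.bits = bytearray((self.m + 7) // 8)  # bit array packed into bytes

    def add(self, item):
        for idx in probe_indexes(item, self.m, self.k):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item):
        # False means definitely absent; True means probably present.
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in probe_indexes(item, self.m, self.k))


class CountingBloomFilter:
    """Counting Bloom filter: a counter per slot instead of a bit, so removal works."""

    def __init__(self, capacity, error_rate=0.01):
        self.m, self.k = optimal_params(capacity, error_rate)
        self.counts = [0] * self.m

    def add(self, item):
        for idx in probe_indexes(item, self.m, self.k):
            self.counts[idx] += 1

    def remove(self, item):
        # Only remove items that were really added: removing a false positive
        # decrements counters belonging to other items and can cause false negatives.
        idxs = probe_indexes(item, self.m, self.k)
        if not all(self.counts[idx] > 0 for idx in idxs):
            raise KeyError(item)
        for idx in idxs:
            self.counts[idx] -= 1

    def __contains__(self, item):
        return all(self.counts[idx] > 0
                   for idx in probe_indexes(item, self.m, self.k))


# Example usage
bf = BloomFilter(capacity=1000, error_rate=0.01)
bf.add("alice")
print("alice" in bf)   # True: an added item is always reported present

cbf = CountingBloomFilter(capacity=1000)
cbf.add("bob")
cbf.remove("bob")
print("bob" in cbf)    # False: bob's counters are all back to zero
```

The counting variant trades memory for deletion: each slot here is a full Python int rather than one bit, whereas a compact implementation would typically use a few bits per counter.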