Searching Gnutella networks (04 Jul 2002)

Bram flames Ted Nelson

Emacs mode for M$ Word

Will Knight pointed me to this paper on searching Gnutella networks. Some commentary:

The motivation for this metric is that in P2P systems, the most notable overhead tends to be the processing load that the network imposes on each participant.

The processing load is their most notable overhead? They must have a hell of a lot of bandwidth, or be really bad coders.

If a PC has to handle many network interrupts when it joins the P2P network, the user will be forced to take the PC off the P2P network to get "real" work done.

Because of interrupt load? I'm wondering if this paper has been translated from another language. Once again, the limiting factor is bandwidth and certainly not interrupts.
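To put a rough number on the interrupt argument, here is a back-of-the-envelope sketch. The figures are my own assumptions, not the paper's: a 56 kbit/s modem and a hypothetical 64-byte packet size, chosen small so that the packet rate (and so the worst-case interrupt rate) is as high as it can reasonably get.

```python
def packets_per_second(link_bps, packet_bytes):
    """Upper bound on packets/s a link can deliver (ignores framing overhead)."""
    return link_bps / (8 * packet_bytes)

# Hypothetical figures: a saturated 56 kbit/s modem carrying small 64-byte
# packets, i.e. close to the worst case for packet (and interrupt) rate.
rate = packets_per_second(56_000, 64)
print(f"~{rate:.0f} packets/s at most")  # prints: ~109 packets/s at most
```

Even if every one of those packets raised its own interrupt, that is about the same rate as the 100 Hz timer tick that Linux 2.4 on x86 already takes. The modem is full long before the interrupts matter.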

Other than that odd misunderstanding, the rest of the paper is very good. Some points:

  • Walkers don't find uncommon documents

Unstructured networks generally fail to find uncommon documents, but walkers are particularly bad at it. Consider a million-node network where a document exists on only one node. With 64 walkers, state keeping and an average of 1 second per hop, covering the whole network takes over 4 hours (1,000,000 / 64 ≈ 15,600 seconds).
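The arithmetic is simple enough to write down. This sketch assumes the best case for walkers: thanks to state keeping they never revisit a node or overlap with each other. The "over 4 hours" figure is the time to cover the whole network; if the single copy sits on a uniformly random node, the expected time is roughly half that, which is still hours.

```python
def walker_search_hours(nodes, walkers, seconds_per_hop):
    """Hours for non-overlapping, state-keeping walkers to visit every node once."""
    hops_per_walker = nodes / walkers
    return hops_per_walker * seconds_per_hop / 3600

worst = walker_search_hours(1_000_000, 64, 1.0)
# Average case assumes the one copy is on a uniformly random node.
print(f"full coverage: {worst:.2f} hours; average for one copy: {worst / 2:.2f} hours")
# prints: full coverage: 4.34 hours; average for one copy: 2.17 hours
```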

On top of that, the suggested 'talk back' limit of once every 4 hops would send the searcher 16 packets a second (64 walkers, each checking back every 4 seconds), enough to take up a notable chunk of a modem's bandwidth.
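A quick sketch of that rate. The 80-byte on-the-wire size of a check-back packet is a hypothetical figure (the paper doesn't give one here), and the modem is taken as a nominal 56 kbit/s downstream.

```python
def talk_back_rate(walkers, hops_per_second, hops_between_checks):
    """Packets per second arriving at the searcher from walkers checking back."""
    return walkers * hops_per_second / hops_between_checks

rate = talk_back_rate(64, 1.0, 4)  # 16.0 packets/s
# Hypothetical: ~80 bytes per check-back packet on the wire, 56 kbit/s modem.
share = rate * 80 * 8 / 56_000
print(f"{rate:.0f} packets/s, about {share:.0%} of a 56k modem")
# prints: 16 packets/s, about 18% of a 56k modem
```

Change the assumed packet size and the percentage moves, but the 16 packets/s figure depends only on the walker count, hop rate and check-back interval.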

  • Why random placement is better

The paper also recommends random replication without explaining why it works better. I would hazard the following guess:

Random walkers will tend to end up at the well-connected nodes in a power-law network and hang around there. Thus path replication will place copies on the high-degree nodes, whereas random replication (as they define it) will tend to hit more low-degree nodes. I suspect that counting the copies in the network would show that random replication gives the highest number.
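A toy simulation can check the first half of this guess. The sketch below is not from the paper and is only an illustration under stated assumptions: it grows a hypothetical 2,000-node preferential-attachment graph as a stand-in for a power-law Gnutella topology, treats the nodes visited by one long memoryless random walk as "path replication" placements (state keeping is left out for simplicity), and puts the same number of copies on uniformly random nodes as "random replication". It then reports what share of each lands on the best-connected 10% of nodes.

```python
import random

def preferential_attachment_graph(n, m, rng):
    """Grow a Barabasi-Albert style graph: each new node links to m distinct
    existing nodes, picked with probability proportional to their degree."""
    adj = {v: set() for v in range(n)}
    # Seed with a fully connected core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears once per edge endpoint, so a uniform pick from
    # this list is a degree-proportional pick of a node.
    endpoints = [v for v in range(m + 1) for _ in adj[v]]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            endpoints.extend((new, t))
    # Sorted neighbour lists make runs reproducible for a given seed.
    return {v: sorted(nbrs) for v, nbrs in adj.items()}

def random_walk(adj, steps, rng):
    """Nodes visited by a single memoryless random walker."""
    node = rng.randrange(len(adj))
    path = []
    for _ in range(steps):
        node = rng.choice(adj[node])
        path.append(node)
    return path

def hub_share(adj, placements, frac=0.1):
    """Fraction of placements landing on the top `frac` of nodes by degree."""
    k = max(1, int(len(adj) * frac))
    hubs = set(sorted(adj, key=lambda v: len(adj[v]))[-k:])
    return sum(1 for v in placements if v in hubs) / len(placements)

rng = random.Random(2002)
adj = preferential_attachment_graph(2000, 2, rng)
path_copies = random_walk(adj, 50_000, rng)                      # copies along walk paths
random_copies = [rng.randrange(len(adj)) for _ in path_copies]  # same count, uniform nodes

print(f"path-style placements on top-10% hubs: {hub_share(adj, path_copies):.0%}")
print(f"random placements on top-10% hubs:     {hub_share(adj, random_copies):.0%}")
```

In the long run a memoryless random walk on a connected graph visits each node in proportion to its degree, so the path-style placements should land on the hubs well above the 10% share that uniform placement gives. Whether that translates into more total copies, as guessed above, is not something this sketch measures.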

  • Random networks don't work

The paper also suggests that networks should be random. This is very nice, but not all nodes are created equal: those on a T1 line can handle more messages per second than modem users. This bandwidth inequality (and the way bandwidth is distributed across nodes) will, to some extent, push the network towards a power-law topology.
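A rough way to see the inequality is to ask how many neighbours each kind of node can serve before its link saturates. The per-connection traffic figure below is a hypothetical assumption; the link speeds are the nominal 56 kbit/s for a modem and 1,544 kbit/s for a T1.

```python
def connection_budget(link_kbps, per_connection_kbps):
    """How many neighbour connections a node can sustain before its link saturates."""
    return int(link_kbps // per_connection_kbps)

PER_CONNECTION_KBPS = 4  # hypothetical average query traffic per neighbour
modem = connection_budget(56, PER_CONNECTION_KBPS)    # 14
t1 = connection_budget(1544, PER_CONNECTION_KBPS)     # 386
print(f"modem: {modem} neighbours, T1: {t1} neighbours ({t1 / modem:.0f}x)")
# prints: modem: 14 neighbours, T1: 386 neighbours (28x)
```

Whatever per-connection figure you pick, the ratio stays at roughly 28x, so a topology in which every node has a similar degree would either saturate the modem users or leave most of the T1 capacity idle.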

  • It's not anonymous

Well documented, but I'm just pointing it out.