Collective Information

JMC2 Group 10
Adam Langley (agl02)
Anton Pietek (map02)
John McCrae (jpm202)

A JMC2 Suprema Project, 2004.

Reputation and Trust Systems

Reputation systems seek to distill the knowledge of a community about members of that community's trustworthiness. Most well known examples involve online commerce, but similar ideas have been used with good results to less obvious areas such as the reliability of computers in network.

eBay screenshot

eBay runs the most well known reputation system which allows users to comment and rate people which they do business with. People are more likely to feel at ease when dealing with members with a good reputation. However, the eBay system is not without problems.

It's perfectly possible (and indeed, it has been done) to create a series of accounts on eBay and then fix auction s between them. These accounts (all under the control of a single person) then give each other fantastic feedback and the reputation of all the accounts rises accordingly. Once the reputation of the accounts have reached a suitable level it is then used to commit fraud against another member.

This (the Sybil attack) is a fundamental weakness of the eBay reputation system and, at best, eBay can only deploy incomplete, heuristic solutions to it.

Better designed systems are resistant to this attack.

Advogato

Advogato is a web site designed to test out ideas in trust metrics [LEVIEN00]. Users of it can rate other users at three different levels and Advogato seeks to minimise the effect of any attack on legitimate users. (Where an attack would be an attempt to get an account with a good reputation outside the normal means).

A simple graph Attacking nodes are highly clustered

The core idea behind nearly all strong trust metrics is to treat users as nodes on a graph and to represent one user rating another as a directed edge in this graph (see left).

It's easy to see that this gives us a handle on stopping the Sybil attack outlined above if we consider what the graph would look like (see right).

The attacking users may have a high rating, but they form a small cluster without strong links to the rest of the graph.

Advogato is a centralised system so in order for a node to be trusted it has to have good links to a seed node. The seed nodes represent trusted users and, in the case of Advogato, there are four of them. A capacity is assigned to each node depending on the shortest distance (measured in hops) to one of the seed nodes. The capacity falls off exponentially with distance.

The per-node capacities are then converted into per-edge capacities and a number of other modifications to the graph are made to make it compatible with common maximal-flow algorithms. Trust then flows out of the seed nodes down the edges of the graph and the amount that reaches a bad node can be shown to be strictly limited. Thus the attack outlined above doesn't work.

PageRank

PageRank is a system used by Google as an input to their search result rankings. It, like Advogato is designed to be an attack resistant trust metric. The this case the nodes in the graph are webpages and edges are links between web pages.

There are a number of formulations of the PageRank algorithm. The most intuitive is to imagine a random walk which starts on a random page on the Internet and follows a random link from that page to another and so on. At each step there is a small probability that the walk will end. The PageRank for a given page is then the probability that a walk will end on that page.

As the Google creators describe it [GOOGLE97]:

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."

However in fact the Google system is anything but "democratic", and in fact is better referred to as a plutocracy, a rule by the rich. In the PageRank algorithm all pages get a vote, but the value of each vote is decided by the PageRank of the voter.

All pages are given a basic value vote, q, which is set at 0.15 by Google. Then the page rank of each page is decided by:

Google PageRank equation

This can clearly be represented as a matrix and then solved as a Markov chain (details...). It can be proven that the PageRank value given is linearly related to the probability of our random surfer ending up on any particular page. To see page rank in action visit Prog, an adaption of Google that also displays the page rank value of each page.

Initially this gave excellent results and launched Google as the best search engine. However, a number of problems with it caused Google to reduce its weighting.

Primarily, it is vulnerable to a Sybil attack. Trust doesn't flow out of a single (or small) set of nodes, but is distributed across all of them. Thus it's possible to create a large number of websites which all link to a given page with the same phrase. This causes the PageRank of the target page to rise and certain phrases can be Google-bombed to the top of the results when searching for a certain phrase. For example, searching for miserable failure currently returns the Biography of President George W. Bush as the top result. And searching for unprofessional cowboys returns an article about how a certain company failed to meet its contract with Imperial College Union in second place.