Sunday, October 17, 2004

The Misunderstood Lure of Page Rank

Google PageRank is one of the very classic credibility and authority indicators on the Web and also one of the most trusted and reliable ones.

Robin Good, from PACmeter - Popularity, Authority, Credibility Online: How To Measure Them

There are some very nice links in Robin Good's article, but he makes a mistake when he starts attributing credibility to page rank. I understand how people could come to that conclusion, but I want to seriously caution anyone who thinks that the page rank shown on a Google toolbar can be used as a measure of expertise or trustworthiness.

Page Rank, named after its inventor, Larry Page, is a method of calculating a value for a URL based upon the number of links pointing to it and the "popularity" of the pages those links come from. It's a mathematical formula, and a patented method, used to help the search engine determine which pages to return when someone performs a search.

Here's how Google describes Page Rank:

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."
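The voting idea in that description can be sketched in a few lines. This is only a minimal illustration, assuming the simplified iterative form from the published PageRank paper (with its commonly cited damping factor of 0.85), not whatever Google actually runs. The `pagerank` function, the page names, and the link graph are all made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every page gets a small baseline, then collects "votes" from its linkers.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                # A page with no outgoing links spreads its vote over all pages.
                targets = pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: the formula sees links, not the reasons behind them.
links = {
    "expert_page": [],               # good content, nobody links to it
    "popular_page": ["hub"],
    "hub": ["popular_page"],
    "paid_ad": ["popular_page"],     # purchased text link
    "comment_spam": ["popular_page"],
    "critic": ["popular_page"],      # link made in criticism
}

ranks = pagerank(links)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:14s} {value:.3f}")
```

In this toy graph, `popular_page` ends up with the highest value and `expert_page` sits at the bottom with the other unlinked pages, even though nothing in the calculation looked at what any page actually says.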

Page Rank is one of around 100 or so different attributes that Google uses when it determines the order of the listings it returns when someone tries to find information using the search engine.

It is based upon the number of links to a page, but the intentions behind those links aren't considered. The links could be purchased text advertisements. They could be from comment spam in blogs, guest books, and wikis. They could be links made in criticism, so that the page's popularity is actually infamy.

Unfortunately, one of the failings of page rank is that the best page on a specific subject may exist on the web with few or no links to it, and because of that, the page may not come up in Google's search results for that subject.

A recent and excellent paper that looks at this problem with page rank is worth reading: Filthy Linking Rich And Getting Richer!, by Mike Grehan. The problem with page rank, according to Mike Grehan:

But this is also the creator of a very worrisome problem which affects new web pages with low linkage data, regardless of the quality of those pages. Quality and relevance are sometimes at odds with each other. And the ecology of the web may be suffering because of the way search engines are biased towards a page's popularity more than its quality. In short, "currently popular" pages are repeatedly being returned at the top of the results at the major search engines.

Page rank is a measure of popularity, and the amount of page rank that shows on the Google toolbar is an indication of that popularity. But it's not a measure of credibility, and never has been. And sites with lower page ranks may be of better quality, and of higher credibility than pages with higher page ranks.

To use page rank as a substitute for much better ways of determining credibility is a mistake. It's alluring to think that such a shortcut exists, but it doesn't.

Robin Good also points towards the Alexa toolbar as another measure of credibility. Again, it's a questionable assumption. The Alexa results are like an unscientific poll where the people being polled select themselves. The numbers Alexa uses are supposed to indicate the amount of traffic to web sites based upon the travels of people with Alexa toolbars installed on their browsers. I understand that a number of web sites in South Korea are pretty popular according to Alexa. I also understand that a lot of people in South Korea use the Alexa toolbar.

I think that my analogy of Alexa being like an internet poll where the participants aren't selected by the polling body is accurate. There's a nice set of questions, from the National Council on Public Polls, that you should ask and that a journalist should be aware of before trusting the credibility of a poll. These are 20 Questions A Journalist Should Ask About Poll Results. Here's a snippet from the analysis under the question that asks, "How were these people chosen":

The key reason that some polls reflect public opinion accurately and other polls are unscientific junk is how people were chosen to be interviewed. In scientific polls, the pollster uses a specific statistical method for picking respondents. In unscientific polls, the person picks himself to participate.
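The self-selection problem with toolbar-based traffic numbers can be shown with a little arithmetic. This is only a sketch with invented figures: the site names, visitor counts, and toolbar adoption rates below are hypothetical, and it makes no claim about how Alexa actually computes its rankings.

```python
# Hypothetical data: true monthly visitors, and the fraction of each site's
# audience that happens to have the toolbar installed.
sites = {
    "site_a": {"visitors": 100_000, "toolbar_share": 0.01},
    "site_b": {"visitors": 20_000, "toolbar_share": 0.20},
}


def toolbar_counts(sites):
    """Visits a toolbar would see: only visitors who chose to install it."""
    return {
        name: round(data["visitors"] * data["toolbar_share"])
        for name, data in sites.items()
    }


def rank_by(values):
    """Site names ordered from the largest value to the smallest."""
    return sorted(values, key=values.get, reverse=True)


observed = toolbar_counts(sites)
true_order = rank_by({name: data["visitors"] for name, data in sites.items()})
toolbar_order = rank_by(observed)

print("seen by toolbar:", observed)
print("true order:     ", true_order)
print("toolbar order:  ", toolbar_order)
```

With these made-up numbers the toolbar sees 1,000 visits to `site_a` and 4,000 to `site_b`, so the smaller site looks four times as busy simply because more of its audience installed the toolbar.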

It's tempting to try to find a shortcut to determine the credibility of a web site. A toolbar is no substitute for an intelligent and informed decision.


Anonymous said...


Your critique points are all well taken and appropriate.
I think you are completely right and that popularity does not equal credibility. At least in a significant number of cases.

I really appreciated you pointing this out so clearly and elegantly.

I shall revise and update my blog post entitled PACmeter accordingly.

Thank you, and keep up the good work!

Robin Good

William Slawski said...

Hi Robin,

It's good to see you here.

You know that gracious responses to criticism like yours are the types of things that win you friends and respect. Thank you.

It's wonderful to see the positive approach you've taken in your consideration of the points I made, and I look forward to seeing the updates to your post, and more from you.

Again, thanks.