[WikiEN-l] Nofollow back on URL links on en.wikipedia.org articles for now

Steve Summit scs at eskimo.com
Sat Jan 20 15:02:28 UTC 2007


Nina wrote:
> Hi. What did you do? Pretend I'm, like, eight and/or stupid (because on
> this I are pretty stoopid).

Once upon a time, websites were written and maintained by
individuals, or by relatively small, closely-knit groups of people.

Once upon a time, and indeed even to this day, there has always
been a need to try to figure out how "good" a website is.  Now,
of course, "goodness" is a terribly subjective and multifaceted
concept, so trying to reduce it to a unidimensional metric or
"rank" is a task fraught with peril and ultimately utterly
impossible, but the need is strong enough that people are bound
to try anyway.  One area in which the need is real is: ranking
search results.  It's (comparatively) very easy to write a web
search engine that returns links to every single page where a
user's search terms are mentioned.  It's much, much more
difficult, however, to rig it up so that the user can easily zero
in on the *interesting* or *useful* links first, without having
to wade through all of the hundreds or thousands or millions of
hits which a simpleminded brute-force search engine might yield.

Once upon a time, Larry Page and Sergey Brin had a great idea.
They were trying to get a handle on "goodness" as defined by the
*users* of a page, *not* on the goodness that the authors of a
page wished it had, or might try to assert that it had.  Even
more to the point: Larry and Sergey wanted to rank "goodness"
in terms useful to the people doing the searching, not in terms
useful to the owners of the websites where the hits might (or
might not) be found.  L&S realized that one way to get a handle
on this user-perceived goodness was to look at how many people
linked to a given page.  Simply speaking, to first order, the
more people link to webpage X, the "better" webpage X is, and
the higher webpage X should appear in a list of search results.
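To first order, then, ranking is just counting in-links and sorting. A toy sketch (the pages and links here are made up purely for illustration):

```python
from collections import Counter

# Hypothetical link graph: (linking page, linked-to page) pairs.
links = [
    ("a.example", "x.example"),
    ("b.example", "x.example"),
    ("c.example", "x.example"),
    ("a.example", "y.example"),
]

# First-order "goodness": how many pages link to each page.
in_links = Counter(target for _, target in links)

# Rank search hits by in-link count, "best" first.
ranking = sorted(in_links, key=in_links.get, reverse=True)
print(ranking)  # x.example (3 in-links) ranks above y.example (1)
```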

As an additional, second-order wrinkle, the founders of
Google realized that not all links are created equal.  Among other
things, links to an unknown site *from* sites that are known to
be "good" count more towards ranking site X's "goodness" than do
links from other, random, unknown or not-so-good sites.
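That second-order wrinkle is what the PageRank algorithm captures: a page's rank is fed by the ranks of the pages linking to it, computed iteratively until the numbers settle. Here is a minimal sketch over a toy graph, using the standard 0.85 damping factor -- not Google's actual code, of course, just the textbook recurrence:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank for a dict mapping page -> list of out-links."""
    pages = set(links) | {t for outs in links.values() for t in outs}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline, plus shares of the ranks
        # of the pages that link to it.
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outs in links.items():
            if outs:
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Toy graph: "hub" is widely linked-to, so a link *from* hub
# is worth more than a link from the obscure page "d".
graph = {
    "hub": ["x"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"],
    "d": ["y"],
}
rank = pagerank(graph)
print(rank["x"] > rank["y"])  # True: x inherits rank from the good hub
```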

Needless to say, this strategy turned out to work very, very
well.  Google is now very, very successful, and its name has
literally become a verb meaning "to do a web search".

However, fast-forward to today.  There are now "social" websites
which are most assuredly *not* written and maintained by
"individuals or relatively small, closely-knit groups of
people".  Social websites, such as blogs and wikis, are by
definition written and maintained by anybody and everybody out
there on the whole world-wide internet.  And that's a fine,
wonderful, libertarian and egalitarian thing -- except that it
collides head-on with Google's strategy.  The collision wouldn't
matter so much if there weren't social websites with high
pagerank, or if Google and its pagerank algorithm weren't so
successful.  But in a world where Google is far and away the #1
search engine, and where the ever-so-social (or at least
ever-so-wiki) site known as Wikipedia is a top-10 website with
stupefyingly high pagerank, we have the makings of quite a fine
little quandary.

Simply put, Wikipedia is an absolutely irresistible,
neodymium-iron-boron supermagnet for linkspam.  The World Wide Web is
no longer Tim Berners-Lee's theoretically interesting research
lab thingy, it's an unignorable real-world phenomenon.  If you're
a commercial website operator, having high Google pagerank is
money in the bank.  So if you yourself can go in and create links
from a high-pagerank site like Wikipedia to your grotty little
commercial site, well, you'd be a fool not to.

So Google, like virtually all wildly successful and unignorable
real-world phenomena, has had to compromise a bit on its
principles.  Links from high-rank sites contribute more to a
linked-to site's pagerank, *unless* the high-rank site is openly
editable by anybody, in which case the links probably have to
be ignored.  So Google introduced a new HTML link relation,
rel="nofollow".  (One of the nice things about open, extensible
languages like HTML is that anybody can invent new extensions
like this to it anytime.)  Nofollow means, "don't weight this
link by my site's pagerank when totting up the linked-to
website's pagerank", or indeed, "don't follow this link in order
to tot up pagerank at all".  Google invented this attribute
specifically for high-pagerank social sites such as Wikipedia,
and encourages us to use it on our user-editable external links.
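Concretely, nofollow is a value of the anchor tag's rel attribute. What the wiki software does, in effect, is rewrite every user-supplied external link along these lines (a sketch, not MediaWiki's actual rendering code -- real code would also escape the URL and link text):

```python
def render_external_link(url, text):
    """Render a user-supplied external link with rel="nofollow",
    so search engines won't credit it toward the target's pagerank."""
    return '<a rel="nofollow" href="%s">%s</a>' % (url, text)

print(render_external_link("http://example.com/", "Example"))
# <a rel="nofollow" href="http://example.com/">Example</a>
```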


In terms of "what specifically did Brion do to turn on
'nofollow' for Wikipedia's external links?", that
I can't answer.  Probably some propellerhead computer weenie
thing involving "property lists" or "php configuration variables"
or suchlike. :-)
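For what it's worth, MediaWiki does expose a configuration variable for exactly this, so my guess (and it is only a guess about what Brion actually did) is that the change amounted to a single line in the site's LocalSettings.php:

```php
# Add rel="nofollow" to external links in wikitext.
$wgNoFollowLinks = true;
```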



