On Sat, Aug 22, 2009 at 12:05 PM, Gwern Branwen <gwern0(a)gmail.com> wrote:
I tried this out the other day; it's a very cool idea, but by and
large, it seems that this hacker doesn't have enough CPU power to
extract the really good wikilinks, the ones that aren't already linked
inside the article. (eg. if I try it on [[Encyclopedia of the Brethren
of Purity]], I have to go all the way down to find a suggestion which
isn't already linked by the article.)
Perhaps in a decade we'll have enough computing power on the servers
that this could be a plugin - we'd then have auto-generated See Alsos,
which would be really cool.
--
gwern
A fancy technique called Latent Dirichlet Allocation can be used to find
related articles that aren't already linked in the document itself. I did
this for a class project. Here is an excerpt from the paper, which also shows
the latent connections it found for the Simple article on hippies.
http://upload.wikimedia.org/wikipedia/meta/2/25/LDA-Wiki-Search.png
I note that Google has released a parallel LDA, so it's now feasible to run it
on all of Wikipedia using an ordinary Beowulf cluster.
http://code.google.com/p/plda/
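
For anyone curious what this looks like in practice, here is a minimal
sketch, not the code from the class project and not PLDA. It assumes a
small collapsed Gibbs sampler for LDA written with the standard library
only, a toy corpus whose titles and tokens are made up for illustration,
and hypothetical helper names (fit_lda, cosine, suggest_see_also). Each
article gets a topic mixture; candidate "See also" entries are the
articles with the most similar mixtures that the source article does not
already link.

```python
import random
from math import sqrt

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic mixture per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]    # topic-word counts
    nk = [0] * n_topics                         # tokens assigned to each topic
    z = []                                      # topic assignment of every token
    for d, doc in enumerate(docs):              # random initial assignment
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # Remove the token, then resample its topic given all the others.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    # Smoothed, normalised document-topic mixtures.
    return [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]

def cosine(a, b):
    """Cosine similarity between two topic mixtures."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def suggest_see_also(title, titles, theta, linked, top_n=3):
    """Rank the other articles by topic similarity, skipping ones already linked."""
    i = titles.index(title)
    scored = [
        (cosine(theta[i], theta[j]), t)
        for j, t in enumerate(titles)
        if t != title and t not in linked
    ]
    return [t for _, t in sorted(scored, reverse=True)[:top_n]]

# Toy corpus: titles and tokens are invented purely for illustration.
titles = ["Hippie", "Counterculture", "Woodstock", "Beowulf cluster", "Parallel computing"]
docs = [
    "peace love music festival commune peace music".split(),
    "protest peace youth music commune protest".split(),
    "music festival concert peace crowd music".split(),
    "cluster node linux network compute cluster".split(),
    "compute thread node network cluster compute".split(),
]
theta = fit_lda(docs, n_topics=2)
linked = {"Counterculture"}   # assumed to be linked from the Hippie article already
suggestions = suggest_see_also("Hippie", titles, theta, linked, top_n=2)
print(suggestions)
```

The filter on `linked` is the part that addresses the original complaint:
suggestions the article already links are dropped before ranking. A
sampler like this is far too slow for all of Wikipedia, which is where a
parallel implementation would come in.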