I have a feeling a lot of those are duplications of templates placed on a
page - Macbeth linking to Romeo and Juliet (and vice versa) was my first
example. Multiple search terms would seem to be the real place this would
be useful, to minimize crossover from templates.
~A
On Sat, Aug 22, 2009 at 14:25, Brian <Brian.Mingus(a)colorado.edu> wrote:
On Sat, Aug 22, 2009 at 12:24 PM, Brian <Brian.Mingus(a)colorado.edu> wrote:
On Sat, Aug 22, 2009 at 12:05 PM, Gwern Branwen <gwern0(a)gmail.com> wrote:
I tried this out the other day; it's a very cool idea, but by and
large, it seems that this hacker doesn't have enough CPU power to
extract the really good wikilinks, the ones that aren't already linked
inside the article. (e.g., if I try it on [[Encyclopedia of the Brethren
of Purity]], I have to go all the way down to find a suggestion which
isn't already linked by the article.)
Perhaps in a decade we'll have enough computing power on the servers
that this could be a plugin - we'd then have auto-generated See Alsos,
which would be really cool.
--
gwern
A fancy technique called Latent Dirichlet Allocation can be used to find
related articles that aren't already linked in the document itself. I did
this for a class project. Here is an excerpt from the paper, which also
shows you the latent connections it found for the Simple article on hippies.
http://upload.wikimedia.org/wikipedia/meta/2/25/LDA-Wiki-Search.png
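To make the idea concrete, here is a rough sketch of the last step only: once a topic model such as LDA has assigned each article a topic mixture, rank the other articles by similarity and drop the ones the article already links to. It does not run LDA itself, and it is not the method from the class project. The topic vectors, the link set, and the names `topics`, `links`, `cosine`, and `suggest` are all made up for illustration.

```python
import math

# Hypothetical per-article topic mixtures, standing in for what an LDA
# model would infer. The numbers are invented for illustration.
topics = {
    "Hippie": [0.70, 0.20, 0.10],
    "Counterculture": [0.65, 0.25, 0.10],
    "Woodstock": [0.60, 0.10, 0.30],
    "Beowulf cluster": [0.05, 0.05, 0.90],
}

# Hypothetical set of wikilinks already present in each article.
links = {"Hippie": {"Counterculture"}}


def cosine(a, b):
    """Cosine similarity between two topic vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def suggest(title, k=2):
    """Rank other articles by topic similarity, skipping existing links."""
    linked = links.get(title, set())
    candidates = [t for t in topics if t != title and t not in linked]
    candidates.sort(key=lambda t: cosine(topics[title], topics[t]), reverse=True)
    return candidates[:k]


print(suggest("Hippie"))  # ['Woodstock', 'Beowulf cluster']
```

"Counterculture" is the closest match in this toy data, but it is filtered out because the article already links to it, which is the behaviour Gwern was asking for.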
I note that Google has released parallel LDA, so it's not feasible to run
it on all of Wikipedia using an ordinary Beowulf cluster.
http://code.google.com/p/plda/
* now feasible
_______________________________________________
WikiEN-l mailing list
WikiEN-l(a)lists.wikimedia.org
To unsubscribe from this mailing list, visit:
https://lists.wikimedia.org/mailman/listinfo/wikien-l