[WikiEN-l] SmartWikiSearch, a similarity search engine for Wikipedia

Amory Meltzer amorymeltzer at gmail.com
Sat Aug 22 18:34:45 UTC 2009


I have a feeling a lot of those are duplications of templates placed on a
page - Macbeth linking to Romeo and Juliet (and vice versa) was my first
example.  Multiple search terms would seem to be the real place this would
be useful, to minimize crossover from templates.

~A
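
A minimal sketch of the filtering idea discussed in this thread, not anything SmartWikiSearch is known to do: given a ranked list of similar articles, drop every title the article already links to, whether from its body text or from a navigation template. The function name, its parameters, and the sample titles are all hypothetical, and titles are compared by exact match for simplicity.

```python
def unlinked_suggestions(ranked, body_links, template_links=()):
    """Return ranked titles that the article does not already link to.

    ranked         -- similar-article titles, best match first (hypothetical input)
    body_links     -- titles linked from the article text
    template_links -- titles linked only through templates on the page
    """
    seen = set(body_links) | set(template_links)
    suggestions = []
    for title in ranked:
        if title not in seen:
            suggestions.append(title)
            seen.add(title)  # also suppresses repeated titles in the ranking
    return suggestions


# Made-up data for a Macbeth-style page: two hits come only from a template.
ranked = ["Romeo and Juliet", "Hamlet", "Banquo", "Three Witches", "Hamlet"]
body_links = {"Banquo"}
template_links = {"Romeo and Juliet", "Hamlet"}

print(unlinked_suggestions(ranked, body_links, template_links))
```

Whatever survives the filter is a candidate for a See Also entry; passing an empty template_links shows how much of the ranking the templates alone account for.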


On Sat, Aug 22, 2009 at 14:25, Brian <Brian.Mingus at colorado.edu> wrote:

> On Sat, Aug 22, 2009 at 12:24 PM, Brian <Brian.Mingus at colorado.edu> wrote:
>
> > On Sat, Aug 22, 2009 at 12:05 PM, Gwern Branwen <gwern0 at gmail.com> wrote:
> >
> >>
> >> I tried this out the other day; it's a very cool idea, but by and
> >> large, it seems that this hacker doesn't have enough CPU power to
> >> extract the really good wikilinks, the ones that aren't already linked
> >> inside the article. (eg. if I try it on [[Encyclopedia of the Brethren
> >> of Purity]], I have to go all the way down to find a suggestion which
> >> isn't already linked by the article.)
> >>
> >> Perhaps in a decade we'll have enough computing power on the servers
> >> that this could be a plugin - we'd then have auto-generated See Alsos,
> >> which would be really cool.
> >>
> >> --
> >> gwern
> >>
> >
> > A fancy technique called Latent Dirichlet Allocation can be used to find
> > links that aren't already linked in the document itself. I did this for
> > a class project. Here is an excerpt from the paper, which also shows you
> > the latent connections it found for the Simple article on hippies.
> >
> > http://upload.wikimedia.org/wikipedia/meta/2/25/LDA-Wiki-Search.png
> >
> > I note that Google has released parallel LDA, so it's not feasible to run
> > it on all of Wikipedia using an ordinary Beowulf cluster.
> > http://code.google.com/p/plda/
> >
>
> * now feasible
> _______________________________________________
> WikiEN-l mailing list
> WikiEN-l at lists.wikimedia.org
> To unsubscribe from this mailing list, visit:
> https://lists.wikimedia.org/mailman/listinfo/wikien-l
>

