Re: [WikiEN-l] SmartWikiSearch, a similarity search engine for Wikipedia

22 Aug 2009

I have a feeling a lot of those are duplicates coming from templates placed
on a page: Macbeth linking to Romeo and Juliet (and vice versa) was my first
example. Searching on multiple terms seems like the place where this would be
really useful, since it would minimize crossover from templates.
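A minimal sketch of the multi-term idea, assuming each search term already
yields a {article: similarity} score table. The function name
`combine_queries`, the "minimum across terms" rule, and the scores in the
usage note are hypothetical illustrations, not how SmartWikiSearch works: an
article only ranks highly if it is similar to every term, so a link that
comes from a template shared with just one term drops out.

```python
from typing import Dict, List, Tuple


def combine_queries(per_query_scores: List[Dict[str, float]],
                    top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank candidate articles that are similar to *every* search term.

    per_query_scores: one {candidate_title: similarity} dict per term
    (hypothetical input; how the similarities are computed is left open).
    The combined score is the minimum similarity across all terms.
    """
    if not per_query_scores:
        return []
    # Only candidates that were scored for every term are eligible.
    common = set(per_query_scores[0])
    for scores in per_query_scores[1:]:
        common &= set(scores)
    combined = {c: min(s[c] for s in per_query_scores) for c in common}
    # Highest combined score first; ties broken alphabetically.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

For example, with made-up scores {"Romeo and Juliet": 0.9, "Duncan I of
Scotland": 0.7} for "Macbeth" and {"Duncan I of Scotland": 0.8, "Edinburgh":
0.9} for "Scotland", combine_queries returns [("Duncan I of Scotland", 0.7)]:
the template-driven Romeo and Juliet link is dropped because it does not
score for both terms.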

~A

On Sat, Aug 22, 2009 at 14:25, Brian <Brian.Mingus(a)colorado.edu> wrote:

...
 On Sat, Aug 22, 2009 at 12:24 PM, Brian <Brian.Mingus(a)colorado.edu> wrote:

 On Sat, Aug 22, 2009 at 12:05 PM, Gwern Branwen <gwern0(a)gmail.com> wrote:

 I tried this out the other day; it's a very cool idea, but by and
 large, it seems that this hacker doesn't have enough CPU power to
 extract the really good wikilinks, the ones that aren't already linked
 inside the article. (E.g., if I try it on [[Encyclopedia of the Brethren
 of Purity]], I have to go all the way down the list to find a suggestion
 that isn't already linked by the article.)

 Perhaps in a decade we'll have enough computing power on the servers
 that this could be a plugin; we'd then have auto-generated See Alsos,
 which would be really cool.

 --
 gwern

 A fancy technique called Latent Dirichlet Allocation (LDA) can be used to
 find links that aren't already present in the document itself. I did this
 for a class project. Here is an excerpt from the paper, which also shows the
 latent connections it found for the Simple English Wikipedia article on
 hippies.

 http://upload.wikimedia.org/wikipedia/meta/2/25/LDA-Wiki-Search.png
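A minimal sketch of the approach, for illustration only: it fits LDA with
collapsed Gibbs sampling (one standard inference method; the class project
and Google's PLDA may use different ones), then ranks other articles by the
similarity of their topic mixtures while skipping articles the query page
already links to. The function names, the hyperparameters alpha and beta,
and the use of Hellinger similarity are assumptions, not the paper's method.

```python
import math
import random
from typing import List, Sequence, Set, Tuple


def lda_gibbs(docs: Sequence[Sequence[str]], num_topics: int,
              alpha: float = 0.1, beta: float = 0.01,
              iterations: int = 200, seed: int = 0) -> List[List[float]]:
    """Fit LDA by collapsed Gibbs sampling; return each document's topic mixture."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    topic_total = [0] * K                     # tokens assigned to topic k overall
    z: List[List[int]] = []                   # current topic of every token
    for d, doc in enumerate(docs):            # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][vocab[w]] += 1
            topic_total[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], z[d][i]
                # Take this token out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(topic t | all other assignments) is proportional to these.
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed estimate of theta_d from the final sample; each row sums to 1.
    return [[(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
            for d, doc in enumerate(docs)]


def hellinger_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """1 minus the Hellinger distance: 1.0 for identical topic mixtures."""
    return 1.0 - math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                                     for a, b in zip(p, q)))


def suggest_links(titles: List[str], theta: List[List[float]], query: str,
                  already_linked: Set[str],
                  top_k: int = 5) -> List[Tuple[str, float]]:
    """Rank articles by topic similarity to `query`, skipping existing links."""
    q = theta[titles.index(query)]  # raises ValueError if query is unknown
    scored = [(t, hellinger_similarity(q, theta[i]))
              for i, t in enumerate(titles)
              if t != query and t not in already_linked]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:top_k]
```

Given a list of titles, one token list per article, and already_linked set to
the query page's existing wikilinks, suggest_links returns only unlinked
candidates, best first. On a real corpus the tokens would need stop-word
removal and many more iterations.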

 I note that Google has released parallel LDA, so it's now feasible to run it
 on all of Wikipedia using an ordinary Beowulf cluster.
 http://code.google.com/p/plda/
