[WikiEN-l] SmartWikiSearch, a similarity search engine for Wikipedia

Gwern Branwen gwern0 at gmail.com
Sat Aug 22 18:05:18 UTC 2009


On Sat, Aug 22, 2009 at 1:55 PM, David Gerard <dgerard at gmail.com> wrote:
> http://www.smartwikisearch.com/
>
> http://www.smartwikisearch.com/about.html
>
> "Smart Wiki Search uses the link structure of Wikipedia to calculate
> which concepts each page is associated with. It is easy to see why
> looking at links can help group pages by concepts. For example, pages
> about mathematics have a lot of links to (and from) other pages about
> mathematics. Pages about the Apollo moon landing have a lot of links
> to pages about NASA and pages about the moon, etc.
>
> "More specifically, Smart Wiki Search uses the so-called
> eigendecomposition of the Wikipedia link transition matrix.
> Eigendecomposition provides a number of special vectors, called
> eigenvectors, and their corresponding eigenvalues. These vectors are
> special because even a relatively small number of eigenvectors having
> the largest eigenvalues can capture all the most important properties
> of the link structure.
>
> "It is well-known that Google uses the eigenvector with the largest
> eigenvalue (the so-called primary eigenvector) to rank pages in their
> search results. Any other eigenvector cannot be used for ranking or
> scoring the pages, however they can still carry almost as much
> information as the primary eigenvector, and they can be very
> effectively used for grouping pages. Smart Wiki Search uses ~1,100
> eigenvectors with the largest eigenvalues. The primary eigenvector is
> discarded. More information about the algorithm can be found on the
> Algorithm page.
>
> "The algorithm only uses the link structure and page titles to perform
> the search. It does not use terms or keywords that it encounters on
> the page. Because there is no need to determine what the meaning of
> the particular term or keyword is, the pages it returns generally deal
> with the same concept or concepts that you entered. For instance, if
> you enter "Flower" and "Bee", it will find pages where these two
> concepts overlap - those are pages about pollination. Compare these
> results to a typical keyword search (Google, for instance: Flower,
> bee, site:en.wikipedia.org), and you will see just how much less
> focussed on the concepts the keyword search is."
>
>
> - d.

I tried this out the other day; it's a very cool idea, but by and
large, it seems that this hacker doesn't have enough CPU power to
extract the really good wikilinks, the ones that aren't already linked
inside the article. (E.g., if I try it on [[Encyclopedia of the Brethren
of Purity]], I have to go all the way down to find a suggestion which
isn't already linked by the article.)
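
For anyone who wants to see the idea concretely, here is a rough Python
sketch of the approach described on the about page, plus the filtering
step I'd want (dropping suggestions the article already links to). It is
not the site's actual code. The toy LINKS graph, the function names
embed() and suggest(), the symmetrized and degree-normalized link matrix,
and the cosine-similarity scoring are all my assumptions for illustration;
the site says it uses ~1,100 eigenvectors, while this toy uses 3.

```python
import numpy as np

# Hypothetical toy link graph: page title -> titles it links to.
LINKS = {
    "Flower": ["Pollination", "Bee"],
    "Bee": ["Pollination", "Honey"],
    "Pollination": ["Flower"],
    "Honey": ["Bee"],
    "NASA": ["Apollo 11", "Moon"],
    "Moon": ["Apollo 11"],
    "Apollo 11": ["NASA", "Moon", "Rocket"],
    "Rocket": ["NASA", "Bee"],  # one stray cross-topic link
}


def embed(links, k):
    """Return (titles, matrix) where row i is a k-dimensional vector for page i.

    Assumption: links are treated as undirected ("to and from") so that the
    normalized matrix is symmetric and has real eigenvectors.
    """
    titles = sorted(links)
    index = {title: i for i, title in enumerate(titles)}
    n = len(titles)
    adj = np.zeros((n, n))
    for src, targets in links.items():
        for dst in targets:
            adj[index[src], index[dst]] = 1.0
            adj[index[dst], index[src]] = 1.0
    deg = adj.sum(axis=1)
    deg[deg == 0] = 1.0  # guard against isolated pages
    scale = 1.0 / np.sqrt(deg)
    # Symmetric normalization D^(-1/2) A D^(-1/2); it has the same
    # eigenvalues as the random-walk transition matrix D^(-1) A.
    sym = adj * scale[:, None] * scale[None, :]
    vals, vecs = np.linalg.eigh(sym)   # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]     # largest eigenvalue first
    keep = order[1:k + 1]              # discard the primary eigenvector
    return titles, vecs[:, keep]


def suggest(page, links, k=3, n=3):
    """Rank pages by cosine similarity to `page`, skipping existing links."""
    titles, vecs = embed(links, k)
    norms = np.maximum(np.linalg.norm(vecs, axis=1), 1e-12)
    unit = vecs / norms[:, None]
    scores = unit @ unit[titles.index(page)]
    already = set(links[page]) | {page}
    ranked = [titles[i] for i in np.argsort(scores)[::-1]]
    return [t for t in ranked if t not in already][:n]


if __name__ == "__main__":
    print(suggest("Flower", LINKS))
```

The point of the `already` filter is that only titles the article does not
yet link to come back, which is what an auto-generated See Also would need.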

Perhaps in a decade we'll have enough computing power on the servers
that this could be a plugin - we'd then have auto-generated See Alsos,
which would be really cool.

-- 
gwern


