[WikiEN-l] SmartWikiSearch, a similarity search engine for Wikipedia

David Gerard dgerard at gmail.com
Sat Aug 22 17:55:49 UTC 2009


http://www.smartwikisearch.com/

http://www.smartwikisearch.com/about.html

"Smart Wiki Search uses the link structure of Wikipedia to calculate
which concepts each page is associated with. It is easy to see why
looking at links can help group pages by concepts. For example, pages
about mathematics have a lot of links to (and from) other pages about
mathematics. Pages about the Apollo moon landing have a lot of links
to pages about NASA and pages about the moon, etc.

"More specifically, Smart Wiki Search uses the so-called
eigendecomposition of the Wikipedia link transition matrix.
Eigendecomposition provides a number of special vectors, called
eigenvectors, and their corresponding eigenvalues. These vectors are
special because even a relatively small number of eigenvectors having
the largest eigenvalues can capture all the most important properties
of the link structure.
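
A minimal sketch of what a link transition matrix might look like for a
toy link graph. The pages, the links, the function name
transition_matrix, and the convention that every outgoing link on a
page is followed with equal probability are all assumptions made for
illustration; the about page does not say how the real matrix is built.

```python
# Toy link graph (hypothetical pages and links, for illustration only).
links = {
    "Apollo 11": ["NASA", "Moon"],
    "NASA": ["Apollo 11", "Moon"],
    "Moon": ["Apollo 11"],
    "Algebra": ["Geometry"],
    "Geometry": ["Algebra"],
}


def transition_matrix(links):
    """Return (pages, matrix) where matrix[i][j] is the probability of
    following a link from pages[i] to pages[j], assuming every outgoing
    link on a page is equally likely."""
    pages = sorted(links)
    index = {page: i for i, page in enumerate(pages)}
    matrix = [[0.0] * len(pages) for _ in pages]
    for source, targets in links.items():
        for target in targets:
            matrix[index[source]][index[target]] = 1.0 / len(targets)
    return pages, matrix


pages, matrix = transition_matrix(links)
for page, row in zip(pages, matrix):
    print(f"{page:10s}", [round(p, 2) for p in row])
```

In this toy graph no nonzero entry connects the mathematics pages to
the space pages. That kind of group structure is what an
eigendecomposition of the matrix can expose.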

"It is well-known that Google uses the eigenvector with the largest
eigenvalue (the so-called primary eigenvector) to rank pages in their
search results. The other eigenvectors cannot be used for ranking or
scoring the pages; however, they can still carry almost as much
information as the primary eigenvector, and they can be very
effectively used for grouping pages. Smart Wiki Search uses ~1,100
eigenvectors with the largest eigenvalues. The primary eigenvector is
discarded. More information about the algorithm can be found on the
Algorithm page.
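
As a rough illustration of the primary eigenvector mentioned above, the
sketch below approximates it by power iteration on a toy link graph.
The pages, the links, the function name primary_eigenvector, the
damping value of 0.85 and the iteration count are assumptions chosen
for illustration. This is not Smart Wiki Search's code, and it does not
compute the other ~1,100 eigenvectors.

```python
# Toy link graph (hypothetical). Every page must have at least one
# outgoing link, and every link target must itself be a key.
links = {
    "Moon": ["NASA"],
    "NASA": ["Moon", "Apollo 11"],
    "Apollo 11": ["NASA", "Moon"],
}


def primary_eigenvector(links, damping=0.85, iterations=100):
    """Approximate the dominant eigenvector of the damped link
    transition matrix by repeated multiplication (power iteration)."""
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Each page starts from a small uniform share...
        new_rank = {page: (1.0 - damping) / n for page in pages}
        # ...and receives an equal part of the rank of each page linking to it.
        for source, targets in links.items():
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


rank = primary_eigenvector(links)
for page, score in sorted(rank.items(), key=lambda item: item[1], reverse=True):
    print(f"{page:10s} {score:.3f}")
```

The resulting scores are all positive and sum to one, which is why this
vector is suitable for ranking pages.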

"The algorithm only uses the link structure and page titles to perform
the search. It does not use terms or keywords that it encounters on
the page. Because there is no need to determine the meaning of a
particular term or keyword, the pages it returns generally deal
with the same concept or concepts that you entered. For instance, if
you enter "Flower" and "Bee", it will find pages where these two
concepts overlap - those are pages about pollination. Compare these
results to a typical keyword search (Google, for instance: Flower,
bee, site:en.wikipedia.org), and you will see just how much less
focussed on the concepts the keyword search is."
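
The "Flower" plus "Bee" example can be imitated on a toy scale. In the
sketch below each page is described by the pages it links to, the
vectors of the query pages are added together, and the remaining pages
are ranked by cosine similarity to that sum. The link graph and the
names profile, cosine and search are hypothetical, and raw link
profiles are used here as a simple stand-in for the eigenvector
coordinates the real system is described as using.

```python
import math

# Toy link graph (hypothetical pages and links, for illustration only).
links = {
    "Flower": ["Pollination", "Garden"],
    "Bee": ["Pollination", "Honey"],
    "Pollination": ["Flower", "Bee"],
    "Garden": ["Flower"],
    "Honey": ["Bee"],
    "NASA": ["Moon"],
    "Moon": ["NASA"],
}
pages = sorted(links)


def profile(page):
    """Indicator vector of the page itself plus the pages it links to."""
    linked = set(links[page]) | {page}
    return [1.0 if other in linked else 0.0 for other in pages]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_pages):
    """Rank non-query pages by similarity to the summed query vectors."""
    query = [sum(values) for values in zip(*(profile(p) for p in query_pages))]
    scores = {p: cosine(query, profile(p)) for p in pages if p not in query_pages}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


results = search(["Flower", "Bee"])
for page, score in results:
    print(f"{page:12s} {score:.3f}")
```

With this toy graph the page linked to both query pages comes out on
top, and the unrelated space pages score zero.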


- d.


