[WikiEN-l] SmartWikiSearch, a similarity search engine for Wikipedia

Jay Litwyn brewhaha at freenet.edmonton.ab.ca
Sun Aug 23 11:51:39 UTC 2009


<WJhonson at aol.com> wrote in message news:cfe.5d50bcc3.37c1a26d at aol.com...
> In a message dated 8/22/2009 10:56:20 AM Pacific Daylight Time,
> dgerard at gmail.com writes:
>
>
>> Because there is no need to determine what the meaning of
>> the particular term or keyword is, the pages it returns generally deal
>> with the same concept or concepts that you entered. For instance, if
>> you enter "Flower" and "Bee", it will find pages where these two
>> concepts overlap - those are pages about pollination.
> -------------------
>
> This seems big to me.
> It's creating, in a mindless way, semantic relationships between keywords.

The search for "bees" and "flowers" suggests "pollination". I do not see
anything mindless about that. That is a human association. In another one,
honey comes from nectar in flowers, and gets its flavour from them. So, the
idea is to rank pages that connect the words to each other higher than
AND-search hits alone, while AND-search hits get a higher rank than
OR-search hits. Works for me. You can get similar results on web pages if
users do a good job of filling out descriptions, keywords, classification,
and title tags. Pollination and honey should be at the top.
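
A minimal sketch of that three-tier ordering, in Python. It is not how
SmartWikiSearch works internally; the page texts, the CONNECTORS set, and
the function names tier() and rank() are all made up for illustration, and
the set of connecting pages is simply assumed to be known in advance.

```python
# Toy corpus: title -> text. Invented data, for illustration only.
PAGES = {
    "Pollination": "bees carry pollen between flowers",
    "Honey": "bees make honey from nectar in flowers",
    "Garden": "a garden has flowers and bees visit",
    "Beehive": "a hive houses bees",
    "Tulip": "tulips are flowers",
    "Granite": "a coarse igneous rock",
}

# Assumption: pages already known to connect the query concepts.
CONNECTORS = {"Pollination", "Honey"}


def tier(text, terms, is_connector):
    """Return 3 for a connecting page, 2 for an AND hit, 1 for an OR hit, 0 for none."""
    words = set(text.lower().split())
    hits = [term.lower() in words for term in terms]
    if is_connector:
        return 3
    if all(hits):
        return 2
    if any(hits):
        return 1
    return 0


def rank(pages, terms, connectors):
    """Order titles by tier (highest first), then alphabetically; drop non-matches."""
    scored = [
        (tier(text, terms, title in connectors), title)
        for title, text in pages.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored if score > 0]


if __name__ == "__main__":
    # Prints: ['Honey', 'Pollination', 'Garden', 'Beehive', 'Tulip']
    print(rank(PAGES, ["bees", "flowers"], CONNECTORS))
```

With this toy data, the two connecting pages come first, the page that
merely mentions both words comes next, and pages mentioning only one word
come last.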

> This has been thought about for a long time it seems, but no one has
> really solved the annoying issue of how to avoid most false positives.
> I don't think you can avoid them all because English is so ambiguous,
> but the use of cross-links is a major leap forward.
>
> Very few people are going to link up concepts that are basically minor,
> but scanning all pages for the links highlights the semantic connections
> between concepts.  You could even take it one step further and use the
> semantic web to "point out" semantic connections that are not directly
> obvious, such as a leap from beekeeper to honeycomb.  Try to do that
> using Google.  You get thousands of bad hits before you get the one good
> one.
>
> Search for "Hillbillies" and "Movie", using a semantic web you get the
> exact hit you want.
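
One way to picture the cross-link idea is as set operations on a link
table. The sketch below is only an illustration under assumptions: the
LINKS data is invented, and shared_links() and two_hop() are hypothetical
helper names, not anything taken from SmartWikiSearch or MediaWiki.

```python
# Toy link table: title -> set of titles that page links to. Invented data.
LINKS = {
    "Bee": {"Pollination", "Honey", "Insect"},
    "Flower": {"Pollination", "Plant", "Honey"},
    "Beekeeper": {"Beehive", "Honey"},
    "Beehive": {"Honeycomb", "Bee"},
    "Honey": {"Honeycomb", "Bee"},
}


def shared_links(links, a, b):
    """Titles that both page a and page b link to (where the concepts overlap)."""
    return sorted(links.get(a, set()) & links.get(b, set()))


def two_hop(links, a, b):
    """Intermediate titles m such that a links to m and m links to b."""
    return sorted(m for m in links.get(a, set()) if b in links.get(m, set()))


if __name__ == "__main__":
    # Prints: ['Honey', 'Pollination']
    print(shared_links(LINKS, "Bee", "Flower"))
    # Prints: ['Beehive', 'Honey']
    print(two_hop(LINKS, "Beekeeper", "Honeycomb"))
```

The first call finds the overlap of two concepts; the second finds the
not directly obvious leap from beekeeper to honeycomb through an
intermediate page.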
>
> W.J.