[Foundation-l] excluding Wikipedia clones from searching
WJhonson at aol.com
Fri Dec 10 23:02:03 UTC 2010
In a message dated 12/10/2010 2:58:08 PM Pacific Standard Time,
jamesmikedupont at googlemail.com writes:
> My idea was that you will want to search pages that are already
> referenced by Wikipedia. In my work on Kosovo it would be very helpful,
> because there are lots of bad results on Google, and it would be nice
> to use that also to see how many times certain names occur.
> That is why we also need our own indexing engine. I would like to
> count the occurrences of each term and which pages they occur on, and to
> cross-reference those against names on Wikipedia. Wanted pages could also
> be assisted like this: what are the most wanted pages that match
> against the most common terms in the new refindex, or also in existing
> pages?
>
Well then, all you would need to do is cross-reference the refs themselves.
You don't need to cache the underlying pages to which they refer.
So in your new search engine, when you search for "Mary, Queen of Scots",
you are really saying: show me the external references that Wikipedia
mentions in connection with Mary, Queen of Scots.
That doesn't require caching the pages to which the refs refer. It only
requires indexing those refs which are currently used in-world.
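
As a rough illustration of indexing the refs rather than the pages behind
them, here is a minimal sketch. It assumes you already have a mapping from
article titles to the external URLs each article cites; the names
build_ref_index, search and article_refs, and the example.org URLs, are
made up for this example and are not part of any existing tool.

```python
from collections import defaultdict

def _words(text):
    # Split a title or query into lowercase words, ignoring commas.
    return text.lower().replace(",", " ").split()

def build_ref_index(article_refs):
    """Map each title word to the set of reference URLs cited by
    articles whose title contains that word. No cited page is fetched."""
    index = defaultdict(set)
    for title, urls in article_refs.items():
        for word in _words(title):
            index[word].update(urls)
    return index

def search(index, query):
    """Return the reference URLs associated with every word in the query."""
    words = _words(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical sample data: article title -> external refs it cites.
article_refs = {
    "Mary, Queen of Scots": [
        "http://example.org/mary-biography",
        "http://example.org/stuart-dynasty",
    ],
    "Elizabeth I of England": [
        "http://example.org/stuart-dynasty",
        "http://example.org/tudor-court",
    ],
}

index = build_ref_index(article_refs)
for url in sorted(search(index, "Mary, Queen of Scots")):
    print(url)
```

The only stored data is the title words and the ref URLs, so the index
stays small. Matching on title words is just one simple choice here; a
real engine would probably want to index the surrounding article text too.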
W