In a message dated 12/10/2010 2:58:08 PM Pacific Standard Time,
jamesmikedupont(a)googlemail.com writes:
My idea was that you will want to search pages that are already
referenced by Wikipedia. In my work on Kosovo, that would be very helpful
because there are lots of bad results on Google, and it would be nice
to use it also to see how many times certain names occur.
That is why we also need our own indexing engine. I would like to
count the occurrences of each term and which pages they occur on, and to
cross-reference those against names on Wikipedia. Wanted pages could also
be assisted like this: which are the most wanted pages that match
the most common terms in the new refindex or in existing
pages?
Well then all you would need to do is cross-reference the refs themselves.
You don't need to cache the underlying pages to which they refer.
So in your new search engine, when you search for "Mary, Queen of Scots",
you are really saying: show me the external references that Wikipedia cites
in connection with Mary, Queen of Scots.
That doesn't require caching the pages to which refs refer. It only
requires indexing those refs which currently are used in-world.
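
To make that concrete, here is a rough sketch in Python of what I mean by
indexing only the refs. Everything in it is an assumption for illustration:
the sample article text, the example.org URLs, and the helper names
(build_ref_index, search_refs) are made up, and a real version would read
wikitext from a dump rather than a hard-coded dictionary. The point is only
that the index is built from the refs inside the articles and never fetches
the pages they point to.

```python
import re
from collections import defaultdict

# Hypothetical sample data: article title -> wikitext snippet.
# Real input would come from a Wikipedia dump; these URLs are placeholders.
ARTICLES = {
    "Mary, Queen of Scots": (
        "Text <ref>http://example.org/mary-bio</ref> more text "
        "<ref>http://example.org/stuarts</ref>"
    ),
    "James VI and I": "Text <ref>http://example.org/stuarts</ref>",
}

# Simple URL matcher; stops at whitespace and common wikitext delimiters.
URL_RE = re.compile(r"https?://[^\s<>\]|}]+")


def build_ref_index(articles):
    """Map each article to its external refs, and each ref to its articles."""
    by_article = defaultdict(set)
    by_url = defaultdict(set)
    for title, text in articles.items():
        for url in URL_RE.findall(text):
            by_article[title].add(url)
            by_url[url].add(title)
    return by_article, by_url


def normalize(s):
    """Lowercase and drop punctuation so 'Mary, Queen' matches 'Mary Queen'."""
    return re.sub(r"[^a-z0-9 ]", "", s.lower()).strip()


def search_refs(query, by_article):
    """Return the refs cited by every article whose title contains the query."""
    q = normalize(query)
    results = set()
    for title, urls in by_article.items():
        if q in normalize(title):
            results |= urls
    return sorted(results)


by_article, by_url = build_ref_index(ARTICLES)
print(search_refs("Mary Queen of Scots", by_article))
```

The reverse map (by_url) gives you the cross-reference in the other
direction: for any one external source, which articles cite it.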
W