[Foundation-l] excluding Wikipedia clones from searching

Mike Dupont jamesmikedupont at googlemail.com
Fri Dec 10 22:57:32 UTC 2010

On Fri, Dec 10, 2010 at 11:16 PM,  <WJhonson at aol.com> wrote:
> In a message dated 12/10/2010 2:12:44 PM Pacific Standard Time,
> jamesmikedupont at googlemail.com writes:
> Well, lets backtrack.
> The original question was, how can we exclude wikipedia clones from the
> search.
> my idea was to create a search engine that includes only refs from
> wikipedia in it.
> then the idea was to make our own engine instead of only using google.
> lets agree that we need first a list of references and we can talk
> about the details of the searching later.
> thanks,
> mike
> I search for "Mary Queen of Scots" and I want to exclude Wikipedia clones
> from my results, because I'm really only interested in... how many times she
> appears in various Wikipedia pages.  Why would I not just use the Wikipedia
> internal search engine then?

My idea was that you would want to search pages that are already
referenced by Wikipedia. In my work on Kosovo this would be very helpful,
because there are lots of bad results on Google, and it would also be nice
to see how many times certain names occur.
That is why we also need our own indexing engine. I would like to
count the occurrences of each term and which pages they occur on, and to
cross-reference those against names on Wikipedia. Wanted pages could also
be assisted like this: what are the most wanted pages that match
against the most common terms in the new refindex or also existing

These are the things that I would like to do.
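
To make the counting idea concrete, here is a rough sketch in Python. It
is only an illustration under assumptions: the function names
(build_index, total_counts, wanted_terms) are made up for this example,
the input is assumed to be a dict of already-fetched reference page text
keyed by URL, and terms are single lowercase words, so multi-word article
titles would need a smarter matcher.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Simplification: lowercase single words only; no stemming or phrases.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to a Counter of {page URL: occurrences on that page}.

    `pages` is assumed to be a dict of URL -> plain text of a reference page.
    """
    index = defaultdict(Counter)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] += 1
    return index


def total_counts(index):
    """Total occurrences of each term across all indexed pages."""
    return Counter({term: sum(per_page.values())
                    for term, per_page in index.items()})


def wanted_terms(index, existing_titles, top=10):
    """Most common indexed terms that have no matching article title.

    `existing_titles` is assumed to be an iterable of article titles;
    matching is a plain case-insensitive comparison.
    """
    known = {title.lower() for title in existing_titles}
    ranked = total_counts(index).most_common()
    return [(term, n) for term, n in ranked if term not in known][:top]
```

With something like this, index["prizren"] would give the per-page
counts for that name, and wanted_terms() would give candidates for
wanted pages. A real version would also need a stop-word list so that
words like "the" do not dominate the results.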

James Michael DuPont
Member of Free Libre Open Source Software Kosova and Albania
flossk.org flossal.org
