On Fri, Dec 10, 2010 at 11:16 PM, WJhonson@aol.com wrote:
In a message dated 12/10/2010 2:12:44 PM Pacific Standard Time, jamesmikedupont@googlemail.com writes:
Well, let's backtrack. The original question was: how can we exclude Wikipedia clones from the search? My idea was to create a search engine that includes only the references cited in Wikipedia. Then the idea was to build our own engine instead of relying only on Google. Let's agree that we first need a list of references; we can talk about the details of the searching later. Thanks, mike
I search for "Mary Queen of Scots" and I want to exclude Wikipedia clones from my results, because I'm really only interested in... how many times she appears in various Wikipedia pages. Why would I not just use the Wikipedia internal search engine then?
My idea was that you will want to search the pages that Wikipedia already references. In my work on Kosovo this would be very helpful, because Google returns lots of bad results, and it would be nice to use it to see how many times certain names occur. That is also why we need our own indexing engine: I would like to count the occurrences of each term, record which pages they occur on, and cross-reference that against names on Wikipedia. Wanted pages could be assisted this way too: which of the most wanted pages, or which existing pages, match the most common terms in the new refindex?
These are the things that I would like to do.
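To make the counting idea concrete, here is a rough sketch of what such a refindex could look like. It is only an illustration under stated assumptions: the referenced pages are taken to be already fetched as plain text, terms are single lowercase words, and the names `build_refindex` and `wanted_terms` are made up for this example rather than taken from any existing tool.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase single-word terms (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_refindex(pages):
    """Build a term index from already-fetched reference pages.

    pages maps a reference URL to its plain text.
    Returns a mapping: term -> Counter({url: occurrences on that page}).
    """
    index = defaultdict(Counter)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] += 1
    return index


def wanted_terms(index, existing_titles, top=10):
    """List the most frequent indexed terms that have no matching page title.

    existing_titles is an iterable of Wikipedia page titles (hypothetical
    input; how that list is obtained is left open here).
    """
    existing = {title.lower() for title in existing_titles}
    totals = Counter({term: sum(per_page.values()) for term, per_page in index.items()})
    missing = [(term, n) for term, n in totals.most_common() if term not in existing]
    return missing[:top]


if __name__ == "__main__":
    # Hypothetical sample data standing in for pages referenced by Wikipedia.
    sample_pages = {
        "http://example.org/a": "Pristina Pristina Kosovo",
        "http://example.org/b": "Kosovo Mitrovica",
    }
    refindex = build_refindex(sample_pages)
    print(dict(refindex["kosovo"]))
    print(wanted_terms(refindex, ["Kosovo"]))
```

Real names are often several words long ("Mary Queen of Scots"), so a usable version would need phrase or n-gram matching rather than the single-word terms assumed here.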
wikimedia-l@lists.wikimedia.org