Well, let's backtrack. The original question was: how can we exclude Wikipedia clones from the search? My idea was to create a search engine that includes only refs from Wikipedia. Then the idea became to make our own engine instead of relying only on Google. Let's agree that we first need a list of references; we can talk about the details of the searching later. Thanks, mike
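A rough sketch of one way to start building that list of references, assuming we harvest the external links cited on each article through the MediaWiki API's prop=extlinks (the endpoint, helper name, and article title below are only placeholders for illustration, not a finished tool):

# Sketch only: pull the external links (candidate refs) for one article
# from the MediaWiki API, following continuation until the list is complete.
import requests

API_URL = "https://en.wikipedia.org/w/api.php"  # English Wikipedia endpoint

def external_links(title):
    """Yield every external link cited on the given article."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extlinks",
        "titles": title,
        "ellimit": "max",
    }
    while True:
        data = requests.get(API_URL, params=params).json()
        for page in data["query"]["pages"].values():
            for link in page.get("extlinks", []):
                yield link["*"]          # "*" holds the URL in format=json
        if "continue" not in data:       # no further batches of links
            break
        params.update(data["continue"])  # follow the API continuation token

if __name__ == "__main__":
    # Example article title; any page would do.
    for url in external_links("Albert Einstein"):
        print(url)

Running this per article (or over a dump) would give us the raw list of referenced URLs to feed into whatever search setup we decide on later.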
On Fri, Dec 10, 2010 at 11:02 PM, WJhonson@aol.com wrote:
In a message dated 12/10/2010 1:31:20 PM Pacific Standard Time, jamesmikedupont@googlemail.com writes:
If we prefer pages that can be cached and translated, and mark the ones that cannot, then by natural selection we will, in the long term, replace the pages that are not allowed to be cached with ones that can be.
My suggestion is for a Wikipedia project, something to be supported and run on the Toolserver or similar.
I think if you were to propose that we should "prefer" pages that "can be cached and translated", you'd get a firestorm of opposition. The majority of our refs, imho, are still under copyright. This is because most of them are either web pages created by various authors who do not specify a free license (and which therefore automatically enjoy copyright protection under U.S. law), or refs to relatively current works that are cited, for example, in Google Books preview mode or on Amazon look-inside pages.
I still cannot see any reason why we would want to cache anything like this. You haven't addressed what benefit caching refs gives us. My last question here is not about whether we can, or how, but: how does it help the project?
How?
W