[Foundation-l] excluding Wikipedia clones from searching

Fri Dec 10 21:19:42 UTC 2010

In a message dated 12/10/2010 1:10:26 PM Pacific Standard Time, 
jamesmikedupont at googlemail.com writes:

> My point is we should index them ourselves. We should have the pages
> used as references first listed in an easy to use manner and if
> possible we should cache them. If they are not cacheable because of
> some restrictions, the references should be marked somehow as not as
> good and people might find better references. In the end, like
> citeseer you will find that pages that are available and open and
> cacheable will be cited and used more than pages that are not.
> 
> Right now, I don't know of a simple way to even get this list of
> references from wp. There is a lot of work to do, and if we do this, it
> will benefit the wikipedia. Another thing to do is to translate the
> pages referenced.
> 
> mike
> 

I understand your point, but you're avoiding answering the points I raised.
They are archived at archive.org by permission.  You tell archive.org to 
archive your site, and they do.  You tell them to stop, and they do.
What advantage would we gain by repeating the caching that archive.org 
is already doing?  You haven't answered that.

No matter what occurs, you're going to have trouble retrieving the list of 
refs from a WP page (or any web page) without knowing some programming 
language like PHP.  Using PHP it's a fairly trivial parsing job.  If 
that's your only problem, I can write you a script to do it, for twenty bucks.
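
To give an idea of how small the job is, here is a minimal sketch, written in 
Python rather than PHP purely for illustration.  It assumes you already have 
the rendered HTML of the page in a string, and it assumes that outbound 
reference links are <a> tags carrying the CSS class "external", which is how 
MediaWiki appears to mark them.  The names ExternalLinkCollector and 
extract_reference_links are made up for this example.

```python
from html.parser import HTMLParser


class ExternalLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose class list includes 'external'.

    Assumption: outbound links in the rendered page carry class="external ...".
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # A class attribute may hold several space-separated names.
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if "external" in classes and href:
            self.links.append(href)


def extract_reference_links(html):
    """Return external link URLs from the HTML, in page order, without repeats."""
    collector = ExternalLinkCollector()
    collector.feed(html)
    collector.close()
    seen = set()
    unique = []
    for url in collector.links:
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique
```

Fetching the page itself is left out on purpose.  Any HTTP client will do, 
and the function only needs the resulting HTML as a string.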

You cannot translate a work that is under copyright protection without 
violating its copyright.  Copyright extends to any effort that 
substantially mimics the underlying work, and a translation is treated as 
doing exactly that.  You could however make a parody :)

W