On 1/2/06, Robert Bamler <Robert.Bamler@gmx.de> wrote:
Hi,
I'm working on a Wikipedia reader for the Apple iPod. To minimize the space used on the iPod, I will store only the current revisions of the articles on the device (no "old" table). However, the GFDL requires that the authors of an article be credited. Thus, each article on the iPod should include a list of all Wikipedia users who appear in its revision history.
So, finally, my question is: Is there a way to get a list of the authors of each article without having to download the (extremely large) XML dump of all Wikipedia articles ever written since the beginning of time? I've already tried the online SQL querying mechanism, but it seems to limit queries to 1000 results and does not appear to support queries on the "old" table.
AFAIK there is no easy way to do this on a per-article basis. You could get the information for a copy of Wikipedia that's a few months old at http://static.wikipedia.org/en/index.html . Or you could write a parser to scrape the action=history info. Or you could download the entire old table dumps and then parse them. But none of these are very easy or convenient.
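For what it's worth, the "download the dump and parse it" option can be sketched roughly as below. This assumes the dump is in the MediaWiki XML export format, where each <page> holds <revision> elements whose <contributor> contains either a <username> or, for edits made while logged out, an <ip>. The function names, the sample data, and the wording of the credit line are made up for illustration; treat it as a starting point under those assumptions, not a tested tool.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def authors_per_article(xml_source):
    """Stream an export file and collect contributors per page title.

    xml_source is a filename or a binary file object.
    Returns {title: (sorted usernames, True if any logged-out edit was seen)}.
    """
    result = {}
    title, names, anonymous = None, set(), False
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        tag = local_name(elem.tag)
        if tag == "title":
            title = elem.text
        elif tag == "username" and elem.text:
            names.add(elem.text)
        elif tag == "ip":
            anonymous = True
        elif tag == "page":
            result[title] = (sorted(names), anonymous)
            title, names, anonymous = None, set(), False
            elem.clear()  # drop the finished page's children to save memory
    return result


def format_credits(names, anonymous):
    """Build a credit line; the wording here is only a placeholder."""
    line = "Authors: " + ", ".join(names) + "."
    if anonymous:
        line += " There might be other (anonymous) contributors to this article."
    return line


# Made-up sample in the assumed export layout (the schema version is illustrative).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Example</title>
    <revision><contributor><username>Alice</username></contributor></revision>
    <revision><contributor><ip>192.0.2.1</ip></contributor></revision>
    <revision><contributor><username>Bob</username></contributor></revision>
    <revision><contributor><username>Alice</username></contributor></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, (names, anonymous) in authors_per_article(io.BytesIO(SAMPLE)).items():
        print(page_title, "-", format_credits(names, anonymous))
```

Because it streams the file instead of loading it whole, something like this should cope with a large dump, though you still have to download the dump in the first place. Whether logged-out edits are listed by IP or only flagged is a separate question; the sketch only flags them.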
Of course the GFDL allows aggregation of multiple GFDLed works into one - if you want the list of all contributors to Wikipedia you can get it at http://www.wikipedia.org/wikistats/csv/StatisticsUsers.csv . I've found this to be the only reasonable way to become compliant without resorting to scraping.
By the way, another question comes to mind: If a contributor to an article wasn't logged in, do I have to mention their IP address in the list of authors? I don't think this would make much sense, but perhaps it's legally required by the GFDL. Or is it sufficient to make the reader application display the string "There might be other (anonymous) contributors to this article"?
I'm looking forward to your replies, Robert
Personally I take the position that someone who contributes to Wikipedia without providing a name or pseudonym has waived her right to attribution under the GFDL. That may or may not be a correct interpretation of the law. But anonymous contributions are legally so dubious anyway - a third party has basically no evidence that the content was ever released under the GFDL in the first place. If someone complains about an anonymous contribution you've basically got to just meet their demands or remove the contribution, hope they don't sue you for past damages, and hope that if they do sue, you can convince the judge to award some minuscule amount in damages. Frankly, if someone complains about a non-anonymous contribution you should probably do the same thing. Relying on a third-party click-through licensing agreement is not likely to hold up in court.
Anthony