Hi,
I'm working on a Wikipedia reader for the Apple iPod. To minimize the data size occupied on the iPod I will only store the current articles on the device (no "old"-table). However, the GFDL requires mentioning the authors of an article. Thus, each article on the iPod should contain a list of all Wikipedia-users that appear in the revision history.
So, finally, my question is: Is there a way to get a list of the authors of each article without having to download the (extremely large) xmldump of all wikipedia-articles ever written since the beginning of time? I've already tried the online SQL querying mechanism, but it seems to limit queries to 1000 results and to not support queries on the "old"-table.
By the way, another question comes to my mind: If a contributor to an article wasn't logged in, do I have to mention his IP-number in the list of authors? I don't think this would make much sense, but perhaps it's legally required by the GFDL. Or is it sufficient to make the reader application display the string "There might be other (anonymous) contributors to this article"?
I'm looking forward to your replies, Robert
On 1/2/06, Robert Bamler <Robert.Bamler@gmx.de> wrote:
Hi,
I'm working on a Wikipedia reader for the Apple iPod. To minimize the data size occupied on the iPod I will only store the current articles on the device (no "old"-table). However, the GFDL requires mentioning the authors of an article. Thus, each article on the iPod should contain a list of all Wikipedia-users that appear in the revision history.
So, finally, my question is: Is there a way to get a list of the authors of each article without having to download the (extremely large) xmldump of all wikipedia-articles ever written since the beginning of time? I've already tried the online SQL querying mechanism, but it seems to limit queries to 1000 results and to not support queries on the "old"-table.
AFAIK there is no easy way to do this on a per-article basis. You could get the information for a copy of Wikipedia that's a few months old at http://static.wikipedia.org/en/index.html . Or you could write a parser to scrape the action=history info. Or you could download the entire old table dumps and then parse them. But none of these are very easy or convenient.
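The scraping option mentioned above could look roughly like the following. This is only a minimal sketch: the base URL, the `limit` value, and the link shapes `/wiki/User:Name` and `/wiki/Special:Contributions/<IP>` are assumptions about how a history page is rendered, and the names `history_url`, `HistoryAuthors`, and `authors_from_history_html` are made up for illustration. Fetching the page is left out on purpose.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urlparse, unquote


def history_url(title, limit=500):
    """Build a history-page URL (base URL and limit value are assumptions)."""
    base = "https://en.wikipedia.org/w/index.php"
    query = urlencode({"title": title, "action": "history", "limit": limit})
    return base + "?" + query


class HistoryAuthors(HTMLParser):
    """Collect contributor names from links found in history-page HTML."""

    def __init__(self):
        super().__init__()
        self.authors = []  # unique names, in order of first appearance

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        name = self._name_from_href(href)
        if name and name not in self.authors:
            self.authors.append(name)

    @staticmethod
    def _name_from_href(href):
        # Assumed link shapes; real markup may differ and change over time.
        path = unquote(urlparse(href).path)
        for marker in ("/wiki/User:", "/wiki/Special:Contributions/"):
            if path.startswith(marker):
                return path[len(marker):].replace("_", " ")
        return None


def authors_from_history_html(html):
    """Return the unique contributor names linked from one history page."""
    parser = HistoryAuthors()
    parser.feed(html)
    parser.close()
    return parser.authors
```

An article with a long history spans several history pages, so a real scraper would also have to follow the paging links, which this sketch does not attempt.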
Of course the GFDL allows aggregation of multiple GFDLed works into one - if you want the list of all contributors to Wikipedia you can get it at http://www.wikipedia.org/wikistats/csv/StatisticsUsers.csv . I've found this to be the only reasonable way to become compliant without resorting to scraping.
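For the aggregate approach, the contributor list has to be pulled out of the CSV file once it has been downloaded. The column layout of StatisticsUsers.csv is not described in this thread, so the sketch below takes the column index as a parameter; the default of column 0, the `has_header` flag, and the function name `contributor_names` are all assumptions for illustration.

```python
import csv
import io


def contributor_names(csv_text, name_column=0, has_header=False):
    """Return unique, non-empty names from one CSV column, in file order."""
    rows = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(rows, None)  # drop the header row if the file has one
    seen = set()
    names = []
    for row in rows:
        if len(row) <= name_column:
            continue  # skip blank or short rows
        name = row[name_column].strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names
```

Check the actual file before relying on this, since the user name may not be in the first column.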
By the way, another question comes to my mind: If a contributor to an article wasn't logged in, do I have to mention his IP-number in the list of authors? I don't think this would make much sense, but perhaps it's legally required by the GFDL. Or is it sufficient to make the reader application display the string "There might be other (anonymous) contributors to this article"?
I'm looking forward to your replies, Robert
Personally I take the position that someone who contributes to Wikipedia without providing a name or pseudonym has waived her right to attribution under the GFDL. That may or may not be a correct interpretation of the law. But anonymous contributions are legally so dubious anyway - a third party has basically no evidence that the content was ever released under the GFDL in the first place. If someone complains about an anonymous contribution you've basically got to just meet their demands or remove the contribution, hope they don't sue you for past damages, and hope that if they do sue you, you can convince the judge to award some minuscule amount in damages. Frankly, if someone complains about a non-anonymous contribution you should probably do the same thing. Relying on a third party click-through licensing agreement is not likely to hold up in court.
Anthony
Anthony DiPierro wrote:
On 1/2/06, Robert Bamler <Robert.Bamler@gmx.de> wrote:
By the way, another question comes to my mind: If a contributor to an article wasn't logged in, do I have to mention his IP-number in the list of authors? I don't think this would make much sense, but perhaps it's legally required by the GFDL. Or is it sufficient to make the reader application display the string "There might be other (anonymous) contributors to this article"?
Personally I take the position that someone who contributes to Wikipedia without providing a name or pseudonym has waived her right to attribution under the GFDL. That may or may not be a correct interpretation of the law. But anonymous contributions are legally so dubious anyway - a third party has basically no evidence that the content was ever released under the GFDL in the first place. If someone complains about an anonymous contribution you've basically got to just meet their demands or remove the contribution, hope they don't sue you for past damages, and hope that if they do sue you, you can convince the judge to award some minuscule amount in damages. Frankly, if someone complains about a non-anonymous contribution you should probably do the same thing. Relying on a third party click-through licensing agreement is not likely to hold up in court.
That seems like a reasonable analysis, but not to the point that I would be so alarmist about being taken to court. With an identified and active contributor we at least have the opportunity to seek out his opinion on the situation. If someone complains of a copyvio that was submitted anonymously we have to give that complainer the benefit of the doubt as long as he can identify the rights that were violated. The threat of an immediate lawsuit should not need to be the primary concern; ethics and integrity should be. When things get that far it becomes clear that one or both of the parties has decided to be a jerk. Suing in such situations is not an easy process. As a condition of joining the rest of the world of copyright law the United States had to accept that registration was not a pre-condition to having a work validly copyrighted; nevertheless, statutory damages are generally available only for infringement that took place after registration.
As long as the problem lies with the original Wikipedia article it is much easier to deal with: we can go to the history and see who contributed what. This is not available downstream. A mere list of contributors provides no opportunity to isolate the work of any single contributor. Lately, when things have been transwikied for completely valid reasons to Wiktionary, a contribution history list is also put on the talk page, but that history is completely useless because it may relate to an edit war over points that are of absolutely no interest to Wiktionary. It would be much nicer to drop all the irrelevancies from that list.
Ec
Robert Bamler wrote:
I'm working on a Wikipedia reader for the Apple iPod.
Out of curiosity, is this to run under the regular iPod OS or an alternative like iPod Linux? If it'll run on a stock iPod, let me know if you need a beta tester for the fifth-generation models.
To minimize the data size occupied on the iPod I will only store the current articles on the device (no "old"-table). However, the GFDL requires mentioning the authors of an article. Thus, each article on the iPod should contain a list of all Wikipedia-users that appear in the revision history.
So, finally, my question is: Is there a way to get a list of the authors of each article without having to download the (extremely large) xmldump of all wikipedia-articles ever written since the beginning of time? I've already tried the online SQL querying mechanism, but it seems to limit queries to 1000 results and to not support queries on the "old"-table.
Currently no, sorry. :(
You could screen-scrape the history pages, but that's ugly. ;)
By the way, another question comes to my mind: If a contributor to an article wasn't logged in, do I have to mention his IP-number in the list of authors? I don't think this would make much sense, but perhaps it's legally required by the GFDL. Or is it sufficient to make the reader application display the string "There might be other (anonymous) contributors to this article"?
The Wiki Press books being published in Germany don't appear to be including IP addresses in their author lists, but YMMV, IANAL, consult your lawyer if in doubt, blah blah.
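If the reader application follows the same practice and leaves IP addresses out, the author list could be built along these lines. This is a sketch of the display logic only and says nothing about what the GFDL actually requires. The note text is the wording proposed in the original question, and the names `is_ip` and `attribution_lines` are made up for illustration.

```python
import ipaddress

# Wording taken from the original question in this thread.
ANON_NOTE = "There might be other (anonymous) contributors to this article"


def is_ip(name):
    """Return True if the contributor name parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(name.strip())
        return True
    except ValueError:
        return False


def attribution_lines(contributors):
    """Return sorted unique user names, plus a note if any IPs were dropped."""
    named = [c for c in contributors if not is_ip(c)]
    lines = sorted(set(named), key=str.casefold)
    if len(named) != len(contributors):
        lines.append(ANON_NOTE)
    return lines
```

Replacing all IP entries with one fixed sentence also keeps the per-article data on the device small.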
-- brion vibber (brion @ pobox.com)
wikipedia-l@lists.wikimedia.org