On 1/2/06, Robert Bamler <Robert.Bamler(a)gmx.de> wrote:
I'm working on a Wikipedia reader for the Apple iPod. To minimize the data
size occupied on the iPod, I will only store the current articles on the
device (no "old" table). However, the GFDL requires mentioning the authors of
an article. Thus, each article on the iPod should contain a list of all
Wikipedia users that appear in the revision history.
So, finally, my question is: Is there a way to get a list of the authors of
each article without having to download the (extremely large) XML dump of all
Wikipedia articles ever written since the beginning of time? I've already
tried the online SQL query mechanism, but it seems to limit queries to
1000 results and does not support queries on the "old" table.
AFAIK there is no easy way to do this on a per-article basis. You
could get the information for a copy of Wikipedia that's a few months
old at http://static.wikipedia.org/en/index.html. Or you could write
a parser to scrape the action=history info. Or you could download the
entire old table dumps and then parse them. But none of these are
very easy or convenient.
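For the "download the dumps and parse them" route, a minimal sketch might look like the following. It assumes the full-history dump is in the MediaWiki XML export format, where each `<page>` has a `<title>` and each `<revision>` has a `<contributor>` holding either a `<username>` or, for logged-out edits, an `<ip>`. The namespace version in the root tag varies between dumps, so the code ignores it. The function names and the file name in the usage note are hypothetical; this is an illustration, not a tested tool.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    # Drop the "{namespace}" prefix; the export-schema version in it differs between dumps.
    return tag.rsplit("}", 1)[-1]


def authors_per_page(source):
    """Stream a MediaWiki XML dump and return (authors, anon_edits).

    authors maps each page title to the set of registered usernames that edited it;
    anon_edits counts revisions per title whose contributor is recorded only by IP.
    `source` may be a file path, a binary file object, or raw bytes.
    """
    if isinstance(source, bytes):
        source = io.BytesIO(source)
    authors = defaultdict(set)
    anon_edits = defaultdict(int)
    title = None
    # iterparse streams the file instead of loading the whole dump into memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "contributor":
            # Collect the contributor's child fields (username/id or ip) by local name.
            fields = {local_name(child.tag): child.text for child in elem}
            if fields.get("username"):
                authors[title].add(fields["username"])
            elif fields.get("ip"):
                anon_edits[title] += 1
        elif name == "page":
            # Discard the finished page's subtree to keep memory use manageable.
            elem.clear()
    return authors, anon_edits
```

Usage would be something like `authors, anon = authors_per_page("pages-meta-history.xml")` (hypothetical file name). Because IP-only edits are counted rather than listed, a reader could print the usernames and, where `anon[title] > 0`, add a note that there may be other anonymous contributors.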
Of course the GFDL allows aggregation of multiple GFDLed works into
one - if you want the list of all contributors to Wikipedia you can
get it at http://www.wikipedia.org/wikistats/csv/StatisticsUsers.csv
I've found this to be the only reasonable way to become compliant
without resorting to scraping.
By the way, another question comes to mind: if a contributor to an article
wasn't logged in, do I have to mention his IP address in the list of authors?
I don't think this would make much sense, but perhaps it's legally required
by the GFDL. Or is it sufficient to make the reader application display the
string "There might be other (anonymous) contributors to this article"?
I'm looking forward to your replies,
Personally I take the position that someone who contributes to
Wikipedia without providing a name or pseudonym has waived her right
to attribution under the GFDL. That may or may not be a correct
interpretation of the law. But anonymous contributions are legally so
dubious anyway - a third party has basically no evidence that the
content was ever released under the GFDL in the first place. If
someone complains about an anonymous contribution you've basically got
to just meet their demands or remove the contribution, hope they don't
sue you for past damages, and hope that, if they do sue you, you
can convince the judge to award some minuscule amount in damages.
Frankly, if someone complains about a non-anonymous contribution you
should probably do the same thing. Relying on a third party
click-through licensing agreement is not likely to hold up in court.