Hi there,
The coming CD version of the German wikipedia
(see
http://meta.wikimedia.org/wiki/Wikipedia_auf_CD)
should include a list of authors for each article, as the GFDL requires.
I am not that good in SQL - can anybody help?
For each article a statement like this should be produced:
"This article has been edited X times by Y wikipedia users and Z times
by anonymous users: Foouser, Blauser, ...."
A table
All users should be listed - but not the IP-numbers of anonymous posts!
I thought of some SQL statements like (untested):
CREATE TABLE article_edit_count (
article VARCHAR(255) BINARY
edited_by_wikipedians INT UNSIGNED,
edited_with_IP INT UNSIGNED
);
CREATE TABLE has_edited (
user VARCHAR(255) BINARY,
article VARCHAR(255) BINARY
);
INSERT INTO has_edited
SELECT DISTINCT old_user_text AS user, old_title AS article
FROM old WHERE old_namespace=0
UNION
SELECT cur_user_text AS user, cur_title AS article
Well... but how to distingish anonymous and logged in users?
How can the authors be determined and how many hours/days does it take
to run the SQL statements? By the way on my notebook a simple query for
counting the number of links to each article, that I called 4 hours ago,
is still running and running :-(
Thanks,
Jakob