Hi there,
The coming CD version of the German wikipedia (see http://meta.wikimedia.org/wiki/Wikipedia_auf_CD) should include a list of authors for each article, as the GFDL requires. I am not that good in SQL - can anybody help?
For each article a statement like this should be produced:
"This article has been edited X times by Y wikipedia users and Z times by anonymous users: Foouser, Blauser, ...."
A table
All users should be listed - but not the IP-numbers of anonymous posts!
I thought of some SQL statements like (untested):
CREATE TABLE article_edit_count ( article VARCHAR(255) BINARY edited_by_wikipedians INT UNSIGNED, edited_with_IP INT UNSIGNED );
CREATE TABLE has_edited ( user VARCHAR(255) BINARY, article VARCHAR(255) BINARY );
INSERT INTO has_edited SELECT DISTINCT old_user_text AS user, old_title AS article FROM old WHERE old_namespace=0 UNION SELECT cur_user_text AS user, cur_title AS article
Well... but how to distingish anonymous and logged in users?
How can the authors be determined and how many hours/days does it take to run the SQL statements? By the way on my notebook a simple query for counting the number of links to each article, that I called 4 hours ago, is still running and running :-(
Thanks, Jakob