Hi there,
The coming CD version of the German wikipedia (see http://meta.wikimedia.org/wiki/Wikipedia_auf_CD) should include a list of authors for each article, as the GFDL requires. I am not that good in SQL - can anybody help?
For each article a statement like this should be produced:
"This article has been edited X times by Y wikipedia users and Z times by anonymous users: Foouser, Blauser, ...."
A table
All users should be listed - but not the IP-numbers of anonymous posts!
I thought of some SQL statements like (untested):
CREATE TABLE article_edit_count ( article VARCHAR(255) BINARY edited_by_wikipedians INT UNSIGNED, edited_with_IP INT UNSIGNED );
CREATE TABLE has_edited ( user VARCHAR(255) BINARY, article VARCHAR(255) BINARY );
INSERT INTO has_edited SELECT DISTINCT old_user_text AS user, old_title AS article FROM old WHERE old_namespace=0 UNION SELECT cur_user_text AS user, cur_title AS article
Well... but how to distingish anonymous and logged in users?
How can the authors be determined and how many hours/days does it take to run the SQL statements? By the way on my notebook a simple query for counting the number of links to each article, that I called 4 hours ago, is still running and running :-(
Thanks, Jakob
Hi,
seems like nobody is interested in providing the users of a offline CD version with a list of authors for each article like required by GFDL?
I wrote
All users should be listed - but not the IP-numbers of anonymous posts! ... INSERT INTO has_edited SELECT DISTINCT old_user_text AS user, old_title AS article FROM old WHERE old_namespace=0
INSERT INTO has_edited SELECT cur_user_text AS user, cur_title AS article FROM cur WHERE cur_namespace=0
SELECT DISTINCT user FROM has_edited WHERE article='ArticleYouWantTheAuthorListFor'
Should work, but IPs need to be filtered out. I still do not know how to count the number of edits by anonymous users.
By the way getting all the author of an article should be a feature in MediaWiki too.
Greetings Jakob
P.S: How long does it take to set up a copy of the whole german (all tables, also users!) for an endreview process? Yes it will be a fork but only a short one to be able to delete all URVs, images without copyright statement, portal pages and things like that.
P.P.S: If the SQL-Statement stuff for detecting the authors does not work we could use a bot that requests the history, creates the list of authors and puts it at the discussion page - wie do not want to include the discussion pages at the CD in any case.
On Fri, Aug 27, 2004 at 09:50:01AM +0200, Jakob Voss wrote:
Hi,
seems like nobody is interested in providing the users of a offline CD version with a list of authors for each article like required by GFDL?
I wrote
All users should be listed - but not the IP-numbers of anonymous posts! ... INSERT INTO has_edited SELECT DISTINCT old_user_text AS user, old_title AS article FROM old WHERE old_namespace=0
INSERT INTO has_edited SELECT cur_user_text AS user, cur_title AS article FROM cur WHERE cur_namespace=0
SELECT DISTINCT user FROM has_edited WHERE article='ArticleYouWantTheAuthorListFor'
Should work, but IPs need to be filtered out. I still do not know how to count the number of edits by anonymous users.
Why do you need an extra table?
SELECT DISTINCT old_user_text FROM old WHERE old_namespace=0 AND old_title='ArticleYouWantTheAuthorListFor' AND old_user_id != 0;
should do the job. (IP's have old_user_id=0)
Regards,
jens
Jens Frank wrote:
Why do you need an extra table?
SELECT DISTINCT old_user_text FROM old WHERE old_namespace=0 AND old_title='ArticleYouWantTheAuthorListFor' AND old_user_id != 0;
should do the job. (IP's have old_user_id=0)
Thanks! I missed the old_user_id colum.
I'll write the required function in PHP to get it implemented as a function in MediaWiki.
Jakob
wikitech-l@lists.wikimedia.org