On 02/07/12 18:45, Gregor Martynus wrote:
Hey there,
I really appreciate the help of you and the others in the mailing list, thanks to your help I was able to extract the data my friend needs for her wikipedia study.
While it worked as a proof of concept, that the data she needs is available, it took to much time to process all needed data and she needs it another 3 to 4 times.
I contact you because you offered me to send an SQL dump back then, does that mean you've access to the Wikipedia database?
I'm not an SQL expert, but I think based on the MediaWiki database schema, I could build the SQL queries for the 3 things she needs
- user_accounts(name, created_at, blocked_at) created manually in 2012
- pages (namespace_id, title, is_redirect)
- revisions (page_id, user_id, created_at) created by users from 1. bytes_diff would be nice, if available
The goal is to find users that manually created an account, have not been blocked and that made their first contribution in June 2012.
Any chance you could run the queries and send me the results as sql dumps? Or if not, do you know somebody I could ask?
Thanks again for your help!
-- Gregor Martynus
Given that your requisites here are more specific than those of the bug, I have made the following query for you, taking advantage that the recentchanges talbe keeps the data from last month: select rc_timestamp, rc_user_text, user_registration, rc_namespace, rc_title, rc_old_len, rc_new_len, rc_new, rc_minor, rc_type, page_is_redirect from recentchanges JOIN user on (user_id=rc_user) LEFT JOIN page ON (page_id=rc_cur_id) where rc_timestamp > '201206000000' and user_registration LIKE '2012%' order by rc_timestamp asc
Some notes: - This will contain from June 2 to July 2. - No bytes_diff, but you have old and new byte len :) - It may have some non-edit log entries.
File lives at http://toolserver.org/~platonides/sandbox/martynus.txt
I also made a list of the accounts created in 2012 which were made manually: SELECT user_id, user_name, log_timestamp AS signed_up_at FROM user LEFT JOIN logging ON user_id = log_user WHERE log_type = 'newusers' AND log_action = 'create' AND log_timestamp > '20120000000000
File lives at http://toolserver.org/~platonides/sandbox/martynus2.txt
And of block actions done to users registered in 2012: SELECT log_action, log_timestamp, log_title, user_registration FROM logging JOIN user on (log_title=user_name) WHERE log_type = 'block'AND log_timestamp > '20120000000000' and user_registration > '20120000000000'
File lives at http://toolserver.org/~platonides/sandbox/martynus3.txt
I guess that will work for your needs.
Regards
PS: Rows per file: 73426 martynus.txt 54429 martynus2.txt 2893 martynus3.txt