On Monday, 2. July 2012 at 23:47, Platonides wrote:
On 02/07/12 18:45, Gregor Martynus wrote:Hey there,I really appreciate the help of you and the others in the mailing list,thanks to your help I was able to extract the data my friend needs forher wikipedia study.While it worked as a proof of concept, that the data she needs isavailable, it took to much time to process all needed data and she needsit another 3 to 4 times.I contact you because you offered me to send an SQL dump back then, doesthat mean you've access to the Wikipedia database?I'm not an SQL expert, but I think based on the MediaWiki databaseschema, I could build the SQL queries for the 3 things she needs1. user_accounts(name, created_at, blocked_at) created manually in 20122. pages (namespace_id, title, is_redirect)3. revisions (page_id, user_id, created_at) created by users from 1.bytes_diff would be nice, if availableThe goal is to find users that manually created an account, have notbeen blocked and that made their first contribution in June 2012.Any chance you could run the queries and send me the results as sqldumps? Or if not, do you know somebody I could ask?Thanks again for your help!--Gregor MartynusGiven that your requisites here are more specific than those of the bug,I have made the following query for you, taking advantage that therecentchanges talbe keeps the data from last month:select rc_timestamp, rc_user_text, user_registration, rc_namespace,rc_title, rc_old_len, rc_new_len, rc_new, rc_minor, rc_type,page_is_redirect from recentchanges JOIN user on (user_id=rc_user) LEFTJOIN page ON (page_id=rc_cur_id) where rc_timestamp > '201206000000'and user_registration LIKE '2012%' order by rc_timestamp ascSome notes:- This will contain from June 2 to July 2.- No bytes_diff, but you have old and new byte len :)- It may have some non-edit log entries.File lives at http://toolserver.org/~platonides/sandbox/martynus.txtI also made a list of the accounts created in 2012 which were made manually:SELECT user_id, user_name, log_timestamp AS signed_up_at FROMuser LEFT JOIN logging ON user_id = log_user WHERE log_type ='newusers' AND log_action = 'create' AND log_timestamp >'20120000000000File lives at http://toolserver.org/~platonides/sandbox/martynus2.txtAnd of block actions done to users registered in 2012:SELECT log_action, log_timestamp, log_title, user_registration FROMlogging JOIN user on (log_title=user_name) WHERE log_type = 'block'ANDlog_timestamp > '20120000000000' and user_registration > '20120000000000'File lives at http://toolserver.org/~platonides/sandbox/martynus3.txtI guess that will work for your needs.RegardsPS: Rows per file:73426 martynus.txt54429 martynus2.txt2893 martynus3.txt