excellent, thanks a lot!! Really appreciate your help!
On Monday, 2. July 2012 at 23:47, Platonides wrote:
On 02/07/12 18:45, Gregor Martynus wrote:
I really appreciate the help of you and the others in the mailing list,
thanks to your help I was able to extract the data my friend needs for
her wikipedia study.
While it worked as a proof of concept, that the data she needs is
available, it took to much time to process all needed data and she needs
it another 3 to 4 times.
I contact you because you offered me to send an SQL dump back then, does
that mean you've access to the Wikipedia database?
I'm not an SQL expert, but I think based on the MediaWiki database
schema, I could build the SQL queries for the 3 things she needs
1. user_accounts(name, created_at, blocked_at) created manually in 2012
2. pages (namespace_id, title, is_redirect)
3. revisions (page_id, user_id, created_at) created by users from 1.
bytes_diff would be nice, if available
The goal is to find users that manually created an account, have not
been blocked and that made their first contribution in June 2012.
Any chance you could run the queries and send me the results as sql
dumps? Or if not, do you know somebody I could ask?
Thanks again for your help!
Given that your requisites here are more specific than those of the bug,
I have made the following query for you, taking advantage that the
recentchanges talbe keeps the data from last month:
select rc_timestamp, rc_user_text, user_registration, rc_namespace,
rc_title, rc_old_len, rc_new_len, rc_new, rc_minor, rc_type,
page_is_redirect from recentchanges JOIN user on (user_id=rc_user) LEFT
JOIN page ON (page_id=rc_cur_id) where rc_timestamp > '201206000000'
and user_registration LIKE '2012%' order by rc_timestamp asc
- This will contain from June 2 to July 2.
- No bytes_diff, but you have old and new byte len :)
- It may have some non-edit log entries.
File lives at http://toolserver.org/~platonides/sandbox/martynus.txt
I also made a list of the accounts created in 2012 which were made manually:
SELECT user_id, user_name, log_timestamp AS signed_up_at FROM
user LEFT JOIN logging ON user_id = log_user WHERE log_type =
'newusers' AND log_action = 'create' AND log_timestamp >
File lives at http://toolserver.org/~platonides/sandbox/martynus2.txt
And of block actions done to users registered in 2012:
SELECT log_action, log_timestamp, log_title, user_registration FROM
logging JOIN user on (log_title=user_name) WHERE log_type = 'block'AND
log_timestamp > '20120000000000' and user_registration >
File lives at http://toolserver.org/~platonides/sandbox/martynus3.txt
I guess that will work for your needs.
PS: Rows per file: