--
Gregor Martynus
On Monday, 2. July 2012 at 23:47, Platonides wrote:
> On 02/07/12 18:45, Gregor Martynus wrote:
> > Hey there,
> >
> > I really appreciate the help of you and the others in the mailing list,
> > thanks to your help I was able to extract the data my friend needs for
> > her wikipedia study.
> >
> > While it worked as a proof of concept, that the data she needs is
> > available, it took to much time to process all needed data and she needs
> > it another 3 to 4 times.
> >
> > I contact you because you offered me to send an SQL dump back then, does
> > that mean you've access to the Wikipedia database?
> >
> > I'm not an SQL expert, but I think based on the MediaWiki database
> > schema, I could build the SQL queries for the 3 things she needs
> >
> > 1. user_accounts(name, created_at, blocked_at) created manually in 2012
> > 2. pages (namespace_id, title, is_redirect)
> > 3. revisions (page_id, user_id, created_at) created by users from 1.
> > bytes_diff would be nice, if available
> >
> > The goal is to find users that manually created an account, have not
> > been blocked and that made their first contribution in June 2012.
> >
> > Any chance you could run the queries and send me the results as sql
> > dumps? Or if not, do you know somebody I could ask?
> >
> > Thanks again for your help!
> >
> > --
> > Gregor Martynus
> >
>
>
> Given that your requisites here are more specific than those of the bug,
> I have made the following query for you, taking advantage that the
> recentchanges talbe keeps the data from last month:
> select rc_timestamp, rc_user_text, user_registration, rc_namespace,
> rc_title, rc_old_len, rc_new_len, rc_new, rc_minor, rc_type,
> page_is_redirect from recentchanges JOIN user on (user_id=rc_user) LEFT
> JOIN page ON (page_id=rc_cur_id) where rc_timestamp > '201206000000'
> and user_registration LIKE '2012%' order by rc_timestamp asc
>
>
> Some notes:
> - This will contain from June 2 to July 2.
> - No bytes_diff, but you have old and new byte len :)
> - It may have some non-edit log entries.
>
>
> File lives at
http://toolserver.org/~platonides/sandbox/martynus.txt
>
>
> I also made a list of the accounts created in 2012 which were made manually:
> SELECT user_id, user_name, log_timestamp AS signed_up_at FROM
> user LEFT JOIN logging ON user_id = log_user WHERE log_type =
> 'newusers' AND log_action = 'create' AND log_timestamp >
> '20120000000000
>
> File lives at
http://toolserver.org/~platonides/sandbox/martynus2.txt
>
>
> And of block actions done to users registered in 2012:
> SELECT log_action, log_timestamp, log_title, user_registration FROM
> logging JOIN user on (log_title=user_name) WHERE log_type = 'block'AND
> log_timestamp > '20120000000000' and user_registration > '20120000000000'
>
> File lives at
http://toolserver.org/~platonides/sandbox/martynus3.txt
>
> I guess that will work for your needs.
>
> Regards
>
>
> PS: Rows per file:
> 73426 martynus.txt
> 54429 martynus2.txt
> 2893 martynus3.txt
>
>