I think I found a way:
1. From command line, I started a connection to mysql (user:root, no password) and opened the pages_logging database
$ mysql --local-infile -uroot pages_logging
2. I executed the following query:
mysql> LOAD DATA LOCAL INFILE '/path/to/pages-logging.xml'
-> INTO TABLE test
-> CHARACTER SET binary
-> LINES STARTING BY '<logitem>' TERMINATED BY '</logitem>' (@logitem)
-> SET
-> id = ExtractValue(@logitem:=CONVERT(@logitem using utf8), 'id'),
->
-> timestamp = ExtractValue(@logitem, 'timestamp'),
-> type = ExtractValue(@logitem, 'type'),
-> action = ExtractValue(@logitem, 'action'),
-> logtitle = ExtractValue(@logitem, 'logtitle'),
-> user_id = ExtractValue(@logitem, 'contributor/id'),
-> user_name = ExtractValue(@logitem, 'contributor/username');
That has worked so far. However, some rows come out empty, so I guess that the <logitem> nodes have a different structure depending on the type and action? I don't care too much, as I only need the newuser and block actions; I just want to make sure that my assumption is correct, so that the study is not based on faulty data.
Is there a description of the pages-logging.xml format available somewhere, so I can double-check my import script?
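One way the assumption above could be checked independently of MySQL is to stream through the XML and count, per (type, action) pair, how often each imported field is absent or empty. The following is only a minimal sketch: the field names are the XPath expressions from the query above, while the function names, the sample values, and the namespace handling are assumptions for illustration, not a description of the official dump format.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Fields taken from the LOAD DATA query; paths are relative to <logitem>.
FIELDS = ("id", "timestamp", "type", "action", "logtitle",
          "contributor/id", "contributor/username")


def local(tag):
    """Strip a namespace prefix such as '{http://...}logitem'."""
    return tag.rsplit("}", 1)[-1]


def missing_fields_by_action(source):
    """Return (totals, missing) counters keyed by (type, action[, field])."""
    totals = Counter()
    missing = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "logitem":
            continue
        # Drop namespaces inside this item so plain paths work below.
        for node in elem.iter():
            node.tag = local(node.tag)
        key = (elem.findtext("type"), elem.findtext("action"))
        totals[key] += 1
        for field in FIELDS:
            if not (elem.findtext(field) or "").strip():
                missing[key + (field,)] += 1
        elem.clear()  # release the children of the processed item
    return totals, missing


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the result.
    sample = io.BytesIO(
        b"<mediawiki>"
        b"<logitem><id>1</id><timestamp>2012-06-09T17:23:00Z</timestamp>"
        b"<contributor><username>Alice</username><id>7</id></contributor>"
        b"<type>newusers</type><action>create</action>"
        b"<logtitle>User:Alice</logtitle></logitem>"
        b"<logitem><id>2</id><timestamp>2012-06-09T18:00:00Z</timestamp>"
        b"<type>block</type><action>block</action>"
        b"<logtitle>User:Bob</logtitle></logitem>"
        b"</mediawiki>"
    )
    totals, missing = missing_fields_by_action(sample)
    for key, count in sorted(missing.items()):
        print(key, count, "of", totals[key[:2]])
```

For the real dump, the file object returned by gzip.open('/path/to/pages-logging.xml.gz', 'rb') could be passed in place of the sample. If the newuser and block entries show no missing fields, the empty rows presumably belong to other log types.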
Thanks again for your help
--
Gregor Martynus
On Sunday, 10. June 2012 at 14:30, Gregor Martynus wrote:
> Thank you all!
>
> I downloaded the pages-logging.xml.gz and the data looks good!
>
> May I ask another question? In order to analyze the data, I'd like to transform it to SQL. I've found a Java and a Perl tool, but neither is made to transform the logging.xml to SQL. Are you aware of another tool that I can use, or maybe instructions to follow?
>
> Or maybe there is another tool you can think of that I can use to analyze the information in a performant way?
>
> Once again, thanks a lot for your help, really appreciate it.
>
> --
> Gregor Martynus
>
>
> On Saturday, 9. June 2012 at 19:38, Platonides wrote:
>
> > On 09/06/12 17:23, Gregor Martynus wrote:
> > > Hi,
> > >
> > > for a dissertation study, I am trying to find a reliable data source from which
> > > I can extract user account events, specifically creation and blocking of
> > > user accounts, with usernames, the event name and timestamps.
> > >
> > > Is such data available? If yes, could anybody point me to where I can
> > > get it from?
> > >
> > > Thanks a lot
> > >
> > > --
> > > Gregor
> > >
> >
> >
> > Yes. Go to http://dumps.wikimedia.org/
> > You want to grab the pages-logging.xml.gz file.
> >
> > Despite its name, it does contain creation (not for the very old
> > accounts) and blocking logs for accounts.
> >
> >
> >
>
>