I think I found a way:

1. From the command line, I started a connection to MySQL (user: root, no password) and opened the pages_logging database:
$ mysql --local-infile -uroot pages_logging

2. I executed the following query:

mysql> LOAD DATA LOCAL INFILE '/path/to/pages-logging.xml' 
    -> INTO TABLE test
    -> CHARACTER SET binary
    -> LINES STARTING BY '<logitem>' TERMINATED BY '</logitem>' (@logitem)
    -> SET
    ->   id        = ExtractValue(@logitem:=CONVERT(@logitem using utf8), 'id'),
    ->   timestamp = ExtractValue(@logitem, 'timestamp'),
    ->   type      = ExtractValue(@logitem, 'type'),
    ->   action    = ExtractValue(@logitem, 'action'),
    ->   logtitle  = ExtractValue(@logitem, 'logtitle'),
    ->   user_id   = ExtractValue(@logitem, 'contributor/id'),
    ->   user_name = ExtractValue(@logitem, 'contributor/username');

That worked so far. However, some rows come out empty, so I guess the <logitem> nodes have a different structure depending on the type and action? I don't care too much, as I only need the newuser and block actions. I just want to make sure that my assumption is correct, so that the study is not based on faulty data.
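
One way to test that assumption outside MySQL would be a short Python sketch like the one below. It streams the file and counts, per type and action, which of the fields from the query above are absent or empty. The helper names (local, find_text, missing_fields) and the sample data are made up for illustration, and I am assuming the dump may declare an XML namespace, so tags are compared by local name only.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Field paths mirror the ExtractValue() calls in the query above.
FIELDS = ["id", "timestamp", "type", "action", "logtitle",
          "contributor/id", "contributor/username"]


def local(tag):
    """Drop a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def find_text(elem, path):
    """Follow a slash-separated path of local tag names; return text or ''."""
    node = elem
    for name in path.split("/"):
        node = next((c for c in node if local(c.tag) == name), None)
        if node is None:
            return ""
    return (node.text or "").strip()


def missing_fields(source):
    """Count absent or empty fields, keyed by (type, action, field)."""
    missing = Counter()
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "logitem":
            continue
        key = (find_text(elem, "type"), find_text(elem, "action"))
        for field in FIELDS:
            if not find_text(elem, field):
                missing[key + (field,)] += 1
        elem.clear()  # release the children of the processed element
    return missing


# Made-up sample; the namespace URI and all values are placeholders.
sample = b"""<mediawiki xmlns="urn:example:hypothetical">
  <logitem>
    <id>1</id><timestamp>2012-01-01T00:00:00Z</timestamp>
    <contributor><username>Alice</username><id>7</id></contributor>
    <type>newusers</type><action>create</action>
    <logtitle>User:Alice</logtitle>
  </logitem>
  <logitem>
    <id>2</id><timestamp>2012-01-02T00:00:00Z</timestamp>
    <contributor />
    <type>block</type><action>block</action>
    <logtitle>User:Bob</logtitle>
  </logitem>
</mediawiki>"""

result = missing_fields(io.BytesIO(sample))
for (log_type, action, field), count in sorted(result.items()):
    print(log_type, action, field, count)
```

For the real dump, something like missing_fields(gzip.open('/path/to/pages-logging.xml.gz', 'rb')) should work after importing gzip, since iterparse accepts a binary file object. If the counts for the newuser and block entries come out at zero, the empty rows would belong to other log types.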

Is there a description of the pages-logging.xml syntax available somewhere, so I can double check my import script?

Thanks again for your help

-- 
Gregor Martynus

On Sunday, 10. June 2012 at 14:30, Gregor Martynus wrote:

Thank you all!

I downloaded the pages-logging.xml.gz and the data looks good!

May I ask another question? In order to analyze the data, I'd like to transform it to SQL. I've found a Java tool and a Perl tool, but neither is made to transform the logging.xml to SQL. Are you aware of another tool that I can use, or maybe instructions to follow?

Or maybe there is another tool you can think of that I can use to analyze the information in a performant way?

Once again, thanks a lot for your help, really appreciate it.

-- 
Gregor Martynus

On Saturday, 9. June 2012 at 19:38, Platonides wrote:

On 09/06/12 17:23, Gregor Martynus wrote:
Hi,

for a dissertation study, I am trying to find a reliable data source from which
I can extract user account events, specifically creation and blocking of
user accounts, with usernames, the event name and timestamps.

Is such data available? If yes, could anybody point me to where I can
get it from?

Thanks a lot

--
Gregor

You want to grab the pages-logging.xml.gz file.

Despite its name, it does contain creation (not for the very old
accounts) and blocking logs for accounts.