Thanks again for your input. It sounds like the stub-meta-history dump is exactly what we need; I'm already downloading it.

I'm not sure if this is the place for such a suggestion, but it would be great to have example versions of the real dumps, with only a few hundred entries each, just to find out whether they fit specific requirements without having to download several GB of data. Just a thought.

-- 
Gregor Martynus

On Monday, 11. June 2012 at 23:19, Felipe Ortega wrote:

From: Platonides <platonides@gmail.com>
To: Gregor Martynus <gregor@martynus.net>
Sent: Monday, 11 June 2012 22:38
Subject: Re: [Xmldatadumps-l] anonymous user account logs (account created / account blocked)

On 11/06/12 21:53, Gregor Martynus wrote:
Thanks Felipe, I'll definitely give it a try next time. One thing that
puzzles me:
From your code it seems there would be <namespace> tags in the
pages-logging.xml dump. Is this the case? I didn't see these myself.

Oops. That is probably a leftover from copy-pasting the parser skeleton for the revision table dumps. The new version definitely won't have that.
I'll fix that source file.

I've updated the type/action tree with the input by Platonides, feel
free to use / extend it:

Great, thanks.

I was surprised that the pages-logging.xml dump does not contain events
about user contributions. My friend is searching for

- users with first time contributions in May
- only manual sign ups
- dates when accounts have been created

and some more detailed things, but that would be the start.

For example, there is the special page "User Contributions". Can you point me
to the dump(s) I need to get this data (namespace, page title, user,
diff, comment, datetime)? The pages-logging.xml is already great to find
out about created / blocked user accounts, what we are missing are the
actual contributions.

Does that make sense to you?

--
Gregor Martynus

Page edits appear in the article XML dumps.
Special:Contributions is just a query against the revision table.
The information you want is in pages-meta-history, but if you can use it
(i.e. you don't need the actual page content), stub-meta-history is a
much smaller file.

Indeed. For that purpose, you need to join information from the revision, page and logging tables. As Platonides suggested, you have two options:

- Stub-meta-history: All meta information about revision and page tables, but no text.
- Pages-meta-history: Same as before, plus the complete text of every revision of each wiki page (all namespaces). Keep in mind that you get the whole text, not diffs, for every change. That's why these files can be huge once you decompress them (100 times larger, sometimes even more).
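
As a rough illustration of the stub-meta-history route, here is a minimal Python sketch that streams a dump and records the earliest revision timestamp of each registered contributor. The function name first_edits and the tiny inline sample are made up for illustration. The element names (page, revision, timestamp, contributor, username, ip) are the ones I would expect from the MediaWiki export format, so please check them against the file you actually download. XML namespaces are stripped so the sketch does not depend on a particular schema version.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def first_edits(stream):
    """Return {username: earliest ISO timestamp} from a stub dump stream.

    Anonymous (IP) edits are skipped. Each page is cleared once it has
    been processed, which keeps memory use modest on large dumps.
    """
    earliest = {}
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        for rev in elem:
            if local(rev.tag) != "revision":
                continue
            user = ts = None
            for child in rev:
                name = local(child.tag)
                if name == "timestamp":
                    ts = child.text
                elif name == "contributor":
                    for c in child:
                        if local(c.tag) == "username":
                            user = c.text
            # ISO 8601 timestamps in UTC sort correctly as plain strings.
            if user and ts and (user not in earliest or ts < earliest[user]):
                earliest[user] = ts
        elem.clear()
    return earliest


# Hypothetical sample, shaped like a stub dump but not taken from a real one.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page>
    <title>Example</title>
    <ns>0</ns>
    <id>1</id>
    <revision>
      <id>10</id>
      <timestamp>2012-05-03T10:00:00Z</timestamp>
      <contributor><username>Alice</username><id>7</id></contributor>
    </revision>
    <revision>
      <id>11</id>
      <timestamp>2012-04-01T09:00:00Z</timestamp>
      <contributor><ip>192.0.2.1</ip></contributor>
    </revision>
    <revision>
      <id>12</id>
      <timestamp>2012-06-01T08:00:00Z</timestamp>
      <contributor><username>Alice</username><id>7</id></contributor>
    </revision>
  </page>
</mediawiki>"""

result = first_edits(io.BytesIO(SAMPLE))
# Users whose first contribution falls in May 2012.
may_2012 = sorted(u for u, ts in result.items() if ts.startswith("2012-05"))
```

For a real dump you would pass a file object (for example from gzip.open(path, "rb"), assuming a .gz file) instead of the in-memory sample. Account creation dates and sign-up type would still have to come from pages-logging.xml and be matched on the user name.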

Please also be careful with the 'diff' tool, as it sometimes cannot track changes between versions accurately (this depends on the granularity you require).
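
To make the granularity point concrete, here is a small sketch using Python's difflib as a stand-in (not necessarily the diff tool meant above). The two text snippets are invented. The same one-word change shows up as a whole replaced line at line granularity, but as a single replaced word at word granularity.

```python
import difflib

# Two invented revisions of the same text, differing in one word.
old = "The quick brown fox\njumps over the lazy dog\n"
new = "The quick red fox\njumps over the lazy dog\n"

# Line granularity: the entire first line is reported as removed and re-added.
line_diff = [
    entry
    for entry in difflib.ndiff(old.splitlines(), new.splitlines())
    if entry.startswith(("+ ", "- "))
]

# Word granularity: only the changed word is reported.
word_diff = [
    entry
    for entry in difflib.ndiff(old.split(), new.split())
    if entry.startswith(("+ ", "- "))
]
```

Which granularity is appropriate depends on what you want to count as a "change" between two revisions.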

Cheers,
Felipe.

----- Original message -----