On Monday, 11. June 2012 at 23:19, Felipe Ortega wrote:
From: Platonides <platonides@gmail.com>
To: Gregor Martynus <gregor@martynus.net>
CC: Felipe Ortega <glimmer_phoenix@yahoo.es>; "xmldatadumps-l@lists.wikimedia.org" <xmldatadumps-l@lists.wikimedia.org>
Sent: Monday, 11 June 2012 22:38
Subject: Re: [Xmldatadumps-l] anonymous user account logs (account created / account blocked)

On 11/06/12 21:53, Gregor Martynus wrote:
> Thanks Felipe, I'll definitely give it a try next time. One thing that
> puzzles me: From your code it seems there would be <namespace> tags in the
> pages-logging.xml dump. Is this the case? I didn't see these myself.

Oops. That is probably left over from copy-pasting the parser skeleton for revision table dumps. The new version definitely won't have that. I'll fix that source file.

> I've updated the type/action tree with the input by Platonides, feel
> free to use / extend it:

Great, thanks.

> I was surprised that the pages-logging.xml dump does not contain events
> about user contributions. My friend is searching for
> - users with first time contributions in May
> - only manual sign ups
> - dates when accounts have been created
> and some more detailed things, but that would be the start.
>
> For example, there is the special page "User Contributions"
> (http://en.wikipedia.org/wiki/Special:Contributions). Can you point me
> to the dump(s) I need to get this data (namespace, page title, user,
> diff, comment, datetime)? The pages-logging.xml is already great to find
> out about created / blocked user accounts, what we are missing are the
> actual contributions.
>
> Does that make sense to you?
>
> --
> Gregor Martynus

Page edits appear in the article XML dumps.
Special:Contributions is just a query against the revision table.

The information you want is in pages-meta-history, but if you can use it (i.e. you don't need the actual page content), stub-meta-history is a much smaller file.

Indeed. For that purpose, you must join information from revision, page and logging tables.
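To make the revision/page part of that join concrete, here is a minimal Python sketch that streams a stub-meta-history style file and emits one row per revision (namespace, page title, user, timestamp, comment). It assumes the usual MediaWiki export layout (<page> with <title>, <ns> and nested <revision> elements); the function names and the tiny inline sample are hypothetical and only for illustration, and the logging data would still have to be matched on user name separately.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_contributions(source):
    """Yield one dict per <revision> in a MediaWiki XML export stream."""
    title, ns = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = int(elem.text)
        elif name == "revision":
            fields = {local(child.tag): child for child in elem}
            contributor = fields.get("contributor")
            user = None
            if contributor is not None:
                who = {local(c.tag): c.text for c in contributor}
                # Anonymous edits carry <ip> instead of <username>.
                user = who.get("username") or who.get("ip")
            comment = fields.get("comment")
            yield {
                "namespace": ns,
                "title": title,
                "user": user,
                "timestamp": fields["timestamp"].text,
                "comment": comment.text if comment is not None else None,
            }
            elem.clear()  # free memory; real dumps are very large
        elif name == "page":
            elem.clear()


def first_edits(rows):
    """Map each user to the timestamp of their earliest revision."""
    first = {}
    for row in rows:
        user, ts = row["user"], row["timestamp"]
        # ISO 8601 UTC timestamps sort correctly as plain strings.
        if user is not None and (user not in first or ts < first[user]):
            first[user] = ts
    return first


# Hypothetical two-revision sample standing in for a real dump file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page>
    <title>Talk:Example</title>
    <ns>1</ns>
    <id>10</id>
    <revision>
      <id>100</id>
      <timestamp>2012-05-03T10:00:00Z</timestamp>
      <contributor><username>Alice</username><id>7</id></contributor>
      <comment>first edit</comment>
    </revision>
    <revision>
      <id>101</id>
      <timestamp>2012-05-04T11:30:00Z</timestamp>
      <contributor><ip>192.0.2.1</ip></contributor>
    </revision>
  </page>
</mediawiki>"""

rows = list(iter_contributions(io.BytesIO(SAMPLE.encode("utf-8"))))
```

For a real dump you would pass an open (decompressed) file object instead of the in-memory sample, and filter the result of first_edits() for timestamps starting with "2012-05" to find first-time contributors in May.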
As Platonides has suggested, you have 2 options:
- Stub-meta-history: All meta information about revision and page tables, but no text.
- Pages-meta-history: Same as before plus complete text for every revision in each wiki page (all namespaces). Keep in mind that you have the whole text, no diffs, for every change. That's why these files can be huge once you decompress them (100 times larger, sometimes even more).

Please also be careful with the 'diff' tool, as sometimes it cannot track changes between different versions accurately (it depends on which granularity you demand).

Cheers,
Felipe.

----- Original Message -----