On 11/06/12 21:53, Gregor Martynus wrote:
Thanks Felipe, I'll definitely give it a try next time. One thing puzzles me: from your code it seems there would be <namespace> tags in the pages-logging.xml dump. Is this the case? I didn't see these myself.
I've updated the type/action tree with the input by Platonides, feel free to use / extend it: https://gist.github.com/2906718
I was surprised that the pages-logging.xml dump does not contain events about user contributions. My friend is searching for
- users with first time contributions in May
- only manual sign ups
- dates when accounts have been created
and some more detailed things, but that would be the start.
For example, there is the special page "User Contributions" (http://en.wikipedia.org/wiki/Special:Contributions). Can you point me to the dump(s) I need to get this data (namespace, page title, user, diff, comment, datetime)? The pages-logging.xml is already great for finding out about created / blocked user accounts; what we are missing are the actual contributions.
Does that make sense to you?
-- Gregor Martynus
Page edits appear in the article XML dumps. Special:Contributions is just a query against the revision table. The information you want is in pages-meta-history, but if you can use it (i.e. you don't need the actual page content), stub-meta-history is a much smaller file.
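
In case it helps, here is a rough sketch of how the revision metadata could be pulled out of a stub-meta-history file with Python's standard library. It assumes the usual export layout (a <page> with <title> and <ns>, followed by <revision> elements holding <timestamp>, <contributor> and <comment>); the function names iter_revisions and first_edits are made up for this example, and the stub files carry no page text, so diffs would still need pages-meta-history.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(source):
    """Yield one dict per <revision> from a MediaWiki XML export stream.

    Assumes <title> and <ns> come before the revisions of each page.
    """
    title = ns = None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "revision":
            fields = {local(child.tag): child for child in elem}
            user = None
            contributor = fields.get("contributor")
            if contributor is not None:
                # Registered users have <username>, anonymous edits have <ip>.
                for child in contributor:
                    if local(child.tag) in ("username", "ip"):
                        user = child.text
            timestamp = fields.get("timestamp")
            comment = fields.get("comment")
            yield {
                "namespace": ns,
                "title": title,
                "user": user,
                "timestamp": timestamp.text if timestamp is not None else None,
                "comment": comment.text if comment is not None else None,
            }
            elem.clear()  # drop the revision's children to keep memory low
        elif name == "page":
            elem.clear()


def first_edits(revisions):
    """Map each user to the timestamp of their earliest revision.

    The dump timestamps are ISO 8601 strings, so plain string comparison
    orders them chronologically.
    """
    first = {}
    for rev in revisions:
        user, ts = rev["user"], rev["timestamp"]
        if user is None or ts is None:
            continue
        if user not in first or ts < first[user]:
            first[user] = ts
    return first
```

Users whose first contribution falls in May 2012 would then be those whose entry in first_edits(iter_revisions(f)) starts with "2012-05", where f is the dump opened in binary mode (for a .bz2 file, bz2.open(path, "rb") should work). Account creation dates and manual vs. automatic sign-ups would still come from pages-logging.xml.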