Yeah, that makes sense, thanks Felipe!

-- 
Gregor Martynus

On Tuesday, 12. June 2012 at 23:50, Felipe Ortega wrote:


________________________________
From: Gregor Martynus <gregor@martynus.net>
To: Felipe Ortega <glimmer_phoenix@yahoo.es>
Sent: Tuesday, 12 June 2012 19:24
Subject: Re: [Xmldatadumps-l] anonymous user account logs (account created / account blocked)


Thanks Felipe, I've seen that too in the header of the pages-logging.xml dump, but I don't see a reference within the <logitem> entries. Or am I missing something?


For example, the stub-meta-history.xml dump has these: "<ns>0</ns>". There is no such thing in pages-logging.xml, is there? It's not a big problem though; I think you can get the namespace out of the logtitle if you have to.

Actually, there is a section in my code where I use the namespace name to create a similar numerical identifier for each log item. That way it reproduces the approach used in the revision table (although this field is not originally present in the logging table).

For this, you can use the text in the <logtitle> item. This is the title of the page affected by the log action. If there is a prefix such as:

Talk:Something
User:FooBar


You can match the prefix against the namespace names and insert the corresponding numeric code in the DB. If the title doesn't have any prefix, the page belongs to the main namespace (articles, so ns = 0).
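
The prefix matching described above could be sketched roughly as follows. This is only an illustration, not the code from my parser: the NAMESPACES table is a hand-written subset of the namespace list, and the function name namespace_id is made up for this example.

```python
# Hypothetical subset of the namespace table (name -> numeric id).
# In practice it would be built from the <siteinfo> header of the dump.
NAMESPACES = {
    "Talk": 1,
    "User": 2,
    "User talk": 3,
    "Wikipedia": 4,
}


def namespace_id(logtitle, namespaces=NAMESPACES):
    """Return the numeric namespace for a <logtitle> value (0 = main namespace)."""
    # Split on the first colon only; page titles may contain further colons.
    prefix, sep, _rest = logtitle.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix]
    # No colon at all, or the text before the colon is not a known
    # namespace name: treat the page as an article.
    return 0


for title in ("Talk:Something", "User:FooBar", "Something"):
    print(title, namespace_id(title))
```

Checking the prefix against the known names, instead of just looking for a colon, keeps article titles that happen to contain a colon in the main namespace.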

This extra info can be very useful when filtering log actions by the type of page they were applied to. You can see in my customized definition of the "logging" table that I have added some extra fields, including this <ns> id.


Cheers,

Felipe.



-- 
Gregor Martynus



On Tuesday, 12. June 2012 at 16:31, Felipe Ortega wrote:
________________________________
From: Gregor Martynus <gregor@martynus.net>
To: Felipe Ortega <glimmer_phoenix@yahoo.es>
Sent: Monday, 11 June 2012 21:53
Subject: Re: [Xmldatadumps-l] anonymous user account logs (account created / account blocked)




Thanks Felipe, I'll definitely give it a try next time. One thing that puzzles me: 
From your code it seems there would be <namespace> tags in the pages-logging.xml dump. Is this the case? I didn't see these myself.


Hi again, Gregor.


I've just checked with an excerpt from simplewiki-pages-logging.xml and it's not an error. Namespace info is also included, since it is part of the <siteinfo> item in XML dumps:


  <siteinfo>
    <sitename>Wikipedia</sitename>
    <generator>MediaWiki 1.20wmf4</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="-1" case="first-letter">Special</namespace>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Talk</namespace>
      <namespace key="2" case="first-letter">User</namespace>
      <namespace key="3" case="first-letter">User talk</namespace>
      <namespace key="4" case="first-letter">Wikipedia</namespace>
      <namespace key="5" case="first-letter">Wikipedia talk</namespace>
      <namespace key="6" case="first-letter">File</namespace>
      <namespace key="7" case="first-letter">File talk</namespace>
      <namespace key="8" case="first-letter">MediaWiki</namespace>
      <namespace key="9" case="first-letter">MediaWiki talk</namespace>
      <namespace key="10" case="first-letter">Template</namespace>
      <namespace key="11" case="first-letter">Template talk</namespace>
      <namespace key="12" case="first-letter">Help</namespace>
      <namespace key="13" case="first-letter">Help talk</namespace>
      <namespace key="14" case="first-letter">Category</namespace>
      <namespace key="15" case="first-letter">Category talk</namespace>
    </namespaces>
  </siteinfo>
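
As a rough sketch, a header like this can be turned into a name-to-id lookup table with Python's standard xml.etree.ElementTree. The SITEINFO string below is a shortened copy of the excerpt, and read_namespaces is a name made up for this example. It also assumes the elements are unqualified, as in the excerpt; a full dump typically declares an XML namespace on its root element, so the tag names would then need to be qualified.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <siteinfo> excerpt, for illustration only.
SITEINFO = """<siteinfo>
  <sitename>Wikipedia</sitename>
  <namespaces>
    <namespace key="-2" case="first-letter">Media</namespace>
    <namespace key="0" case="first-letter" />
    <namespace key="1" case="first-letter">Talk</namespace>
    <namespace key="2" case="first-letter">User</namespace>
  </namespaces>
</siteinfo>"""


def read_namespaces(siteinfo_xml):
    """Map each namespace name to its numeric key ('' is the main namespace)."""
    root = ET.fromstring(siteinfo_xml)
    table = {}
    for node in root.iter("namespace"):
        # The main namespace element is empty, so its text is None.
        table[node.text or ""] = int(node.get("key"))
    return table


namespaces = read_namespaces(SITEINFO)
print(namespaces)
```

The resulting dictionary is the kind of table you would match <logtitle> prefixes against.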


Best,
Felipe.