[ pywikipediabot-Patches-2800449 ] read more tags from xmldump - Pywikipedia-bugs

3 Jun 2009

Patches item #2800449, was opened at 2009-06-03 13:28
Message generated for change (Comment added) made by hroest
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=280044…

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Hannes Röst (hroest)
Assigned to: Nobody/Anonymous (nobody)
Summary: read more tags from xmldump

Initial Comment:
I changed the classes XmlDump and XmlEntry so that they now also carry information about
namespaces and minor edits.
The new function _parseSiteinfo parses the header of the XML dump. I added the new field
siteinfo to XmlDump, as well as the fields sitename, base, generator, case, and
namespaces.
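
A minimal standalone sketch of the header parsing described above, assuming Python 3 and the standard-library xml.etree.ElementTree. The sample header, the function name parse_siteinfo, and the returned dict layout are illustrative assumptions, not the code in the patch. Unlike the patch, the sketch reads the element on the "end" event, so that all child elements are certain to be available.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily shortened dump header; real dumps carry more fields.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <base>http://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.15alpha</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2">Media</namespace>
      <namespace key="0" />
      <namespace key="1">Talk</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>"""


def parse_siteinfo(stream):
    """Return the header fields of a dump as a plain dict (or None)."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Tags look like "{namespace-uri}siteinfo"; split off the URI part.
        if elem.tag.endswith("}siteinfo"):
            uri = elem.tag[1:].split("}", 1)[0]
            info = {
                name: elem.findtext("{%s}%s" % (uri, name))
                for name in ("sitename", "base", "generator", "case")
            }
            # Map namespace name -> numeric key. The main namespace has no
            # text, so it ends up under the key None.
            info["namespaces"] = {
                ns.text: int(ns.attrib["key"])
                for ns in elem.iter("{%s}namespace" % uri)
            }
            return info
    return None


siteinfo = parse_siteinfo(io.BytesIO(SAMPLE.encode("utf-8")))
print(siteinfo["sitename"], siteinfo["namespaces"])
```

Storing the numeric keys as int (the patch keeps them as strings) is a choice made only for this sketch.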

----------------------------------------------------------------------

Comment By: Hannes Röst (hroest)
Date: 2009-06-03 14:19

Message:
OK, I used diff -u. Is this better?
Sorry, I haven't done this before. Greetings, hroest

----------------------------------------------------------------------

Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-06-03 14:06

Message:
Please use "diff -u" or "svn diff" to create patches. Unified diffs are
more readable and thus easier to review.

----------------------------------------------------------------------

Comment By: Hannes Röst (hroest)
Date: 2009-06-03 13:36

Message:
diff:
59c59
<     def __init__(self, title, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid, comment):
---
>     def __init__(self, title, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid, comment, minor, namespace):
70a71,72
>         self.minor = minor
>         self.namespace = namespace
284a287
>         self.siteinfo = None
292a296,297
>             if event == 'start' and elem.tag == "{%s}siteinfo" % self.uri:
>                 self._parseSiteinfo(elem)
295a301,311
>     def _parseSiteinfo(self, elem):
>         self.sitename = elem.findtext( "{%s}sitename" % self.uri )
>         self.base = elem.findtext( "{%s}base" % self.uri )
>         self.generator = elem.findtext( "{%s}generator" % self.uri )
>         self.case = elem.findtext( "{%s}case" % self.uri )
>         self.namespaces = {}
>         for infoElement in elem:
>             if infoElement.tag == "{%s}namespaces" % self.uri:
>                 for name in infoElement:
>                     self.namespaces[name.text] = name.attrib['key']
>
327c343,344
<         # could get comment, minor as well
---
>         if revision.findtext("{%s}minor" % self.uri) == '': minor = True
>         else: minor = False
330a348,359
>
>         #here we get the namespace which is in a format like this "ns:title"
>         #note that we can find namespace zero in the dictionary under "None"
>         match = re.search('([^:]*):\w*', self.title)
>         try:
>             if match: nameSp = self.namespaces[match.group(1)]
>             else: nameSp = self.namespaces[match]
>         except KeyError:
>             #this means we dont have this one stored as a namespace or its an
>             #article like "2001: A Space Odyssey (film)"
>             #we assume that the namespace is zero
>             nameSp = 0
337c366,368
<                        comment=comment
---
>                        comment=comment,
>                        minor = minor,
>                        namespace = nameSp
407d437
<
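
The namespace lookup in the last hunks can be illustrated in isolation. The following is only a sketch under assumptions: the helper name namespace_of and the sample mapping are hypothetical and not part of the patch, and it uses str.partition instead of the regular expression from the diff.

```python
def namespace_of(title, namespaces):
    """Return the numeric namespace for a page title.

    `namespaces` maps a prefix to its number, e.g. {"Talk": 1}.
    Hypothetical helper for illustration only.
    """
    prefix, sep, _rest = title.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix]
    # No colon, or an unknown prefix such as "2001" in
    # "2001: A Space Odyssey (film)": assume the main namespace.
    return 0


namespaces = {"Talk": 1, "User": 2}
for title in ("Talk:Foo", "Foo", "2001: A Space Odyssey (film)"):
    print(title, "->", namespace_of(title, namespaces))
```

As in the patch, a title whose text before the colon is not a known namespace falls back to namespace zero.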

----------------------------------------------------------------------
