[ pywikipediabot-Patches-2800449 ] read more tags from xmldump - Pywikipedia-bugs

8 Oct 2009


Patches item #2800449, was opened at 2009-06-03 13:28
Message generated for change (Settings changed) made by xqt
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2800449...
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
...
Status: Pending
Resolution: None
Priority: 5
Private: No
Submitted By: Hannes Röst (hroest)
Assigned to: Nobody/Anonymous (nobody)
Summary: read more tags from xmldump
Initial Comment:
I changed the classes XmlDump and XmlEntry so that they now also carry information about namespaces and minor edits.
I parse the XML dump header with the new function _parseSiteinfo, and I added a new field siteinfo to XmlDump as well as the fields sitename, base, generator, case and namespaces.
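For readers unfamiliar with the dump header, here is a minimal standalone sketch of the kind of information `_parseSiteinfo` extracts, using only the standard-library `xml.etree.ElementTree`. The sample XML is a made-up, abbreviated header (the export schema version 0.3 URI is an assumption), and `parse_siteinfo` is a hypothetical helper, not part of pywikipedia. One deliberate difference from the submitted diff: the sketch reads `<siteinfo>` on its `end` event, because the ElementTree documentation warns that on a `start` event from `iterparse()` an element's children may not have been parsed yet. It also stores namespace keys as integers rather than strings.

```python
import io
import xml.etree.ElementTree as ET

# Made-up, abbreviated dump header (illustration only, not a real dump).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" version="0.3">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <base>http://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.15alpha</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-1">Special</namespace>
      <namespace key="0" />
      <namespace key="1">Talk</namespace>
      <namespace key="14">Category</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>
"""


def parse_siteinfo(source):
    """Return (uri, info) for the first <siteinfo> element in a dump.

    Assumes the root element is namespaced, i.e. its tag looks like
    "{http://...}mediawiki".
    """
    uri = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and uri is None:
            # The first start event is the root; keep the part inside {...}.
            uri = elem.tag[1:].split("}")[0]
        elif event == "end" and elem.tag == "{%s}siteinfo" % uri:
            # On "end", all children of <siteinfo> have been parsed.
            info = {
                "sitename": elem.findtext("{%s}sitename" % uri),
                "base": elem.findtext("{%s}base" % uri),
                "generator": elem.findtext("{%s}generator" % uri),
                "case": elem.findtext("{%s}case" % uri),
                "namespaces": {},
            }
            for ns in elem.iter("{%s}namespace" % uri):
                # The main namespace element is empty, so its name is None.
                info["namespaces"][ns.text] = int(ns.attrib["key"])
            return uri, info
    return uri, None


uri, info = parse_siteinfo(io.BytesIO(SAMPLE.encode("utf-8")))
print(info["sitename"], info["namespaces"])
```

As in the patch, the main namespace ends up under the key `None`, since its `<namespace>` element has no text.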
----------------------------------------------------------------------
Comment By: siebrand (siebrand)
Date: 2009-09-25 02:06
Message:
Cannot process diff.
----------------------------------------------------------------------
Comment By: Hannes Röst (hroest)
Date: 2009-06-03 14:19
Message:
OK, I used diff -u. Is this better?
Sorry, I haven't done this before. Greetings, hroest
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-06-03 14:06
Message:
Please use "diff -u" or "svn diff" to create patches. Unified diffs are
more readable and thus easier to review.
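The unified format the reviewer asks for can be produced with Python's standard `difflib` module. The one-line before/after pair below is a toy example modeled on the first hunk of the patch, and the file names are placeholders. On the command line, `diff -u before.py after.py` gives the same kind of output (plus file timestamps).

```python
import difflib

# Toy "before" and "after" versions of a single line (placeholders only).
old = ["def __init__(self, title, comment):\n"]
new = ["def __init__(self, title, comment, minor, namespace):\n"]

# unified_diff yields "---"/"+++" file headers, an "@@ -start +start @@"
# hunk header, then lines prefixed with "-" (removed), "+" (added) or " "
# (unchanged context).
lines = list(difflib.unified_diff(old, new, fromfile="before.py",
                                  tofile="after.py"))
print("".join(lines), end="")
```

This prints a `---`/`+++` header pair, an `@@ -1 +1 @@` hunk header, and the removed and added lines prefixed with `-` and `+`. Each hunk names its file and carries its own context lines, which makes a patch easier to read than the bare `59c59` style of a normal diff.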
----------------------------------------------------------------------
Comment By: Hannes Röst (hroest)
Date: 2009-06-03 13:36
Message:
diff:
59c59
<     def __init__(self, title, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid, comment):
---
>     def __init__(self, title, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid, comment, minor, namespace):
70a71,72
>         self.minor = minor
>         self.namespace = namespace
284a287
>         self.siteinfo = None
292a296,297
>             if event == 'start' and elem.tag == "{%s}siteinfo" % self.uri:
>                 self._parseSiteinfo(elem)
295a301,311
>     def _parseSiteinfo(self, elem):
>         self.sitename = elem.findtext( "{%s}sitename" % self.uri )
>         self.base = elem.findtext( "{%s}base" % self.uri )
>         self.generator = elem.findtext( "{%s}generator" % self.uri )
>         self.case = elem.findtext( "{%s}case" % self.uri )
>         self.namespaces = {}
>         for infoElement in elem:
>             if infoElement.tag == "{%s}namespaces" % self.uri:
>                 for name in infoElement:
>                     self.namespaces[name.text] = name.attrib['key']
>
327c343,344
<         # could get comment, minor as well
---
>         if revision.findtext("{%s}minor" % self.uri) == '': minor = True
>         else: minor = False
330a348,359
>         #here we get the namespace which is in a format like this "ns:title"
>         #note that we can find namespace zero in the dictionary under "None"
>         match = re.search('([^:]*):\w*', self.title)
>         try:
>             if match: nameSp = self.namespaces[match.group(1)]
>             else: nameSp = self.namespaces[match]
>         except KeyError:
>             #this means we dont have this one stored as a namespace or its an
>             #article like "2001: A Space Odyssey (film)"
>             #we assume that the namespace is zero
>             nameSp = 0
>
337c366,368
<                        comment=comment
---
>                        comment=comment,
>                        minor = minor,
>                        namespace = nameSp
407d437
<
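The two per-revision fields added in this diff (the minor-edit flag and the namespace) can be illustrated outside pywikipedia with a small sketch. `detect_minor` and `resolve_namespace` are hypothetical helper names, the export schema URI is an assumption, and the namespace map is assumed to map a prefix such as "Talk" to an integer key. Unlike the patch, which stores the `key` attribute as a string but falls back to the integer `0`, this sketch always returns an `int`, and it splits the title on its first colon with `str.partition` instead of a regular expression.

```python
import xml.etree.ElementTree as ET

# Assumed export schema URI; a real dump declares its own on the root tag.
URI = "http://www.mediawiki.org/xml/export-0.3/"


def detect_minor(revision, uri=URI):
    """True if the <revision> element contains a <minor/> child."""
    # findtext() returns '' for a matching element with no text and
    # None (the default) when no matching element exists.
    return revision.findtext("{%s}minor" % uri) is not None


def resolve_namespace(title, namespaces):
    """Map a page title to an integer namespace key.

    `namespaces` maps a namespace prefix (e.g. "Talk") to its integer key.
    """
    # Split on the first colon only: "Talk:Foo" -> ("Talk", ":", "Foo").
    prefix, sep, _rest = title.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix]
    # No colon at all, or a prefix that is not a namespace, as in
    # "2001: A Space Odyssey (film)": treat it as the main namespace (0).
    return 0


ns = {"Talk": 1, "Category": 14}
rev = ET.fromstring('<revision xmlns="%s"><id>1</id><minor/></revision>' % URI)
print(resolve_namespace("Talk:Foo", ns), detect_minor(rev))
```

Testing for `is not None` rather than `== ''` treats any `<minor>` element as a minor edit, whatever its text.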
----------------------------------------------------------------------