Patches item #2800449, was opened at 2009-06-03 13:28
Message generated for change (Settings changed) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=2800449...

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Pending
Resolution: None
Priority: 5
Private: No
Submitted By: Hannes Röst (hroest)
Assigned to: Nobody/Anonymous (nobody)
Summary: read more tags from xmldump
Initial Comment:
I changed the classes XmlDump and XmlEntry so that they now also carry information about namespaces and minor edits. The header of the XML dump is parsed by the new function _parseSiteinfo, and XmlDump gains the new field siteinfo as well as the fields sitename, base, generator, case and namespaces.
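
For readers who want to try the idea outside the bot framework, here is a minimal standalone sketch of reading the siteinfo header with ElementTree. It is not the patch itself: the function name parse_siteinfo, the helper split_tag and the sample header are made up for illustration, and the namespaces dict maps names to integer keys (with "" for the main namespace) rather than to the raw attribute strings. The sketch waits for the "end" event of siteinfo, since iterparse does not guarantee that an element's children are available when the "start" event fires.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down dump header; real dumps carry more namespaces.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <base>http://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.15alpha</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2">Media</namespace>
      <namespace key="0" />
      <namespace key="1">Talk</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>"""


def split_tag(tag):
    """Split '{uri}local' into (uri, local); plain tags get an empty uri."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag


def parse_siteinfo(source):
    """Return the siteinfo header as a dict, or None if the dump has none."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        uri, local = split_tag(elem.tag)
        if local != "siteinfo":
            continue
        info = {
            name: elem.findtext("{%s}%s" % (uri, name))
            for name in ("sitename", "base", "generator", "case")
        }
        # Map namespace name -> integer key; the main namespace has no text.
        info["namespaces"] = {
            (ns.text or ""): int(ns.get("key"))
            for ns in elem.iter("{%s}namespace" % uri)
        }
        return info
    return None


info = parse_siteinfo(io.BytesIO(SAMPLE))
```

Returning as soon as siteinfo is complete keeps the sketch from reading the rest of the dump, which matters for multi-gigabyte files.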
----------------------------------------------------------------------
Comment By: siebrand (siebrand)
Date: 2009-09-25 02:06

Message:
Cannot process diff.
----------------------------------------------------------------------
Comment By: Hannes Röst (hroest)
Date: 2009-06-03 14:19

Message:
OK, I used diff -u. Is this better? Sorry, I haven't done this before.
Greetings, hroest
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-06-03 14:06

Message:
Please use "diff -u" or "svn diff" to create patches. Unified diffs are more readable and thus easier to review.
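
To illustrate what the unified format looks like, the following small sketch builds one with Python's difflib instead of the diff command. The file labels and the two code fragments are hypothetical stand-ins, not the actual files from this patch.

```python
import difflib

# Hypothetical before/after contents, one list entry per line.
old = [
    "def __init__(self, title, comment):\n",
    "    self.title = title\n",
    "    self.comment = comment\n",
]
new = [
    "def __init__(self, title, comment, minor):\n",
    "    self.title = title\n",
    "    self.comment = comment\n",
    "    self.minor = minor\n",
]

# unified_diff yields header lines, hunk markers (@@ ... @@) and the
# changed lines prefixed with '-' or '+', plus unchanged context lines.
patch = "".join(
    difflib.unified_diff(old, new, fromfile="old/example.py", tofile="new/example.py")
)
print(patch)
```

Unlike the normal format, each hunk carries a few lines of unchanged context, which is what lets a reviewer (or the patch tool) locate the change even when line numbers have shifted.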
----------------------------------------------------------------------
Comment By: Hannes Röst (hroest)
Date: 2009-06-03 13:36

Message:
diff:
59c59
<     def __init__(self, title, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid, comment):
---
>     def __init__(self, title, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid, comment, minor, namespace):
70a71,72
>         self.minor = minor
>         self.namespace = namespace
284a287
>         self.siteinfo = None
292a296,297
>             if event == 'start' and elem.tag == "{%s}siteinfo" % self.uri:
>                 self._parseSiteinfo(elem)
295a301,311
>     def _parseSiteinfo(self, elem):
>         self.sitename = elem.findtext( "{%s}sitename" % self.uri )
>         self.base = elem.findtext( "{%s}base" % self.uri )
>         self.generator = elem.findtext( "{%s}generator" % self.uri )
>         self.case = elem.findtext( "{%s}case" % self.uri )
>         self.namespaces = {}
>         for infoElement in elem:
>             if infoElement.tag == "{%s}namespaces" % self.uri:
>                 for name in infoElement:
>                     self.namespaces[name.text] = name.attrib['key']
327c343,344
<         # could get comment, minor as well
---
>         if revision.findtext("{%s}minor" % self.uri) == '': minor = True
>         else: minor = False
330a348,359
>         #here we get the namespace which is in a format like this "ns:title"
>         #note that we can find namespace zero in the dictionary under "None"
>         match = re.search('([^:]*):\w*', self.title)
>         try:
>             if match:
>                 nameSp = self.namespaces[match.group(1)]
>             else:
>                 nameSp = self.namespaces[match]
>         except KeyError:
>             #this means we dont have this one stored as a namespace or its an
>             #article like "2001: A Space Odyssey (film)"
>             #we assume that the namespace is zero
>             nameSp = 0
337c366,368
<                          comment=comment
---
>                          comment=comment,
>                          minor = minor,
>                          namespace = nameSp
407d437
<
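
The two per-revision lookups in the diff above (the minor flag and the namespace of a title) can be sketched in isolation as follows. This is an illustrative variant under stated assumptions, not the submitted code: the helper names is_minor and namespace_of are hypothetical, the namespaces dict is assumed to map names to integer keys, and namespace aliases and case-insensitive matching are ignored.

```python
import xml.etree.ElementTree as ET

URI = "http://www.mediawiki.org/xml/export-0.3/"


def is_minor(revision, uri=URI):
    """A revision is minor when it contains a <minor/> child element."""
    return revision.find("{%s}minor" % uri) is not None


def namespace_of(title, namespaces):
    """Return the namespace key for a title, falling back to 0.

    Titles without a colon, or with an unknown prefix such as
    "2001: A Space Odyssey (film)", are treated as main-namespace pages.
    """
    prefix, sep, _rest = title.partition(":")
    if not sep:
        return 0
    return namespaces.get(prefix, 0)


# Hypothetical sample data for a quick check.
namespaces = {"": 0, "Talk": 1, "User": 2}
minor_rev = ET.fromstring('<revision xmlns="%s"><minor /></revision>' % URI)
major_rev = ET.fromstring('<revision xmlns="%s"><comment>x</comment></revision>' % URI)
```

Using dict.get with a default keeps the fallback and the regular result the same type (an integer), which avoids mixing the string keys from the XML attributes with the literal 0.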
----------------------------------------------------------------------
pywikipedia-bugs@lists.wikimedia.org