Patches item #2800449, was opened at 2009-06-03 13:28
Message generated for change (Comment added) made by hroest
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603140&aid=280044…
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Hannes Röst (hroest)
Assigned to: Nobody/Anonymous (nobody)
Summary: read more tags from xmldump
Initial Comment:
I changed the classes XmlDump and XmlEntry so that they now also carry information about
namespaces and minor edits.
The new function _parseSiteinfo parses the header of the XML dump. XmlDump gains the new
field siteinfo as well as the fields sitename, base, generator, case and namespaces.
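A minimal standalone sketch of the header parsing described above, assuming a trimmed-down dump header. The sample XML, the namespace URI, and the helper name parse_siteinfo are illustrative assumptions, not part of the patch. Unlike the patch, the sketch converts the namespace keys to integers.

```python
import xml.etree.ElementTree as ET

# Assumed export namespace URI and a hypothetical, shortened dump header.
URI = "http://www.mediawiki.org/xml/export-0.3/"
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <base>http://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 1.15alpha</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2">Media</namespace>
      <namespace key="0" />
      <namespace key="1">Talk</namespace>
    </namespaces>
  </siteinfo>
</mediawiki>"""


def parse_siteinfo(elem, uri):
    """Collect the fields of a <siteinfo> element into a dict (hypothetical helper)."""
    info = {}
    for field in ("sitename", "base", "generator", "case"):
        info[field] = elem.findtext("{%s}%s" % (uri, field))
    # Map namespace name -> numeric key. The main namespace element has no
    # text, so it ends up under the key None.
    info["namespaces"] = {}
    for ns in elem.findall("{%s}namespaces/{%s}namespace" % (uri, uri)):
        info["namespaces"][ns.text] = int(ns.attrib["key"])
    return info


root = ET.fromstring(SAMPLE)
siteinfo = parse_siteinfo(root.find("{%s}siteinfo" % URI), URI)
print(siteinfo["sitename"], siteinfo["namespaces"])
```

Returning one dict keeps all header values together, which is one possible use for the siteinfo field.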
----------------------------------------------------------------------
Comment By: Hannes Röst (hroest)
Date: 2009-06-03 14:19
Message:
OK, I used diff -u; is this better?
Sorry, I haven't done this before. Greetings, hroest
----------------------------------------------------------------------
Comment By: Francesco Cosoleto (cosoleto)
Date: 2009-06-03 14:06
Message:
Please use "diff -u" or "svn diff" to create patches. Unified diffs are
more readable and thus easier to review.
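To illustrate what a unified diff looks like, here is a small sketch using Python's standard difflib module rather than the diff command itself. The file names old.py and new.py and the two code snippets are made up for the example.

```python
import difflib

# Hypothetical before/after versions of a small function.
old = [
    "def __init__(self, title, text):\n",
    "    self.title = title\n",
    "    self.text = text\n",
]
new = [
    "def __init__(self, title, text, minor):\n",
    "    self.title = title\n",
    "    self.text = text\n",
    "    self.minor = minor\n",
]

# unified_diff yields header lines, a hunk marker, and then the changed
# lines prefixed with "-" or "+" next to unchanged context lines.
unified = list(difflib.unified_diff(old, new, fromfile="old.py", tofile="new.py"))
print("".join(unified))
```

The unchanged context lines around each change are what make this format easier to review than the bare "<" and ">" lines of a normal diff.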
----------------------------------------------------------------------
Comment By: Hannes Röst (hroest)
Date: 2009-06-03 13:36
Message:
diff:
59c59
<     def __init__(self, title, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid, comment):
---
>     def __init__(self, title, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid, comment, minor, namespace):
70a71,72
>         self.minor = minor
>         self.namespace = namespace
284a287
>         self.siteinfo = None
292a296,297
>             if event == 'start' and elem.tag == "{%s}siteinfo" % self.uri:
>                 self._parseSiteinfo(elem)
295a301,311
>     def _parseSiteinfo(self, elem):
>         self.sitename = elem.findtext( "{%s}sitename" % self.uri )
>         self.base = elem.findtext( "{%s}base" % self.uri )
>         self.generator = elem.findtext( "{%s}generator" % self.uri )
>         self.case = elem.findtext( "{%s}case" % self.uri )
>         self.namespaces = {}
>         for infoElement in elem:
>             if infoElement.tag == "{%s}namespaces" % self.uri:
>                 for name in infoElement:
>                     self.namespaces[name.text] = name.attrib['key']
327c343,344
<         # could get comment, minor as well
---
>         if revision.findtext("{%s}minor" % self.uri) == '': minor = True
>         else: minor = False
330a348,359
>         #here we get the namespace which is in a format like this "ns:title"
>         #note that we can find namespace zero in the dictionary under "None"
>         match = re.search('([^:]*):\w*', self.title)
>         try:
>             if match: nameSp = self.namespaces[match.group(1)]
>             else: nameSp = self.namespaces[match]
>         except KeyError:
>             #this means we dont have this one stored as a namespace or its an
>             #article like "2001: A Space Odyssey (film)"
>             #we assume that the namespace is zero
>             nameSp = 0
337c366,368
<                         comment=comment
---
>                         comment=comment,
>                         minor = minor,
>                         namespace = nameSp
407d437
<
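The namespace lookup and the minor-edit check from the diff above can be sketched in isolation as follows. This is only an illustration under stated assumptions: the NAMESPACES mapping, the helper names namespace_of and is_minor, and the sample revision elements are hypothetical. The sketch uses str.partition instead of the regular expression from the patch, and tests for the presence of the minor element rather than comparing its text to ''.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping as it might be read from <siteinfo>; the main
# namespace has no name and is stored under None.
NAMESPACES = {None: 0, "Talk": 1, "User": 2, "Template": 10}


def namespace_of(title, namespaces):
    """Return the namespace number for a page title, defaulting to 0."""
    prefix, sep, _rest = title.partition(":")
    if not sep:
        # No colon at all: the page lives in the main namespace.
        return namespaces.get(None, 0)
    # Unknown prefixes such as "2001" in "2001: A Space Odyssey (film)"
    # are treated as ordinary article titles in namespace 0.
    return namespaces.get(prefix, 0)


def is_minor(revision, uri):
    """True if the revision element contains a <minor /> child."""
    # findtext returns '' for an empty element and None if it is missing.
    return revision.findtext("{%s}minor" % uri) is not None


URI = "http://www.mediawiki.org/xml/export-0.3/"  # assumed namespace URI
minor_rev = ET.fromstring('<revision xmlns="%s"><minor /></revision>' % URI)
major_rev = ET.fromstring('<revision xmlns="%s"><id>1</id></revision>' % URI)

print(namespace_of("Talk:Main Page", NAMESPACES))
print(namespace_of("2001: A Space Odyssey (film)", NAMESPACES))
print(is_minor(minor_rev, URI), is_minor(major_rev, URI))
```

Using dict.get with a default avoids the try/except around KeyError while keeping the same fallback to namespace zero.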
----------------------------------------------------------------------