Revision: 6741
Author:   nicdumz
Date:     2009-04-27 16:33:38 +0000 (Mon, 27 Apr 2009)

Log Message:
-----------
Another nice patch by Johan Euphrosine:
* Adding support for comment parsing from XML
* And a set of unit tests to test xmlreader

Modified Paths:
--------------
    trunk/pywikipedia/xmlreader.py

Added Paths:
-----------
    trunk/pywikipedia/tests/
    trunk/pywikipedia/tests/article-pear.xml
    trunk/pywikipedia/tests/test-xmlreader.py
Added: trunk/pywikipedia/tests/article-pear.xml
===================================================================
--- trunk/pywikipedia/tests/article-pear.xml (rev 0)
+++ trunk/pywikipedia/tests/article-pear.xml 2009-04-27 16:33:38 UTC (rev 6741)
@@ -0,0 +1,109 @@
+<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="en">
+  <siteinfo>
+    <sitename>Wikipedia</sitename>
+    <base>http://en.wikipedia.org/wiki/Main_Page</base>
+    <generator>MediaWiki 1.15alpha</generator>
+    <case>first-letter</case>
+    <namespaces>
+      <namespace key="-2">Media</namespace>
+      <namespace key="-1">Special</namespace>
+      <namespace key="0" />
+      <namespace key="1">Talk</namespace>
+      <namespace key="2">User</namespace>
+      <namespace key="3">User talk</namespace>
+      <namespace key="4">Wikipedia</namespace>
+      <namespace key="5">Wikipedia talk</namespace>
+      <namespace key="6">File</namespace>
+      <namespace key="7">File talk</namespace>
+      <namespace key="8">MediaWiki</namespace>
+      <namespace key="9">MediaWiki talk</namespace>
+      <namespace key="10">Template</namespace>
+      <namespace key="11">Template talk</namespace>
+      <namespace key="12">Help</namespace>
+      <namespace key="13">Help talk</namespace>
+      <namespace key="14">Category</namespace>
+      <namespace key="15">Category talk</namespace>
+      <namespace key="100">Portal</namespace>
+      <namespace key="101">Portal talk</namespace>
+    </namespaces>
+  </siteinfo>
+  <page>
+    <title>Pear</title>
+    <id>24278</id>
+    <revision>
+      <id>185185</id>
+      <timestamp>2002-02-25T15:43:11Z</timestamp>
+      <contributor>
+        <ip>Conversion script</ip>
+      </contributor>
+      <minor/>
+      <comment>Automated conversion</comment>
+      <text xml:space="preserve">Pears are [[tree]]s of the [[genus]] <em>Pyrus</em> and the edible [[fruit]] of that tree.
+The pear is an important fruit in temperate regions. Like the [[apple]], the pear fruit is a [[pome]]. There are thousands of domesticated pear varieties.
+
+There are many species of pears. The most important for fruit production are <em>Pyrus communis</em> (European pear or simply pear) and <em>Pyrus pyrifolia</em> (Asian pear or apple pear). Other species are used as rootstocks for European and Asian pears and as ornamental trees.
+
+Unlike most fruits, European pears do not ripen on the tree. They must be picked and, sometimes, subjected to cold, before they will become sweet and soft. They store well in their mature but unripe state if kept cold. Asian pears are sweet on the tree and are eaten crisp.
+
+Pears are consumed fresh, canned, and as juice. Fermented pear juice is called [[perry]].
+</text>
+    </revision>
+    <revision>
+      <id>185241</id>
+      <timestamp>2002-08-31T02:16:06Z</timestamp>
+      <contributor>
+        <username>Quercusrobur</username>
+        <id>3741</id>
+      </contributor>
+      <text xml:space="preserve">Pears are [[tree]]s of the [[genus]] <em>Pyrus</em> and the edible [[fruit]] of that tree.
+The pear is an important fruit in temperate regions. Like the [[apple]], the pear fruit is a [[pome]]. There are thousands of domesticated pear varieties.
+
+There are many species of pears. The most important for fruit production are <em>Pyrus communis</em> (European pear or simply pear) and <em>Pyrus pyrifolia</em> (Asian pear or apple pear). Other species are used as rootstocks for European and Asian pears and as ornamental trees.
+
+Unlike most fruits, European pears do not ripen on the tree. They must be picked and, sometimes, subjected to cold, before they will become sweet and soft. They store well in their mature but unripe state if kept cold. Asian pears are sweet on the tree and are eaten crisp.
+
+Pears are consumed fresh, canned, and as juice. Fermented pear juice is called [[perry]].
+
+[[propagating apples and other fruit trees]]</text>
+    </revision>
+    <revision>
+      <id>185408</id>
+      <timestamp>2002-08-31T03:27:15Z</timestamp>
+      <contributor>
+        <username>Mav</username>
+        <id>62</id>
+      </contributor>
+      <minor/>
+      <text xml:space="preserve">Pears are [[tree]]s of the [[genus]] <em>Pyrus</em> and the edible [[fruit]] of that tree.
+The pear is an important fruit in temperate regions. Like the [[apple]], the pear fruit is a [[pome]]. There are thousands of domesticated pear varieties.
+
+There are many species of pears. The most important for fruit production are <em>Pyrus communis</em> (European pear or simply pear) and <em>Pyrus pyrifolia</em> (Asian pear or apple pear). Other species are used as rootstocks for European and Asian pears and as ornamental trees.
+
+Unlike most fruits, European pears do not ripen on the tree. They must be picked and, sometimes, subjected to cold, before they will become sweet and soft. They store well in their mature but unripe state if kept cold. Asian pears are sweet on the tree and are eaten crisp.
+
+Pears are consumed fresh, canned, and as juice. Fermented pear juice is called [[perry]].
+
+[[Fruit tree propogation]]</text>
+    </revision>
+    <revision>
+      <id>188924</id>
+      <timestamp>2002-08-31T05:53:10Z</timestamp>
+      <contributor>
+        <username>PierreAbbat</username>
+        <id>1123</id>
+      </contributor>
+      <minor/>
+      <comment>sp</comment>
+      <text xml:space="preserve">Pears are [[tree]]s of the [[genus]] <em>Pyrus</em> and the edible [[fruit]] of that tree.
+The pear is an important fruit in temperate regions. Like the [[apple]], the pear fruit is a [[pome]]. There are thousands of domesticated pear varieties.
+
+There are many species of pears. The most important for fruit production are <em>Pyrus communis</em> (European pear or simply pear) and <em>Pyrus pyrifolia</em> (Asian pear or apple pear). Other species are used as rootstocks for European and Asian pears and as ornamental trees.
+
+Unlike most fruits, European pears do not ripen on the tree. They must be picked and, sometimes, subjected to cold, before they will become sweet and soft. They store well in their mature but unripe state if kept cold. Asian pears are sweet on the tree and are eaten crisp.
+
+Pears are consumed fresh, canned, and as juice. Fermented pear juice is called [[perry]].
+
+[[Fruit tree propagation]]</text>
+    </revision>
+  </page>
+</mediawiki>
Added: trunk/pywikipedia/tests/test-xmlreader.py
===================================================================
--- trunk/pywikipedia/tests/test-xmlreader.py (rev 0)
+++ trunk/pywikipedia/tests/test-xmlreader.py 2009-04-27 16:33:38 UTC (rev 6741)
@@ -0,0 +1,26 @@
+import unittest
+import xml.sax
+
+import sys
+# get the xmlreader module one level under
+sys.path.append('..')
+
+import xmlreader
+
+class XmlReaderTestCase(unittest.TestCase):
+    def test_XmlDump(self):
+        pages = [r for r in xmlreader.XmlDump("article-pear.xml", allrevisions=True).parse()]
+        self.assertEquals(4, len(pages))
+        self.assertNotEquals("", pages[0].comment)
+    def test_MediaWikiXmlHandler(self):
+        handler = xmlreader.MediaWikiXmlHandler()
+        pages = []
+        def pageDone(page):
+            pages.append(page)
+        handler.setCallback(pageDone)
+        xml.sax.parse("article-pear.xml", handler)
+        self.assertEquals(4, len(pages))
+        self.assertNotEquals("", pages[0].comment)
+
+if __name__ == '__main__':
+    unittest.main()
Property changes on: trunk/pywikipedia/tests/test-xmlreader.py
___________________________________________________________________
Added: svn:eol-style
   + native
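The new tests check that all four revisions of the Pear fixture come back and that the first one carries a non-empty comment. The same kind of check can be sketched without the fixture file or the xmlreader module, using only the standard library. This is an illustration, not part of the patch: the names FIXTURE, revision_comments and CommentFixtureTestCase are hypothetical, the inline XML is a cut-down stand-in for article-pear.xml, and the code targets Python 3 rather than the Python 2 used in the commit.

```python
import unittest
import xml.etree.ElementTree as ET

# Namespace of the export format used by the fixture in this commit.
URI = "http://www.mediawiki.org/xml/export-0.3/"

# Hypothetical, cut-down stand-in for article-pear.xml.
FIXTURE = """<mediawiki xmlns="%s">
  <page>
    <title>Pear</title>
    <revision>
      <id>185185</id>
      <comment>Automated conversion</comment>
    </revision>
    <revision>
      <id>185241</id>
    </revision>
  </page>
</mediawiki>""" % URI


def revision_comments(xml_text):
    """Return the comment of every revision, or None where it is absent."""
    root = ET.fromstring(xml_text)
    return [rev.findtext("{%s}comment" % URI)
            for rev in root.iter("{%s}revision" % URI)]


class CommentFixtureTestCase(unittest.TestCase):
    def test_comments(self):
        comments = revision_comments(FIXTURE)
        self.assertEqual(2, len(comments))
        self.assertEqual("Automated conversion", comments[0])
        # findtext() returns None when the element is missing.
        self.assertIsNone(comments[1])


if __name__ == '__main__':
    unittest.main()
```

Keeping the XML inline means the check does not depend on the working directory, unlike a relative path to a fixture file.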
Modified: trunk/pywikipedia/xmlreader.py
===================================================================
--- trunk/pywikipedia/xmlreader.py 2009-04-27 15:41:56 UTC (rev 6740)
+++ trunk/pywikipedia/xmlreader.py 2009-04-27 16:33:38 UTC (rev 6741)
@@ -56,7 +56,7 @@
     """
     Represents a page.
     """
-    def __init__(self, title, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid):
+    def __init__(self, title, id, text, username, ipedit, timestamp, editRestriction, moveRestriction, revisionid, comment):
         # TODO: there are more tags we can read.
         self.title = title
         self.id = id
@@ -67,6 +67,7 @@
         self.editRestriction = editRestriction
         self.moveRestriction = moveRestriction
         self.revisionid = revisionid
+        self.comment = comment
 
 
 class XmlHeaderEntry:
@@ -130,6 +131,9 @@
             self.destination = 'username' # store it in the username
             self.username = u''
             self.ipedit = True
+        elif name == 'comment':
+            self.destination = 'comment'
+            self.comment = u''
         elif name == 'restrictions':
             self.destination = 'restrictions'
             self.restrictions = u''
@@ -171,7 +175,11 @@
                          self.timestamp[17:19])
             self.title = self.title.strip()
             # Report back to the caller
-            entry = XmlEntry(self.title, self.id, text, self.username, self.ipedit, timestamp, self.editRestriction, self.moveRestriction, self.revisionid)
+            entry = XmlEntry(self.title, self.id,
+                             text, self.username,
+                             self.ipedit, timestamp,
+                             self.editRestriction, self.moveRestriction,
+                             self.revisionid, self.comment)
             self.inRevisionTag = False
             self.callback(entry)
         elif self.headercallback:
@@ -191,6 +199,8 @@
             self.id += data
         elif self.destination == 'revisionid':
             self.revisionid += data
+        elif self.destination == 'comment':
+            self.comment += data
         elif self.destination == 'restrictions':
             self.restrictions += data
         elif self.destination == 'title':
@@ -309,6 +319,7 @@
         """Creates a Single revision"""
         revisionid = revision.findtext("{%s}id" % self.uri)
         timestamp = revision.findtext("{%s}timestamp" % self.uri)
+        comment = revision.findtext("{%s}comment" % self.uri)
         contributor = revision.find("{%s}contributor" % self.uri)
         ipeditor = contributor.findtext("{%s}ip" % self.uri)
         username = ipeditor or contributor.findtext("{%s}username" % self.uri)
@@ -321,7 +332,8 @@
                        timestamp=timestamp,
                        editRestriction=editRestriction,
                        moveRestriction=moveRestriction,
-                       revisionid=revisionid
+                       revisionid=revisionid,
+                       comment=comment
                       )
 
     def regex_parse(self):
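
The SAX hunks above follow a "destination" pattern: startElement notes which field the upcoming character data belongs to, and characters appends to that field. A minimal standalone sketch of that pattern, restricted to the comment field, might look as follows. It is not the patched MediaWikiXmlHandler: CommentHandler, SAMPLE and parse_comments are hypothetical names, the code targets Python 3, and resetting the comment at the start of each revision is a choice made for this sketch rather than something the diff shows.

```python
import xml.sax


class CommentHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the <comment> of each <revision>."""

    def __init__(self):
        super().__init__()
        self.destination = None  # field that character data is routed to
        self.comment = ''
        self.comments = []

    def startElement(self, name, attrs):
        if name == 'revision':
            # Sketch-only choice: a revision without <comment> yields ''.
            self.comment = ''
        elif name == 'comment':
            self.destination = 'comment'

    def endElement(self, name):
        if name == 'comment':
            self.destination = None
        elif name == 'revision':
            self.comments.append(self.comment)

    def characters(self, data):
        # SAX may deliver one text node in several chunks, hence "+=".
        if self.destination == 'comment':
            self.comment += data


# Hypothetical sample input, much smaller than a real dump.
SAMPLE = (
    '<page>'
    '<revision><id>1</id><comment>Automated conversion</comment></revision>'
    '<revision><id>2</id></revision>'
    '</page>'
)


def parse_comments(xml_text):
    """Parse xml_text and return one comment string per revision."""
    handler = CommentHandler()
    xml.sax.parseString(xml_text.encode('utf-8'), handler)
    return handler.comments
```

Calling parse_comments(SAMPLE) returns one string per revision, with an empty string for the revision that has no comment element.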