Pywikipedia-svn

pywikipedia-svn@lists.wikimedia.org

SVN: [6751] trunk/pywikipedia/wikipedia.py
by cosoleto＠svn.wikimedia.org 29 Apr '09

Revision: 6751
Author: cosoleto
Date: 2009-04-29 16:21:41 +0000 (Wed, 29 Apr 2009)

Log Message:
-----------
Don't import unused parts of BeautifulSoup module.

Modified Paths:
--------------
    trunk/pywikipedia/wikipedia.py

Modified: trunk/pywikipedia/wikipedia.py
===================================================================
--- trunk/pywikipedia/wikipedia.py 2009-04-29 15:58:10 UTC (rev 6750)
+++ trunk/pywikipedia/wikipedia.py 2009-04-29 16:21:41 UTC (rev 6751)
@@ -137,7 +137,7 @@
 import warnings
 import unicodedata
 import xmlreader
-from BeautifulSoup import *
+from BeautifulSoup import BeautifulSoup, BeautifulStoneSoup, SoupStrainer
 import weakref
 
 # Set the locale to system default. This will ensure correct string
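As a rough illustration of what this change buys, the sketch below compares the names bound by a wildcard import with those bound by an explicit one. It uses the standard-library `string` module as a stand-in, since BeautifulSoup may not be installed; the helper `names_bound` is hypothetical and not part of pywikipedia.

```python
def names_bound(statement):
    """Run an import statement in a fresh namespace and list the names it binds.

    Hypothetical helper for illustration only.
    """
    namespace = {}
    exec(statement, namespace)
    # exec() adds __builtins__; dunder names are filtered out here.
    return sorted(name for name in namespace if not name.startswith("__"))


# Stand-in module: `string` instead of BeautifulSoup (an assumption).
star = names_bound("from string import *")
explicit = names_bound("from string import ascii_letters, digits")

print("wildcard import binds", len(star), "names")
print("explicit import binds", explicit)
```

With the explicit form, a reader (or a linter) can see exactly which names come from the module, and unrelated names cannot silently shadow local ones.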

SVN: [6750] trunk/pywikipedia/tests
by cosoleto＠svn.wikimedia.org 29 Apr '09

Revision: 6750 Author: cosoleto Date: 2009-04-29 15:58:10 +0000 (Wed, 29 Apr 2009) Log Message: ----------- Created a 'data' directory for files used from test cases. Moved article-peer.xml and modified test-xmlreader.py accordingly. Modified Paths: -------------- trunk/pywikipedia/tests/test-xmlreader.py Added Paths: ----------- trunk/pywikipedia/tests/data/ trunk/pywikipedia/tests/data/article-pear.xml Removed Paths: ------------- trunk/pywikipedia/tests/article-pear.xml Deleted: trunk/pywikipedia/tests/article-pear.xml =================================================================== --- trunk/pywikipedia/tests/article-pear.xml 2009-04-29 13:05:37 UTC (rev 6749) +++ trunk/pywikipedia/tests/article-pear.xml 2009-04-29 15:58:10 UTC (rev 6750) @@ -1,109 +0,0 @@ -<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="en"> - <siteinfo> - <sitename>Wikipedia</sitename> - <base>http://en.wikipedia.org/wiki/Main_Page</base> - <generator>MediaWiki 1.15alpha</generator> - <case>first-letter</case> - <namespaces> - <namespace key="-2">Media</namespace> - <namespace key="-1">Special</namespace> - <namespace key="0" /> - <namespace key="1">Talk</namespace> - <namespace key="2">User</namespace> - <namespace key="3">User talk</namespace> - <namespace key="4">Wikipedia</namespace> - <namespace key="5">Wikipedia talk</namespace> - <namespace key="6">File</namespace> - <namespace key="7">File talk</namespace> - <namespace key="8">MediaWiki</namespace> - <namespace key="9">MediaWiki talk</namespace> - <namespace key="10">Template</namespace> - <namespace key="11">Template talk</namespace> - <namespace key="12">Help</namespace> - <namespace key="13">Help talk</namespace> - <namespace key="14">Category</namespace> - <namespace key="15">Category talk</namespace> - <namespace 
key="100">Portal</namespace> - <namespace key="101">Portal talk</namespace> - </namespaces> - </siteinfo> - <page> - <title>Pear</title> - <id>24278</id> - <revision> - <id>185185</id> - <timestamp>2002-02-25T15:43:11Z</timestamp> - <contributor> - <ip>Conversion script</ip> - </contributor> - <minor/> - <comment>Automated conversion</comment> - <text xml:space="preserve">Pears are [[tree]]s of the [[genus]] Pyrus and the edible [[fruit]] of that tree. -The pear is an important fruit in temperate regions. Like the [[apple]], the pear fruit is a [[pome]]. There are thousands of domesticated pear varieties. - -There are many species of pears. The most important for fruit production are Pyrus communis (European pear or simply pear) and Pyrus pyrifolia (Asian pear or apple pear). Other species are used as rootstocks for European and Asian pears and as ornamental trees. - -Unlike most fruits, European pears do not ripen on the tree. They must be picked and, sometimes, subjected to cold, before they will become sweet and soft. They store well in their mature but unripe state if kept cold. Asian pears are sweet on the tree and are eaten crisp. - -Pears are consumed fresh, canned, and as juice. Fermented pear juice is called [[perry]]. -</text> - </revision> - <revision> - <id>185241</id> - <timestamp>2002-08-31T02:16:06Z</timestamp> - <contributor> - <username>Quercusrobur</username> - <id>3741</id> - </contributor> - <text xml:space="preserve">Pears are [[tree]]s of the [[genus]] Pyrus and the edible [[fruit]] of that tree. -The pear is an important fruit in temperate regions. Like the [[apple]], the pear fruit is a [[pome]]. There are thousands of domesticated pear varieties. - -There are many species of pears. The most important for fruit production are Pyrus communis (European pear or simply pear) and Pyrus pyrifolia (Asian pear or apple pear). Other species are used as rootstocks for European and Asian pears and as ornamental trees. 
- -Unlike most fruits, European pears do not ripen on the tree. They must be picked and, sometimes, subjected to cold, before they will become sweet and soft. They store well in their mature but unripe state if kept cold. Asian pears are sweet on the tree and are eaten crisp. - -Pears are consumed fresh, canned, and as juice. Fermented pear juice is called [[perry]]. - -[[propagating apples and other fruit trees]]</text> - </revision> - <revision> - <id>185408</id> - <timestamp>2002-08-31T03:27:15Z</timestamp> - <contributor> - <username>Mav</username> - <id>62</id> - </contributor> - <minor/> - <text xml:space="preserve">Pears are [[tree]]s of the [[genus]] Pyrus and the edible [[fruit]] of that tree. -The pear is an important fruit in temperate regions. Like the [[apple]], the pear fruit is a [[pome]]. There are thousands of domesticated pear varieties. - -There are many species of pears. The most important for fruit production are Pyrus communis (European pear or simply pear) and Pyrus pyrifolia (Asian pear or apple pear). Other species are used as rootstocks for European and Asian pears and as ornamental trees. - -Unlike most fruits, European pears do not ripen on the tree. They must be picked and, sometimes, subjected to cold, before they will become sweet and soft. They store well in their mature but unripe state if kept cold. Asian pears are sweet on the tree and are eaten crisp. - -Pears are consumed fresh, canned, and as juice. Fermented pear juice is called [[perry]]. - -[[Fruit tree propogation]]</text> - </revision> - <revision> - <id>188924</id> - <timestamp>2002-08-31T05:53:10Z</timestamp> - <contributor> - <username>PierreAbbat</username> - <id>1123</id> - </contributor> - <minor/> - <comment>sp</comment> - <text xml:space="preserve">Pears are [[tree]]s of the [[genus]] Pyrus and the edible [[fruit]] of that tree. -The pear is an important fruit in temperate regions. Like the [[apple]], the pear fruit is a [[pome]]. 
There are thousands of domesticated pear varieties. - -There are many species of pears. The most important for fruit production are Pyrus communis (European pear or simply pear) and Pyrus pyrifolia (Asian pear or apple pear). Other species are used as rootstocks for European and Asian pears and as ornamental trees. - -Unlike most fruits, European pears do not ripen on the tree. They must be picked and, sometimes, subjected to cold, before they will become sweet and soft. They store well in their mature but unripe state if kept cold. Asian pears are sweet on the tree and are eaten crisp. - -Pears are consumed fresh, canned, and as juice. Fermented pear juice is called [[perry]]. - -[[Fruit tree propagation]]</text> - </revision> - </page> -</mediawiki> Copied: trunk/pywikipedia/tests/data/article-pear.xml (from rev 6747, trunk/pywikipedia/tests/article-pear.xml) =================================================================== --- trunk/pywikipedia/tests/data/article-pear.xml (rev 0) +++ trunk/pywikipedia/tests/data/article-pear.xml 2009-04-29 15:58:10 UTC (rev 6750) @@ -0,0 +1,109 @@ +<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="en"> + <siteinfo> + <sitename>Wikipedia</sitename> + <base>http://en.wikipedia.org/wiki/Main_Page</base> + <generator>MediaWiki 1.15alpha</generator> + <case>first-letter</case> + <namespaces> + <namespace key="-2">Media</namespace> + <namespace key="-1">Special</namespace> + <namespace key="0" /> + <namespace key="1">Talk</namespace> + <namespace key="2">User</namespace> + <namespace key="3">User talk</namespace> + <namespace key="4">Wikipedia</namespace> + <namespace key="5">Wikipedia talk</namespace> + <namespace key="6">File</namespace> + <namespace key="7">File talk</namespace> + <namespace key="8">MediaWiki</namespace> + <namespace 
key="9">MediaWiki talk</namespace> + <namespace key="10">Template</namespace> + <namespace key="11">Template talk</namespace> + <namespace key="12">Help</namespace> + <namespace key="13">Help talk</namespace> + <namespace key="14">Category</namespace> + <namespace key="15">Category talk</namespace> + <namespace key="100">Portal</namespace> + <namespace key="101">Portal talk</namespace> + </namespaces> + </siteinfo> + <page> + <title>Pear</title> + <id>24278</id> + <revision> + <id>185185</id> + <timestamp>2002-02-25T15:43:11Z</timestamp> + <contributor> + <ip>Conversion script</ip> + </contributor> + <minor/> + <comment>Automated conversion</comment> + <text xml:space="preserve">Pears are [[tree]]s of the [[genus]] Pyrus and the edible [[fruit]] of that tree. +The pear is an important fruit in temperate regions. Like the [[apple]], the pear fruit is a [[pome]]. There are thousands of domesticated pear varieties. + +There are many species of pears. The most important for fruit production are Pyrus communis (European pear or simply pear) and Pyrus pyrifolia (Asian pear or apple pear). Other species are used as rootstocks for European and Asian pears and as ornamental trees. + +Unlike most fruits, European pears do not ripen on the tree. They must be picked and, sometimes, subjected to cold, before they will become sweet and soft. They store well in their mature but unripe state if kept cold. Asian pears are sweet on the tree and are eaten crisp. + +Pears are consumed fresh, canned, and as juice. Fermented pear juice is called [[perry]]. +</text> + </revision> + <revision> + <id>185241</id> + <timestamp>2002-08-31T02:16:06Z</timestamp> + <contributor> + <username>Quercusrobur</username> + <id>3741</id> + </contributor> + <text xml:space="preserve">Pears are [[tree]]s of the [[genus]] Pyrus and the edible [[fruit]] of that tree. +The pear is an important fruit in temperate regions. Like the [[apple]], the pear fruit is a [[pome]]. 
There are thousands of domesticated pear varieties. + +There are many species of pears. The most important for fruit production are Pyrus communis (European pear or simply pear) and Pyrus pyrifolia (Asian pear or apple pear). Other species are used as rootstocks for European and Asian pears and as ornamental trees. + +Unlike most fruits, European pears do not ripen on the tree. They must be picked and, sometimes, subjected to cold, before they will become sweet and soft. They store well in their mature but unripe state if kept cold. Asian pears are sweet on the tree and are eaten crisp. + +Pears are consumed fresh, canned, and as juice. Fermented pear juice is called [[perry]]. + +[[propagating apples and other fruit trees]]</text> + </revision> + <revision> + <id>185408</id> + <timestamp>2002-08-31T03:27:15Z</timestamp> + <contributor> + <username>Mav</username> + <id>62</id> + </contributor> + <minor/> + <text xml:space="preserve">Pears are [[tree]]s of the [[genus]] Pyrus and the edible [[fruit]] of that tree. +The pear is an important fruit in temperate regions. Like the [[apple]], the pear fruit is a [[pome]]. There are thousands of domesticated pear varieties. + +There are many species of pears. The most important for fruit production are Pyrus communis (European pear or simply pear) and Pyrus pyrifolia (Asian pear or apple pear). Other species are used as rootstocks for European and Asian pears and as ornamental trees. + +Unlike most fruits, European pears do not ripen on the tree. They must be picked and, sometimes, subjected to cold, before they will become sweet and soft. They store well in their mature but unripe state if kept cold. Asian pears are sweet on the tree and are eaten crisp. + +Pears are consumed fresh, canned, and as juice. Fermented pear juice is called [[perry]]. 
+ +[[Fruit tree propogation]]</text> + </revision> + <revision> + <id>188924</id> + <timestamp>2002-08-31T05:53:10Z</timestamp> + <contributor> + <username>PierreAbbat</username> + <id>1123</id> + </contributor> + <minor/> + <comment>sp</comment> + <text xml:space="preserve">Pears are [[tree]]s of the [[genus]] Pyrus and the edible [[fruit]] of that tree. +The pear is an important fruit in temperate regions. Like the [[apple]], the pear fruit is a [[pome]]. There are thousands of domesticated pear varieties. + +There are many species of pears. The most important for fruit production are Pyrus communis (European pear or simply pear) and Pyrus pyrifolia (Asian pear or apple pear). Other species are used as rootstocks for European and Asian pears and as ornamental trees. + +Unlike most fruits, European pears do not ripen on the tree. They must be picked and, sometimes, subjected to cold, before they will become sweet and soft. They store well in their mature but unripe state if kept cold. Asian pears are sweet on the tree and are eaten crisp. + +Pears are consumed fresh, canned, and as juice. Fermented pear juice is called [[perry]]. 
+ +[[Fruit tree propagation]]</text> + </revision> + </page> +</mediawiki> Modified: trunk/pywikipedia/tests/test-xmlreader.py =================================================================== --- trunk/pywikipedia/tests/test-xmlreader.py 2009-04-29 13:05:37 UTC (rev 6749) +++ trunk/pywikipedia/tests/test-xmlreader.py 2009-04-29 15:58:10 UTC (rev 6750) @@ -9,12 +9,12 @@ class XmlReaderTestCase(unittest.TestCase): def test_XmlDumpAllRevs(self): - pages = [r for r in xmlreader.XmlDump("article-pear.xml", allrevisions=True).parse()] + pages = [r for r in xmlreader.XmlDump("data/article-pear.xml", allrevisions=True).parse()] self.assertEquals(4, len(pages)) self.assertNotEquals("", pages[0].comment) def test_XmlDumpFirstRev(self): - pages = [r for r in xmlreader.XmlDump("article-pear.xml").parse()] + pages = [r for r in xmlreader.XmlDump("data/article-pear.xml").parse()] self.assertEquals(1, len(pages)) self.assertNotEquals("", pages[0].comment) @@ -24,7 +24,7 @@ def pageDone(page): pages.append(page) handler.setCallback(pageDone) - xml.sax.parse("article-pear.xml", handler) + xml.sax.parse("data/article-pear.xml", handler) self.assertEquals(4, len(pages)) self.assertNotEquals("", pages[0].comment)

SVN: [6749] trunk/pywikipedia/imagerecat.py
by multichill＠svn.wikimedia.org 29 Apr '09

Revision: 6749
Author: multichill
Date: 2009-04-29 13:05:37 +0000 (Wed, 29 Apr 2009)

Log Message:
-----------
Assign none to matches to prevent UnboundLocalError.

Modified Paths:
--------------
    trunk/pywikipedia/imagerecat.py

Modified: trunk/pywikipedia/imagerecat.py
===================================================================
--- trunk/pywikipedia/imagerecat.py 2009-04-29 11:54:39 UTC (rev 6748)
+++ trunk/pywikipedia/imagerecat.py 2009-04-29 13:05:37 UTC (rev 6749)
@@ -111,6 +111,7 @@
     commonsenseRe = re.compile('^#COMMONSENSE(.*)#USAGE(\s)+$(?P<usagenum>(\d)+)$\s(?P<usage>(.*))\s#KEYWORDS(\s)+$(?P<keywords>(\d)+)$(.*)#CATEGORIES(\s)+$(?P<catnum>(\d)+)$\s(?P<cats>(.*))\s#GALLERIES(\s)+$(?P<galnum>(\d)+)$\s(?P<gals>(.*))\s(.*)#EOF$', re.MULTILINE + re.DOTALL)
 
     gotInfo = False
+    matches = None
     maxtries = 10
     tries = 0
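The failure mode this commit guards against can be reproduced in a few lines. The sketch below is a simplified stand-in, not the imagerecat.py code: `categorize_unsafe`, `categorize_safe`, and `failing_fetch` are hypothetical names, and the network call is replaced by a function that always raises.

```python
def categorize_unsafe(fetch):
    """Pre-fix shape: `matches` is only bound if fetch() succeeds."""
    try:
        matches = fetch()
    except IOError:
        pass  # error swallowed, so `matches` may never have been assigned
    return bool(matches)  # raises UnboundLocalError if fetch() raised


def categorize_safe(fetch):
    """Post-fix shape: `matches` always has a value before it is read."""
    matches = None
    try:
        matches = fetch()
    except IOError:
        pass
    return bool(matches)


def failing_fetch():
    # Stand-in for a network request that never succeeds.
    raise IOError("simulated network failure")


try:
    categorize_unsafe(failing_fetch)
except UnboundLocalError as error:
    print("unsafe version failed:", error)

print("safe version returned:", categorize_safe(failing_fetch))
```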

SVN: [6748] trunk/pywikipedia/imagerecat.py
by multichill＠svn.wikimedia.org 29 Apr '09

Revision: 6748
Author: multichill
Date: 2009-04-29 11:54:39 +0000 (Wed, 29 Apr 2009)

Log Message:
-----------
Sometimes the bot gets stuck at a certain image. Limit the number of tries to 10 so the bot won't get stuck forever at one image.

Modified Paths:
--------------
    trunk/pywikipedia/imagerecat.py

Modified: trunk/pywikipedia/imagerecat.py
===================================================================
--- trunk/pywikipedia/imagerecat.py 2009-04-28 06:58:57 UTC (rev 6747)
+++ trunk/pywikipedia/imagerecat.py 2009-04-29 11:54:39 UTC (rev 6748)
@@ -110,19 +110,25 @@
     parameters = urllib.urlencode({'i' : imagepage.titleWithoutNamespace().encode('utf-8'), 'r' : 'on', 'go-clean' : 'Find+Categories', 'p' : search_wikis, 'cl' : hint_wiki})
     commonsenseRe = re.compile('^#COMMONSENSE(.*)#USAGE(\s)+$(?P<usagenum>(\d)+)$\s(?P<usage>(.*))\s#KEYWORDS(\s)+$(?P<keywords>(\d)+)$(.*)#CATEGORIES(\s)+$(?P<catnum>(\d)+)$\s(?P<cats>(.*))\s#GALLERIES(\s)+$(?P<galnum>(\d)+)$\s(?P<gals>(.*))\s(.*)#EOF$', re.MULTILINE + re.DOTALL)
 
-    gotInfo = False;
-
+    gotInfo = False
+    maxtries = 10
+    tries = 0
+
     while(not gotInfo):
         try:
-            commonsHelperPage = urllib.urlopen("http://toolserver.org/~daniel/WikiSense/CommonSense.php?%s" % parameters)
-            matches = commonsenseRe.search(commonsHelperPage.read().decode('utf-8'))
-            gotInfo = True
+            if ( tries < maxtries ):
+                tries = tries + 1
+                commonsHelperPage = urllib.urlopen("http://toolserver.org/~daniel/WikiSense/CommonSense.php?%s" % parameters)
+                matches = commonsenseRe.search(commonsHelperPage.read().decode('utf-8'))
+                gotInfo = True
+            else:
+                break
         except IOError:
             wikipedia.output(u'Got an IOError, let\'s try again')
         except socket.timeout:
             wikipedia.output(u'Got a timeout, let\'s try again')
 
-    if matches:
+    if (matches and gotInfo):
         if(matches.group('usagenum') > 0):
             used = matches.group('usage').splitlines()
             for use in used:
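The bounded-retry idea in this commit can be sketched more compactly with a `for` loop. This is one possible shape under stated assumptions, not the code committed to imagerecat.py: `fetch_with_retries` and `flaky_fetch` are hypothetical names, and the HTTP request is replaced by a callable passed in by the caller.

```python
import socket


def fetch_with_retries(fetch, max_tries=10):
    """Call fetch() until it succeeds, giving up after max_tries attempts.

    Returns the fetched value, or None if every attempt failed.
    """
    for attempt in range(1, max_tries + 1):
        try:
            return fetch()
        except (IOError, socket.timeout) as error:
            print("attempt %d failed (%s), trying again" % (attempt, error))
    return None  # the caller must handle the "gave up" case


# Stand-in for a request that fails twice before succeeding.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("simulated failure")
    return "page contents"


result = fetch_with_retries(flaky_fetch)
print(result)
```

Returning `None` on exhaustion plays the role of the `matches and gotInfo` check in the diff: the caller can tell "no data" apart from a successful fetch.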

SVN: [6747] trunk/pywikipedia/interwiki.py
by nicdumz＠svn.wikimedia.org 28 Apr '09

Revision: 6747 Author: nicdumz Date: 2009-04-28 06:58:57 +0000 (Tue, 28 Apr 2009) Log Message: ----------- Documenting the purpose, and usage of Subject. Please review this commit for English accuracy and clarity =) Modified Paths: -------------- trunk/pywikipedia/interwiki.py Modified: trunk/pywikipedia/interwiki.py =================================================================== --- trunk/pywikipedia/interwiki.py 2009-04-28 06:25:09 UTC (rev 6746) +++ trunk/pywikipedia/interwiki.py 2009-04-28 06:58:57 UTC (rev 6747) @@ -584,6 +584,59 @@ """ Class to follow the progress of a single 'subject' (i.e. a page with all its translations) + + + Subject is a transitive closure of the binary relation on Page: + "has_a_langlink_pointing_to". + + A formal way to compute that closure would be: + + With P a set of pages, NL ('NextLevel') a function on sets defined as: + NL(P) = { target | ∃ source ∈ P, target ∈ source.langlinks() } + pseudocode: + todo <- [originPage] + done <- [] + while todo != []: + pending <- todo + todo <-NL(pending) / done + done <- NL(pending) U done + return done + + + There is, however, one limitation that is induced by implementation: + to compute efficiently NL(P), one has to load the page contents of + pages in P. + (Not only the langlinks have to be parsed from each Page, but we also want + to know if the Page is a redirect, a disambiguation, etc...) + + Because of this, the pages in pending have to be preloaded. + However, because the pages in pending are likely to be in several sites + we cannot "just" preload them as a batch. 
+ + Instead of doing "pending <- todo" at each iteration, we have to elect a + Site, and we put in pending all the pages from todo that belong to that + Site: + + Code becomes: + todo <- {originPage.site():[originPage]} + done <- [] + while todo != {}: + site <- electSite() + pending <- todo[site] + + preloadpages(site, pending) + + todo[site] <- NL(pending) / done + done <- NL(pending) U done + return done + + + Subject objects only operate on pages that should have been preloaded before. + In fact, at any time: + * todo contains new Pages that have not been loaded yet + * done contains Pages that have been loaded, and that have been treated. + * If batch preloadings are successful, Page._get() is never called from + this Object. """ def __init__(self, originPage, hints = None): @@ -683,11 +736,12 @@ """ return self.todo.siteCounts() - def willWorkOn(self, site): + def whatsNextPageBatch(self, site): """ By calling this method, you 'promise' this instance that you will - work on any todo items for the wiki indicated by 'site'. This routine - will return a list of pages that can be treated. + preload all the 'site' Pages that are in the todo list. + + This routine will return a list of pages that can be treated. """ # Bug-check: Isn't there any work still in progress? We can't work on # different sites at a time! @@ -700,6 +754,7 @@ result.append(page) self.todo.removeSite(site) + # If there are any, return them. Otherwise, nothing is in progress. return result @@ -896,10 +951,14 @@ if globalvar.hintsareright: self.hintedsites.add(page.site) - def workDone(self, counter): + def batchLoaded(self, counter): """ - This is called by a worker to tell us that the promised work - was completed as far as possible. The only argument is an instance + This is called by a worker to tell us that the promised batch of + pages was loaded. + In other words, all the pages in self.pending have already + been preloaded. 
+ + The only argument is an instance of a counter class, that has methods minus() and plus() to keep counts of the total work todo. """ @@ -1642,7 +1701,7 @@ for subject in self.subjects: # Promise the subject that we will work on the site. # We will get a list of pages we can do. - pages = subject.willWorkOn(site) + pages = subject.whatsNextPageBatch(site) if pages: pageGroup.extend(pages) subjectGroup.append(subject) @@ -1660,7 +1719,7 @@ pass # Tell all of the subjects that the promised work is done for subject in subjectGroup: - subject.workDone(self) + subject.batchLoaded(self) return True def queryStep(self):
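The two pseudocode loops in the new Subject docstring can be sketched in Python as follows. This is an illustration under stated assumptions, not the interwiki.py implementation: pages are plain strings of the form `"site:Title"`, `langlinks` is an in-memory dict standing in for `Page.langlinks()`, and `next_level`, `site_of`, `closure`, `closure_by_site`, and the `preload` callback are hypothetical names. The site election rule (largest pending batch first) is also an assumption.

```python
def next_level(pages, langlinks):
    """NL(P): every target reachable in one langlink hop from a page in P."""
    return {target for source in pages for target in langlinks.get(source, ())}


def site_of(page):
    """Site prefix of a 'site:Title' string (an assumed page encoding)."""
    return page.split(":", 1)[0]


def closure(origin, langlinks):
    """First pseudocode loop: expand level by level until nothing new appears."""
    todo, done = {origin}, set()
    while todo:
        pending = todo
        reached = next_level(pending, langlinks)
        todo = reached - done
        done = reached | done
    return done


def closure_by_site(origin, langlinks, preload):
    """Second loop: pending pages are grouped per site and preloaded as a batch."""
    todo = {site_of(origin): {origin}}
    done = set()
    while todo:
        # Elect the site with the most waiting pages (an assumed policy).
        site = max(todo, key=lambda s: len(todo[s]))
        pending = todo.pop(site)
        preload(site, pending)
        reached = next_level(pending, langlinks)
        for page in reached - done:
            todo.setdefault(site_of(page), set()).add(page)
        done |= reached
    return done


langlinks = {
    "en:Pear": ["fr:Poire", "de:Birne"],
    "fr:Poire": ["en:Pear", "it:Pera"],
    "de:Birne": ["en:Pear"],
    "it:Pera": [],
}
batches = []
pages = closure_by_site("en:Pear", langlinks,
                        lambda site, batch: batches.append((site, set(batch))))
print(sorted(pages))
```

Each call to `preload` receives pages from a single site only, which is the constraint the docstring gives for batching.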

SVN: [6746] trunk/pywikipedia/weblinkchecker.py
by shizhao＠svn.wikimedia.org 28 Apr '09

Revision: 6746
Author: shizhao
Date: 2009-04-28 06:25:09 +0000 (Tue, 28 Apr 2009)

Log Message:
-----------
fix "-day" is int

Modified Paths:
--------------
    trunk/pywikipedia/weblinkchecker.py

Modified: trunk/pywikipedia/weblinkchecker.py
===================================================================
--- trunk/pywikipedia/weblinkchecker.py 2009-04-28 05:38:58 UTC (rev 6745)
+++ trunk/pywikipedia/weblinkchecker.py 2009-04-28 06:25:09 UTC (rev 6746)
@@ -806,7 +806,7 @@
             HTTPignore.append(int(arg[8:]))
         elif arg.startswith('-day:'):
             global day
-            day = arg[5:]
+            day = int(arg[5:])
         else:
             if not genFactory.handleArg(arg):
                 singlePageTitle.append(arg)
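A small sketch of why the conversion matters: slicing a command-line argument always yields a string, so any later arithmetic or numeric comparison on `day` needs an explicit `int()`. The function `parse_day` and its default of 7 are hypothetical and only for illustration; how weblinkchecker.py uses `day` afterwards is not shown in this diff.

```python
def parse_day(arg, default=7):
    """Return the integer after '-day:', or `default` for any other argument.

    Raises ValueError if the value is not a valid integer.
    """
    prefix = '-day:'
    if arg.startswith(prefix):
        return int(arg[len(prefix):])
    return default


raw = '-day:14'[5:]      # what the pre-fix code stored: the string '14'
day = parse_day('-day:14')

print(type(raw).__name__, type(day).__name__)
print(day * 24, "hours")  # arithmetic works on the int
```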

SVN: [6745] trunk/pywikipedia/interwiki.py
by nicdumz＠svn.wikimedia.org 28 Apr '09

Revision: 6745
Author: nicdumz
Date: 2009-04-28 05:38:58 +0000 (Tue, 28 Apr 2009)

Log Message:
-----------
Renaming global.bracketonly into global.parenthesesonly for clarity

Modified Paths:
--------------
    trunk/pywikipedia/interwiki.py

Modified: trunk/pywikipedia/interwiki.py
===================================================================
--- trunk/pywikipedia/interwiki.py 2009-04-28 05:35:32 UTC (rev 6744)
+++ trunk/pywikipedia/interwiki.py 2009-04-28 05:38:58 UTC (rev 6745)
@@ -495,7 +495,7 @@
     strictlimittwo = False
     needlimit = 0
     ignore = []
-    bracketonly = False
+    parenthesesonly = False
     rememberno = False
     followinterwiki = True
     minsubjects = config.interwiki_min_subjects
@@ -1546,7 +1546,7 @@
                             if dictName is not None:
                                 wikipedia.output(u'Skipping: %s is an auto entry %s(%s)' % (page.title(),dictName,year))
                                 continue
-                        if globalvar.bracketonly:
+                        if globalvar.parenthesesonly:
                             # Only yield pages that have ( ) in titles
                             if "(" not in page.title():
                                 continue
@@ -1887,7 +1887,7 @@
                 # override configuration
                 config.interwiki_graph = True
             elif arg == '-bracket':
-                globalvar.bracketonly = True
+                globalvar.parenthesesonly = True
             elif arg == '-localright':
                 globalvar.followinterwiki = False
             elif arg == '-hintsareright':

SVN: [6744] trunk/pywikipedia
by nicdumz＠svn.wikimedia.org 28 Apr '09

Revision: 6744 Author: nicdumz Date: 2009-04-28 05:35:32 +0000 (Tue, 28 Apr 2009) Log Message: ----------- Replacing the 'text.find(substring) >= -1' and variants by 'substring in text' 'not text.find(subs) == -1' to mean 'subs in text' in particular, is quite hard to read. (This commit is sponsored by PEP290 ^_^ ) Modified Paths: -------------- trunk/pywikipedia/censure.py trunk/pywikipedia/copyright.py trunk/pywikipedia/imagerecat.py trunk/pywikipedia/interwiki.py trunk/pywikipedia/rcsort.py trunk/pywikipedia/solve_disambiguation.py trunk/pywikipedia/spellcheck.py trunk/pywikipedia/standardize_notes.py trunk/pywikipedia/titletranslate.py trunk/pywikipedia/weblinkchecker.py trunk/pywikipedia/wikipedia.py trunk/pywikipedia/wiktionary/header.py trunk/pywikipedia/wiktionary/meaning.py trunk/pywikipedia/wiktionary/term.py trunk/pywikipedia/wiktionary/wiktionarypage.py trunk/pywikipedia/wiktionary.py Modified: trunk/pywikipedia/censure.py =================================================================== --- trunk/pywikipedia/censure.py 2009-04-28 01:29:50 UTC (rev 6743) +++ trunk/pywikipedia/censure.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -89,7 +89,7 @@ report = False wordsIn = [] for badWord in ownWordList: - if text.find(' ' + badWord + ' ') != -1: + if (' ' + badWord + ' ') in text: wordsIn.append(badWord) report = True if report: Modified: trunk/pywikipedia/copyright.py =================================================================== --- trunk/pywikipedia/copyright.py 2009-04-28 01:29:50 UTC (rev 6743) +++ trunk/pywikipedia/copyright.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -347,7 +347,7 @@ def check(self, url, verbose = False): for entry in self.URLlist: - if url.find(entry) != -1: + if entry in url: if verbose > 1: warn('URL Excluded: %s\nReason: %s' % (url, entry)) elif verbose: Modified: trunk/pywikipedia/imagerecat.py =================================================================== --- trunk/pywikipedia/imagerecat.py 2009-04-28 01:29:50 UTC (rev 6743) 
+++ trunk/pywikipedia/imagerecat.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -241,7 +241,7 @@ #If cat contains the name of a country add it to the list else: for country in countries: - if not(cat.find(country)==-1): + if country in cat: listCountries.append(country) if(len(listByCountry) > 0): Modified: trunk/pywikipedia/interwiki.py =================================================================== --- trunk/pywikipedia/interwiki.py 2009-04-28 01:29:50 UTC (rev 6743) +++ trunk/pywikipedia/interwiki.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -1547,7 +1547,8 @@ wikipedia.output(u'Skipping: %s is an auto entry %s(%s)' % (page.title(),dictName,year)) continue if globalvar.bracketonly: - if page.title().find("(") == -1: + # Only yield pages that have ( ) in titles + if "(" not in page.title(): continue break Modified: trunk/pywikipedia/rcsort.py =================================================================== --- trunk/pywikipedia/rcsort.py 2009-04-28 01:29:50 UTC (rev 6743) +++ trunk/pywikipedia/rcsort.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -55,19 +55,19 @@ count = 0 for line in text: if rcoptions: - if line.find('gesch') > -1: + if 'gesch' in line: try: user = Ruser.search(line).group(1) except AttributeError: user = None count += 1 lines.append((user,count,line)) - elif line.find('rcoptions') > -1: + elif 'rcoptions' in line: print line.replace(mysite.path() + "?title=Speciaal:RecenteWijzigingen&","rcsort.py?") rcoptions = True - elif newbies and line.find('Nieuwste') > -1: + elif newbies and 'Nieuwste' in line: line = line.replace(mysite.path() + "?title=Speciaal:Bijdragen&","rcsort.py?").replace("target=newbies","newbies=true") - if line.find('</fieldset>') > -1: + if '</fieldset>' in line: line = line[line.find('</fieldset>')+11:] print line rcoptions = True Modified: trunk/pywikipedia/solve_disambiguation.py =================================================================== --- trunk/pywikipedia/solve_disambiguation.py 2009-04-28 01:29:50 UTC (rev 6743) +++ 
trunk/pywikipedia/solve_disambiguation.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -393,7 +393,7 @@ # If text links to a page with title link uncapitalized, uncapitalize link, otherwise capitalize it linkupper = link.title() linklower = linkupper[0].lower() + linkupper[1:] - if text.find("[[%s]]"%linklower) > -1 or text.find("[[%s|"%linklower) > -1: + if "[[%s]]"%linklower in text or "[[%s|"%linklower in text: return linklower else: return linkupper Modified: trunk/pywikipedia/spellcheck.py =================================================================== --- trunk/pywikipedia/spellcheck.py 2009-04-28 01:29:50 UTC (rev 6743) +++ trunk/pywikipedia/spellcheck.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -109,7 +109,7 @@ simwords[i] = [] for alt in knownwords.keys(): if basetext: - if alt.lower().find(basetext) == -1: + if basetext not in alt.lower() == -1: dothis = False else: dothis = True @@ -347,7 +347,7 @@ # the user if rep == self.derive(): return self.word - if self.word.find(self.derive()) == -1: + if self.derive() not in self.word: return wikipedia.input(u"Please give the result of replacing %s by %s in %s:"%(self.derive(),rep,self.word)) return self.word.replace(self.derive(),rep) Modified: trunk/pywikipedia/standardize_notes.py =================================================================== --- trunk/pywikipedia/standardize_notes.py 2009-04-28 01:29:50 UTC (rev 6743) +++ trunk/pywikipedia/standardize_notes.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -245,7 +245,7 @@ skip_page = True break else: - if entry.text.find(exception) != -1: + if exception in entry.text: skip_page = True break if not skip_page: @@ -256,7 +256,7 @@ yield wikipedia.Page(mysite, entry.full_title()) break else: - if entry.text.find(old) != -1: + if old in entry.text: yield wikipedia.Page(mysite, entry.full_title()) break Modified: trunk/pywikipedia/titletranslate.py =================================================================== --- trunk/pywikipedia/titletranslate.py 2009-04-28 
01:29:50 UTC (rev 6743) +++ trunk/pywikipedia/titletranslate.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -21,7 +21,7 @@ site = page.site() if hints: for h in hints: - if h.find(':') == -1: + if ':' not in h: # argument given as -hint:xy where xy is a language code codes = h newname = '' Modified: trunk/pywikipedia/weblinkchecker.py =================================================================== --- trunk/pywikipedia/weblinkchecker.py 2009-04-28 01:29:50 UTC (rev 6743) +++ trunk/pywikipedia/weblinkchecker.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -275,7 +275,7 @@ # the decompression for us, so we have to do it ourselves. import gzip, StringIO data = gzip.GzipFile(fileobj=StringIO.StringIO(data)).read() - if data.find("Search Results for ") != -1: + if "Search Results for " in data: return archiveURL else: return None Modified: trunk/pywikipedia/wikipedia.py =================================================================== --- trunk/pywikipedia/wikipedia.py 2009-04-28 01:29:50 UTC (rev 6743) +++ trunk/pywikipedia/wikipedia.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -722,7 +722,7 @@ while not textareaFound: text = self.site().getUrl(path, sysop = sysop) - if text.find("<title>Wiki does not exist</title>") != -1: + if "<title>Wiki does not exist</title>" in text: raise NoSuchSite(u'Wiki %s does not exist yet' % self.site()) # Extract the actual text from the textarea @@ -734,13 +734,13 @@ textareaFound = True else: # search for messages with no "view source" (aren't used in new versions) - if text.find(self.site().mediawiki_message('whitelistedittitle')) != -1: + if self.site().mediawiki_message('whitelistedittitle') in text: raise NoPage(u'Page editing is forbidden for anonymous users.') - elif self.site().has_mediawiki_message('nocreatetitle') and text.find(self.site().mediawiki_message('nocreatetitle')) != -1: + elif self.site().has_mediawiki_message('nocreatetitle') and self.site().mediawiki_message('nocreatetitle') in text: raise NoPage(self.site(), 
self.aslink(forceInterwiki = True)) # Bad title - elif text.find('var wgPageName = "Special:Badtitle";') != -1 \ - or text.find(self.site().mediawiki_message('badtitle')) != -1: + elif 'var wgPageName = "Special:Badtitle";' in text \ + or self.site().mediawiki_message('badtitle') in text: raise BadTitle('BadTitle: %s' % self) # find out if the username or IP has been blocked elif self.site().isBlocked(): @@ -748,17 +748,17 @@ # If there is no text area and the heading is 'View Source' # but user is not blocked, the page does not exist, and is # locked - elif text.find(self.site().mediawiki_message('viewsource')) != -1: + elif self.site().mediawiki_message('viewsource') in text: raise NoPage(self.site(), self.aslink(forceInterwiki = True)) # Some of the newest versions don't have a "view source" tag for # non-existant pages # Check also the div class because if the language is not english # the bot can not seeing that the page is blocked. - elif text.find(self.site().mediawiki_message('badaccess')) != -1 or \ - text.find("<div class=\"permissions-errors\">") != -1: + elif self.site().mediawiki_message('badaccess') in text or \ + "<div class=\"permissions-errors\">" in text: raise NoPage(self.site(), self.aslink(forceInterwiki = True)) elif config.retry_on_fail: - if text.find( "<title>Wikimedia Error</title>") > -1: + if "<title>Wikimedia Error</title>" in text: output( u"Wikimedia has technical problems; will retry in %i minutes." % retry_idle_time) else: output( unicode(text) ) @@ -2966,9 +2966,9 @@ elif dt < 360: dt += 60 else: - if data.find("<title>Wiki does not exist</title>") != -1: + if "<title>Wiki does not exist</title>" in data: raise NoSuchSite(u'Wiki %s does not exist yet' % self.site) - elif data.find("<siteinfo>") == -1: # This probably means we got a 'temporary unaivalable' + elif "<siteinfo>" not in data: # This probably means we got a 'temporary unaivalable' output(u'Got incorrect export page. Sleeping for %d seconds...' 
% dt) time.sleep(dt) if dt <= 60: @@ -3030,7 +3030,7 @@ if m: ## output(u"%s is a redirect" % page2.aslink()) redirectto = m.group(1) - if section and redirectto.find("#") == -1: + if section and not "#" in redirectto: redirectto = redirectto+"#"+section page2._getexception = IsRedirectPage page2._redirarg = redirectto @@ -4448,7 +4448,7 @@ try: text = self.getUrl(u'%saction=query&meta=userinfo&uiprop=blockinfo' % self.api_address(), sysop=sysop) - return text.find('blockedby=') > -1 + return 'blockedby=' in text except NotImplementedError: return False Modified: trunk/pywikipedia/wiktionary/header.py =================================================================== --- trunk/pywikipedia/wiktionary/header.py 2009-04-28 01:29:50 UTC (rev 6743) +++ trunk/pywikipedia/wiktionary/header.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -43,7 +43,7 @@ if line.count('=')>1: self.level = line.count('=') // 2 # integer floor division without fractional part self.header = line.replace('=','') - elif not line.find('{{')==-1: + elif '{{' in line: self.header = line.replace('{{-','').replace('-}}','') self.header = self.header.replace('{{','').replace('}}','').strip().lower() Modified: trunk/pywikipedia/wiktionary/meaning.py =================================================================== --- trunk/pywikipedia/wiktionary/meaning.py 2009-04-28 01:29:50 UTC (rev 6743) +++ trunk/pywikipedia/wiktionary/meaning.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -147,13 +147,20 @@ partconsumed = True cleanpart=part.replace("'",'').lower() delim='' + # XXX The following 3 tests look wrong: + # find() returns either -1 if the substring is not found, + # or the position of the substring in the string. + # since bool(-1) = True, cleanpart.find(',') will always + # be False, unless cleanpart[0] is ',' + # + # the test "',' in cleanpart" might be the one to use. 
if cleanpart.find(','): delim=',' if cleanpart.find(';'): delim=';' if cleanpart.find('/'): delim='/' - if 0 <= part.find("'") <= 2 or part.find('{')!=-1: + if 0 <= part.find("'") <= 2 or '{' in part: if delim=='': delim='|' cleanpart=cleanpart+'|' @@ -181,7 +188,7 @@ if not partconsumed: # This must be our term termweareworkingon=part.replace("[",'').replace("]",'').lower() - if termweareworkingon.find('#')!=-1 and termweareworkingon.find('|')!=-1: + if '#' in termweareworkingon and '|' in termweareworkingon: termweareworkingon=termweareworkingon.split('#')[0] # Now we have enough information to create a term # object for this translation and add it to our list @@ -384,4 +391,4 @@ wrappedexamples = '' for example in self.examples: wrappedexamples = wrappedexamples + "#:'''" + example + "'''\n" - return wrappedexamples \ No newline at end of file + return wrappedexamples Modified: trunk/pywikipedia/wiktionary/term.py =================================================================== --- trunk/pywikipedia/wiktionary/term.py 2009-04-28 01:29:50 UTC (rev 6743) +++ trunk/pywikipedia/wiktionary/term.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -28,17 +28,17 @@ pos=len(wikiline) maybegender=wikiline[pos:].replace("'",'').replace('{','').replace('}','').strip() self.term=wikiline[:pos].replace("[",'').replace(']','').strip() - if maybegender.find('m')!=-1: + if 'm' in maybegender: self.gender='m' - if maybegender.find('f')!=-1: + if 'f' in maybegender: self.gender='f' - if maybegender.find('n')!=-1: + if 'n' in maybegender: self.gender='n' - if maybegender.find('c')!=-1: + if 'c' in maybegender: self.gender='c' - if maybegender.find('p')!=-1: + if 'p' in maybegender: self.number=2 - if maybegender.find('dim')!=-1: + if 'dim' in maybegender: self.diminutive=True def __getitem__(self): @@ -177,8 +177,7 @@ """ Returns a string with this term as a link in a format ready for Wiktionary """ if wikilang=='en': - pos=self.term.lower().find('to ') - if pos==0: + if 
self.term.lower().startswith('to '): return 'to [[' + self.term[3:] + ']]' return Term.wikiWrapForList(self, wikilang) Modified: trunk/pywikipedia/wiktionary/wiktionarypage.py =================================================================== --- trunk/pywikipedia/wiktionary/wiktionarypage.py 2009-04-28 01:29:50 UTC (rev 6743) +++ trunk/pywikipedia/wiktionary/wiktionarypage.py 2009-04-28 05:35:32 UTC (rev 6744) @@ -107,15 +107,15 @@ line=line.replace('\n','').strip() # Let's start by looking for general stuff, that provides information which is # interesting to store at the page level - if line.lower().find('{wikipedia}')!=-1: + if '{wikipedia}' in line.lower(): self.addLink('wikipedia') continue - if line.lower().find('[[category:')!=-1: + if '[[category:' in line.lower(): category=line.split(':')[1].replace(']','') self.addCategory(category) # print 'category: ', category continue - if line.find('|')==-1: + if '|' not in line: bracketspos=line.find('[[') colonpos=line.find(':') if bracketspos!=-1 and colonpos!=-1 and bracketspos < colonpos: @@ -133,7 +133,7 @@ templist.append(line) continue # print 'line0:',line[0], 'line-2:',line[-2],'|','stripped line-2',line.rstrip()[-2] - if line.strip()[0]=='='and line.rstrip()[-2]=='=' or not line.find('{{-')==-1 and not line.find('-}}')==-1: + if line.strip()[0]=='='and line.rstrip()[-2]=='=' or '{{-' in line and '-}}' in line: # When a new header is encountered, it is necessary to store the information # encountered under the previous header. 
if templist and aheader: @@ -162,16 +162,17 @@ # Under the translations header there is quite a bit of stuff # that's only needed for formatting, we can just skip that # and go on processing the next line - if line.lower().find('{top}')!=-1: continue - if line.lower().find('{mid}')!=-1: continue - if line.lower().find('{bottom}')!=-1: continue - if line.find('|-')!=-1: continue - if line.find('{|')!=-1: continue - if line.find('|}')!=-1: continue - if line.lower().find('here-->')!=-1: continue - if line.lower().find('width=')!=-1: continue - if line.lower().find('<!--left column')!=-1: continue - if line.lower().find('<!--right column')!=-1: continue + lower = line.lower() + if '{top}' in lower: continue + if '{mid}' in lower: continue + if '{bottom}' in lower: continue + if '|-' in line: continue + if '{|' in line: continue + if '|}' in line: continue + if 'here-->' in lower: continue + if 'width=' in lower: continue + if '<!--left column' in lower: continue + if '<!--right column' in lower: continue templist.append(line) @@ [...] @@ [...] - if line.lower().find('here-->')!=-1: continue - if line.lower().find('width=')!=-1: continue - if line.lower().find('<!--left column')!=-1: continue - if line.lower().find('<!--right column')!=-1: continue [...] + if 'here-->' in lower: continue + if 'width=' in lower: continue + if '<!--left column' in lower: continue + if '<!--right column' in lower: continue templist.append(line) @@ -1010,7 +1012,7 @@ if line.count('=')>1: self.level = line.count('=') // 2 # integer floor division without fractional part self.header = line.replace('=','') - elif not line.find('{{')==-1: + elif '{{' in line: self.header = line.replace('{{-','').replace('-}}','') self.header = self.header.replace('{{','').replace('}}','').strip().lower()
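The conversion applied throughout this revision can be summarized in a few lines. The following is a minimal sketch in plain Python, not code from the pywikipedia tree; the function names and sample strings are hypothetical and exist only for illustration. It shows that comparing `find()` against -1 is equivalent to `in` / `not in`, that a bare `if s.find(x):` is not a membership test (the case flagged by the XXX comment in meaning.py), and that a `== -1` left behind after `not in`, as in the spellcheck.py hunk, forms a chained comparison.

```python
def contains_via_find(haystack, needle):
    """Old style: compare the index returned by find() with -1."""
    return haystack.find(needle) != -1


def contains_via_in(haystack, needle):
    """New style: the membership operator."""
    return needle in haystack


def bare_find(haystack, needle):
    """Truthiness of find() alone. This is NOT a membership test:
    -1 (not found) is truthy, and 0 (found at the start) is falsy."""
    return bool(haystack.find(needle))


def leftover_comparison(haystack, needle):
    """`not in` followed by a stray `== -1` is a chained comparison:
    (needle not in haystack) and (haystack == -1).
    A str never equals -1, so the result is False for any str input."""
    return needle not in haystack == -1


# Hypothetical sample strings, chosen only to exercise each case.
for haystack in ("a,b", ",ab", "ab"):
    assert contains_via_find(haystack, ",") == contains_via_in(haystack, ",")
```

With these definitions, `bare_find("ab", ",")` is true although there is no comma, `bare_find(",ab", ",")` is false although there is one, and `leftover_comparison` returns False in both cases.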


SVN: [6743] trunk/pywikipedia/xmlreader.py
by nicdumz＠svn.wikimedia.org 28 Apr '09


Revision: 6743 Author: nicdumz Date: 2009-04-28 01:29:50 +0000 (Tue, 28 Apr 2009) Log Message: ----------- Initializing self.comment to u'', in case MediaWiki doesn't provide the field (can happen on Special:Export) Modified Paths: -------------- trunk/pywikipedia/xmlreader.py Modified: trunk/pywikipedia/xmlreader.py =================================================================== --- trunk/pywikipedia/xmlreader.py 2009-04-27 16:36:47 UTC (rev 6742) +++ trunk/pywikipedia/xmlreader.py 2009-04-28 01:29:50 UTC (rev 6743) @@ -93,6 +93,7 @@ # asked for self.id = u'' self.revisionid = u'' + self.comment = u'' def setCallback(self, callback): self.callback = callback
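Why the default matters can be shown with a small SAX handler. This is a sketch under stated assumptions: `CommentHandler` and `parse_comment` are hypothetical names, and the handler is a simplified stand-in, not the real `xmlreader.MediaWikiXmlHandler`. It only illustrates that an attribute initialized in `__init__` is still readable when the input contains no `<comment>` element.

```python
import xml.sax


class CommentHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of <comment> elements."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        # Default value: the attribute exists even if no <comment> is seen.
        self.comment = ''
        self._in_comment = False

    def startElement(self, name, attrs):
        if name == 'comment':
            self._in_comment = True

    def endElement(self, name):
        if name == 'comment':
            self._in_comment = False

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_comment:
            self.comment += content


def parse_comment(xml_text):
    """Parse an XML string and return the collected comment text."""
    handler = CommentHandler()
    xml.sax.parseString(xml_text.encode('utf-8'), handler)
    return handler.comment
```

Without the assignment in `__init__`, reading `handler.comment` after parsing input that lacks the element would raise `AttributeError` instead of yielding an empty string.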


SVN: [6742] trunk/pywikipedia/tests/test-xmlreader.py
by nicdumz＠svn.wikimedia.org 27 Apr '09


Revision: 6742 Author: nicdumz Date: 2009-04-27 16:36:47 +0000 (Mon, 27 Apr 2009) Log Message: ----------- Testing explicitly the two parsing modes of xmlreader.XmlDump: all revisions, or only the first one Modified Paths: -------------- trunk/pywikipedia/tests/test-xmlreader.py Modified: trunk/pywikipedia/tests/test-xmlreader.py =================================================================== --- trunk/pywikipedia/tests/test-xmlreader.py 2009-04-27 16:33:38 UTC (rev 6741) +++ trunk/pywikipedia/tests/test-xmlreader.py 2009-04-27 16:36:47 UTC (rev 6742) @@ -8,10 +8,16 @@ import xmlreader class XmlReaderTestCase(unittest.TestCase): - def test_XmlDump(self): + def test_XmlDumpAllRevs(self): pages = [r for r in xmlreader.XmlDump("article-pear.xml", allrevisions=True).parse()] self.assertEquals(4, len(pages)) self.assertNotEquals("", pages[0].comment) + + def test_XmlDumpFirstRev(self): + pages = [r for r in xmlreader.XmlDump("article-pear.xml").parse()] + self.assertEquals(1, len(pages)) + self.assertNotEquals("", pages[0].comment) + def test_MediaWikiXmlHandler(self): handler = xmlreader.MediaWikiXmlHandler() pages = []
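The idea of giving each parsing mode its own test method can be sketched without the pywikipedia tree. Everything below is an assumption made for illustration: `parse_revisions` is a hypothetical stand-in for `xmlreader.XmlDump(...).parse()`, `SAMPLE` is made-up data rather than article-pear.xml, and the sketch uses `assertEqual` because the `assertEquals` alias is no longer available in recent Python versions.

```python
import unittest
import xml.etree.ElementTree as ET

# Made-up sample data with three revisions of one page.
SAMPLE = """<page>
  <revision><comment>first</comment></revision>
  <revision><comment>second</comment></revision>
  <revision><comment>third</comment></revision>
</page>"""


def parse_revisions(xml_text, allrevisions=False):
    """Hypothetical parser: return revision comments, all or only the first."""
    revisions = ET.fromstring(xml_text).findall('revision')
    if not allrevisions:
        revisions = revisions[:1]
    return [rev.findtext('comment', default='') for rev in revisions]


class ParseModesTestCase(unittest.TestCase):
    """One test method per mode, so a failure names the mode that broke."""

    def test_all_revisions(self):
        comments = parse_revisions(SAMPLE, allrevisions=True)
        self.assertEqual(3, len(comments))
        self.assertNotEqual('', comments[0])

    def test_first_revision_only(self):
        comments = parse_revisions(SAMPLE)
        self.assertEqual(1, len(comments))
        self.assertNotEqual('', comments[0])
```

Saved as a module, the test case can be run with `python -m unittest`.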

