Log Message:
Fixing the regex according to the change of HTML
Whoa. That's a big change, way bigger than the summary states.
@@ -828,6 +828,7 @@
      def previousRevision(self):
          """Return the revision id for the previous revision of this Page."""
          vh = self.getVersionHistory(revCount=2)
+         print vh
          return vh[1][0]
Forgot to remove a debug print? :)
@@ -1154,7 +1166,7 @@
                  force, callback))
      def put(self, newtext, comment=None, watchArticle=None, minorEdit=True,
-             force=False):
+             force=False, deleted = True):
Please document this new parameter, or rename it. As it stands, there are two possible interpretations:
1) If the page was deleted, raise an error on creation.
2) If the page was deleted, ignore the error on creation.
From what I see, it's #1, but please document this ;)
On a side note, I think detecting this is a good thing, but I don't understand why the default behavior is to raise an EditConflict error. I believe it should not (#1 because it is a backward-compatibility issue, and #2 because most of the time users do not care!). Instead, we should modify, one by one, the scripts that could benefit from this detection, if there are any.
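To make the suggestion concrete, here is a minimal sketch of an opt-in variant, assuming interpretation #1 above. The names `put_sketch`, `page_was_deleted` and `fail_if_deleted`, and the stand-in `EditConflict` class, are hypothetical; they are not the framework's API, only an illustration of a documented keyword argument whose default leaves existing callers alone.

```python
class EditConflict(Exception):
    """Stand-in for the framework's EditConflict (illustration only)."""


def put_sketch(newtext, page_was_deleted, fail_if_deleted=False):
    """Pretend to save newtext and return True on success.

    page_was_deleted -- hypothetical flag: the server reported that the
                        page had been deleted since it was loaded.
    fail_if_deleted  -- if True, raise EditConflict instead of silently
                        recreating a deleted page. Defaults to False so
                        that existing callers see no change in behavior.
    """
    if page_was_deleted and fail_if_deleted:
        raise EditConflict(u'Page was deleted since it was loaded.')
    # The real method would send the edit here; the sketch just succeeds.
    return True
```

A script that cares about recreating deleted pages would then pass `fail_if_deleted=True` explicitly, and every other script keeps working unchanged.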
@@ -1297,7 +1310,7 @@
          time.sleep(5)
          continue
      # A second text area means that an edit conflict has occured.
-     if 'id=\'wpTextbox2\' name="wpTextbox2"' in data:
+     if 'id=\'wpTextbox2\' name="wpTextbox2"' in data and deleted == True:
          raise EditConflict(u'An edit conflict has occured.')
      if self.site().has_mediawiki_message("spamprotectiontitle")\
              and self.site().mediawiki_message('spamprotectiontitle') in data:
Strange! :)
- if 'id=\'wpTextbox2\' name="wpTextbox2"' in data and deleted == True:
+ if 'id=\'wpTextbox2\' name="wpTextbox2"' in data and deleted:
better, maybe ??
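For what it's worth, the two spellings only differ when a caller passes something other than a real boolean. A small sketch in plain Python, with no framework code (`old_check` and `new_check` are made-up names):

```python
def old_check(conflict_marker_found, deleted):
    # Mirrors the committed condition: compares against True explicitly.
    return conflict_marker_found and deleted == True


def new_check(conflict_marker_found, deleted):
    # Mirrors the suggested condition: relies on truthiness.
    return bool(conflict_marker_found and deleted)


# Identical results for real booleans...
for flag in (True, False):
    assert old_check(True, flag) == new_check(True, flag)

# ...but different for other truthy values, e.g. a non-empty string.
print(old_check(True, "yes"), new_check(True, "yes"))  # prints: False True
```

So the shorter form is more forgiving if someone passes `1` or a non-empty string for `deleted`, which is usually what a caller expects from a flag.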
Nicolas Dumazet wrote:
Log Message:
Fixing the regex according to the change of HTML
I don't understand the following code either:
- regexp = re.compile('<li[^>]*>(?P<date>.+?)\s+<a href=.*?>(?P<user>.+?)</a>\s+(.+?</a>).*?<a href=".*?"(?P<new> class="new")? title=".*?"\s*>(?P<image>.+?)</a>(?:.*?<span class="comment">(?P<comment>.*?)</span>)?', re.UNICODE)
+ regexp = re.compile(r'(?:<li[^>]*>|<div class="mw-log-entry"[^>]*>)(?P<date>.+?)\s+<a href=.*?>(?P<user>.+?)</a>\s+(.+?</a>).*?<a href=".*?"(?P<new> class="new")? title=".*?"\s*>(?P<image>.+?)</a>(?:.*?<span class="comment">(?P<comment>.*?)</span>)?', re.UNICODE)
because I don't see "mw-log-entry" in the MediaWiki source or online (http://commons.wikimedia.org/w/index.php?title=Special%3ALog&type=upload...). As far as I can see, the previous regexp works on the Italian and English Wikipedias and on Commons. Something escapes me here. Please let me know what I am missing.
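One way to settle this would be to run both patterns against saved copies of the log page. Below is a rough sketch of such a check. The two patterns are copied from the diff (the shared part is factored into `TAIL`), but the sample entry is invented for illustration and is not real MediaWiki output; `describe`, `LI_SAMPLE` and `DIV_SAMPLE` are made-up names.

```python
import re

# Shared tail of both patterns, copied from the diff above.
TAIL = (r'(?P<date>.+?)\s+<a href=.*?>(?P<user>.+?)</a>\s+(.+?</a>).*?'
        r'<a href=".*?"(?P<new> class="new")? title=".*?"\s*>(?P<image>.+?)</a>'
        r'(?:.*?<span class="comment">(?P<comment>.*?)</span>)?')

OLD = re.compile(r'<li[^>]*>' + TAIL, re.UNICODE)
NEW = re.compile(r'(?:<li[^>]*>|<div class="mw-log-entry"[^>]*>)' + TAIL,
                 re.UNICODE)

# Invented log entry; real pages should be substituted here.
ENTRY = ('12:34, 1 January 2009 '
         '<a href="/wiki/User:Example" title="User:Example">Example</a> '
         '(<a href="/wiki/User_talk:Example" title="User talk:Example">Talk</a>) '
         'uploaded "<a href="/wiki/File:Foo.jpg" title="File:Foo.jpg">File:Foo.jpg</a>" '
         '<span class="comment">(test)</span>')
LI_SAMPLE = '<li>' + ENTRY + '</li>'
DIV_SAMPLE = '<div class="mw-log-entry">' + ENTRY + '</div>'


def describe(pattern, html):
    """Return (user, image) for the first match, or None if nothing matches."""
    m = pattern.search(html)
    return None if m is None else (m.group('user'), m.group('image'))


for name, html in (('li', LI_SAMPLE), ('div', DIV_SAMPLE)):
    print(name, 'old:', describe(OLD, html), 'new:', describe(NEW, html))
```

On these invented samples, both patterns match the `<li>` form and only the new one matches the `<div class="mw-log-entry">` form. Replacing the samples with HTML actually saved from Special:Log on each wiki would show which form is really served.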
pywikipedia-l@lists.wikimedia.org