Bugs item #1894621, was opened at 2008-02-15 21:09
Message generated for change (Comment added) made by xqt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1894621...

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: NicDumZ — Nicolas Dumazet (nicdumz)
Assigned to: Nobody/Anonymous (nobody)
Summary: interwiki.py wp.Error "Invalid title '' "
Initial Comment: r5030 :
interwiki.py -autonomous -start:"Parti whig"
Stack :
Sleeping for 5.0 seconds, 2008-02-15 20:32:59
NOTE: [[Particule (grammaire)]]: [[fr:Mot-outil]] gives duplicate interwiki on same site [[de:Synsemantikum]]
NOTE: [[Particule (grammaire)]]: [[fr:Mot-outil]] gives duplicate interwiki on same site [[br:Ger goullo]]
Getting 60 pages from wikipedia:en...
[[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives new interwiki [[ro:Restauraţia franceză]]
[[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives new interwiki [[fa:بازگشت بوربونها به سلطنت فرانسه]]
[[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives new interwiki [[he:הרסטורציה]]
[[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives new interwiki [[no:Restaurasjonen i Frankrike]]
[[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives new interwiki [[es:Restauración Francesa]]
[[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives new interwiki [[ja:フランス復古王政]]
[[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives new interwiki [[nl:Restauratie (Frankrijk)]]
[[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives new interwiki [[sv:Bourbonska restaurationen]]
NOTE: [[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives duplicate interwiki on same site [[fr:Restauration française]]
[[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives new interwiki [[de:Restauration (Frankreich)]]
[[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives new interwiki [[it:Restaurazione]]
[[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives new interwiki [[sr:Бурбонска рестаурација]]
[[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives new interwiki [[ru:Реставрация Бурбонов]]
[[Partis politiques sous la Restauration]]: [[en:Bourbon Restoration]] gives new interwiki [[tr:Restorasyon (Fransa)]]
[[Partition d'un entier]]: [[en:Partition (number theory)]] gives new interwiki [[zh:整數分拆]]
[[Partition d'un entier]]: [[en:Partition (number theory)]] gives new interwiki [[he:פונקציית החלוקה (תורת המספרים)]]
[[Partition d'un entier]]: [[en:Partition (number theory)]] gives new interwiki [[ja:整数分割]]
[[Partition d'un entier]]: [[en:Partition (number theory)]] gives new interwiki [[sv:Partitionsfunktionen]]
NOTE: [[Partition d'un entier]]: [[en:Partition (number theory)]] gives duplicate interwiki on same site [[fr:Partage d'un entier]]
[[Partition d'un entier]]: [[en:Partition (number theory)]] gives new interwiki [[de:Partitionsfunktion]]
[[Partition d'un entier]]: [[en:Partition (number theory)]] gives new interwiki [[it:Partizione di un intero]]
[[Partition d'un entier]]: [[en:Partition (number theory)]] gives new interwiki [[ru:Разбиение числа]]
Dump fr (wikipedia) saved
Traceback (most recent call last):
  File "interwiki.py", line 1644, in <module>
    bot.run()
  File "interwiki.py", line 1408, in run
    self.queryStep()
  File "interwiki.py", line 1382, in queryStep
    self.oneQuery()
  File "interwiki.py", line 1378, in oneQuery
    subject.workDone(self)
  File "interwiki.py", line 679, in workDone
    redirectTargetPage = wikipedia.Page(page.site(), arg.args[0])
  File "/home/nico/projets/pywikipedia/wikipedia.py", line 346, in __init__
    raise Error(u"Invalid title '%s'" % title )
wikipedia.Error: Invalid title ''
Cheers !
----------------------------------------------------------------------
Comment By: xqt (xqt)
Date: 2009-11-15 21:28
Message: Might be fixed in r6801
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2008-02-18 13:45
Message: Logged In: YES user_id=1963242 Originator: YES
My patch apparently solves the first issue, but I just hit the same error again while working on a dump:
python redirect.py double -xml:/media/hda5/frwiki-20080216-pages-articles.xml
Checked for running processes. 2 processes currently running, including the current process.
Reading XML dump...
10000 pages read...
20000 pages read...
30000 pages read...
40000 pages read...
50000 pages read...
60000 pages read...
70000 pages read...
80000 pages read...
90000 pages read...
100000 pages read...
110000 pages read...
120000 pages read...
130000 pages read...
Traceback (most recent call last):
  File "redirect.py", line 398, in <module>
    main()
  File "redirect.py", line 394, in main
    bot.run()
  File "redirect.py", line 349, in run
    self.fix_double_redirects()
  File "redirect.py", line 260, in fix_double_redirects
    for redir_name in self.generator.retrieve_double_redirects():
  File "redirect.py", line 204, in retrieve_double_redirects
    dict = self.get_redirects_from_dump()
  File "redirect.py", line 128, in get_redirects_from_dump
    if wikipedia.Page(site, entry.title).namespace() not in self.namespaces:
  File "/home/nico/projets/pywikipedia/wikipedia.py", line 346, in __init__
    raise Error(u"Invalid title '%s'" % title )
This came from a very particular page, entitled " " (a non-breaking space) : http://fr.wikipedia.org/w/index.php?title=%C2%A0&redirect=no
I'm thinking of using strip(" ") instead of strip(). I tried, and it works for me now.
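The difference between the two calls can be checked directly. Below is a minimal sketch in present-day Python 3 (the bot itself was Python 2 with unicode strings); `normalize_title` is a hypothetical helper used only for illustration, not a pywikipedia function.

```python
NBSP = "\u00a0"  # non-breaking space, the title of the page in question


def normalize_title(title, ascii_only=True):
    """Hypothetical helper: trim a title with strip(" ") or plain strip()."""
    # strip(" ") removes only U+0020; strip() removes every whitespace
    # character, including the non-breaking space.
    return title.strip(" ") if ascii_only else title.strip()


print(repr(normalize_title(NBSP, ascii_only=False)))  # plain strip() empties the title
print(repr(normalize_title(NBSP)))                    # strip(" ") keeps the NBSP
```

With plain strip() the title collapses to an empty string, which is what ends up triggering "Invalid title ''".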
Index: wikipedia.py
===================================================================
--- wikipedia.py (révision 5044)
+++ wikipedia.py (copie de travail)
@@ -332,7 +332,9 @@
         while u"  " in t:
             t = t.replace(u"  ", u" ")
         # Strip spaces at both ends
-        t = t.strip()
+        # strip(" ") *is* different of strip() because strip()
+        # also removes non breaking spaces
+        t = t.strip(" ")
         # Remove left-to-right and right-to-left markers.
         t = t.replace(u'\u200e', '').replace(u'\u200f', '')
         # leading colon implies main namespace instead of the default
@@ -627,6 +629,9 @@
                 self._getexception = NoPage
                 raise
             except IsRedirectPage, arg:
+                if not arg[0]:
+                    output(u"WARNING: %s contains an empty redirect tag, ignoring it" % self.aslink())
+                    pass
                 self._getexception = IsRedirectPage
                 self._redirarg = arg
                 if not get_redirect and not nofollow_redirects:
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2008-02-16 12:06
Message: Logged In: YES user_id=1963242 Originator: YES
This simple patch will certainly solve the issue :
Index: wikipedia.py
===================================================================
--- wikipedia.py (révision 5036)
+++ wikipedia.py (copie de travail)
@@ -627,8 +627,11 @@
                 self._getexception = NoPage
                 raise
             except IsRedirectPage, arg:
+                if not arg[0]:
+                    output(u"WARNING: %s contains an empty redirect tag, ignoring it" % self.aslink())
+                    pass
                 self._getexception = IsRedirectPage
                 self._redirarg = arg
                 if not get_redirect and not nofollow_redirects:
                     raise
             except SectionError:
(I don't think that modifying the redirectRegex would be a good idea, since it would not allow us to remove an empty redirect using that Regex)
Also, per http://fr.wikipedia.org/wiki/Utilisateur:DumZiBoT/Temp, pages such as:
#REDIRECT [[]]
#REDIRECT [[Page]]
are not considered by MediaWiki to be redirect pages, so it's OK to ignore the first redirect :)
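For illustration, here is a rough sketch of how a caller might treat an empty redirect tag as "no redirect" instead of building a page with an empty title. The pattern `REDIRECT_RE` and the function `redirect_target` are assumptions made up for this example; pywikipedia's actual redirect regex is different and is not reproduced here.

```python
import re

# Assumed pattern for illustration only; not pywikipedia's redirect regex.
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[(.*?)\]\]", re.IGNORECASE)


def redirect_target(text):
    """Return the redirect target, or None if the tag is missing or empty."""
    match = REDIRECT_RE.match(text)
    if match is None:
        return None
    target = match.group(1).strip()
    # An empty target such as "#REDIRECT [[]]" is treated as "not a redirect".
    return target or None


print(redirect_target("#REDIRECT [[Page]]"))
print(redirect_target("#REDIRECT [[]]\n#REDIRECT [[Page]]"))
```

Only the first tag on the page is looked at here, so the two-line page above is reported as having no redirect target, in line with what MediaWiki does for it.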
Cheers !
----------------------------------------------------------------------
Comment By: NicDumZ — Nicolas Dumazet (nicdumz)
Date: 2008-02-15 21:51
Message: Logged In: YES user_id=1963242 Originator: YES
Actually, this was caused by an empty redirect tag (#REDIRECT [[]]) inserted in that diff : http://en.wikipedia.org/w/index.php?title=Louisiana_Waterthrush&diff=190...
----------------------------------------------------------------------