https://bugzilla.wikimedia.org/show_bug.cgi?id=55186
Web browser: ---
Bug ID: 55186
Summary: archivebot.py doesn't support unicode month names
Product: Pywikibot
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General
Assignee: Pywikipedia-bugs@lists.wikimedia.org
Reporter: legoktm.wikipedia@gmail.com
Classification: Unclassified
Mobile Platform: ---
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1482/
Reported by: Anonymous user
Created on: 2012-06-30 17:50:30
Subject: archivebot.py doesn't support unicode month names
Original description:
archivebot.py doesn't work well with languages such as Turkish, which has some month names containing Unicode characters. Namely:
 2 Şubat
 4 Mayıs
 8 Ağustos
 9 Eylül
11 Kasım
12 Aralık
--- Comment #1 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com ---
Pywikipedia [http] trunk/pywikipedia (r10432, 2012/06/30, 15:47:55)
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
--- Comment #2 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com ---
Command line I used was: archivebot.py -l turkish Archive/config
--- Comment #3 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com ---
Could you give us a traceback or further information about this bug? The bot uses the month names coming from MediaWiki messages, and I don't know what the significance of the locale setting is. Could you try to run the bot without the --locale=tr setting?
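Comment #3 says the month names come from MediaWiki messages rather than from the locale. A minimal sketch of that idea follows. It rests on two illustrative assumptions, neither taken from archivebot.py: a hard-coded Turkish month table stands in for the messages, and signatures are assumed to look like "HH:MM, D Month YYYY (UTC)".

```python
import re
from datetime import datetime

# Hypothetical stand-in for the month names the bot would read from
# MediaWiki messages on tr.wikipedia; hard-coded here for illustration.
TR_MONTHS = ["Ocak", "Şubat", "Mart", "Nisan", "Mayıs", "Haziran",
             "Temmuz", "Ağustos", "Eylül", "Ekim", "Kasım", "Aralık"]
MONTH_NUMBER = {name: number for number, name in enumerate(TR_MONTHS, start=1)}

# Assumed signature layout "HH:MM, D Month YYYY (UTC)". In Python 3, \w in a
# str pattern matches Unicode letters, so "Mayıs" and "Şubat" are captured whole.
SIGNATURE_RE = re.compile(r"(\d\d):(\d\d), (\d{1,2}) (\w+) (\d{4}) \(UTC\)")


def parse_signature(text):
    """Return a datetime for the first signature timestamp in text, or None."""
    match = SIGNATURE_RE.search(text)
    if match is None:
        return None
    hour, minute, day, month_name, year = match.groups()
    month = MONTH_NUMBER.get(month_name)  # exact lookup; no locale involved
    if month is None:
        return None
    return datetime(int(year), month, int(day), int(hour), int(minute))


print(parse_signature("Merhaba. Ali 14:05, 5 Mayıs 2012 (UTC)"))
# prints 2012-05-05 14:05:00
```

Because the month is found by an exact dictionary lookup, the result does not depend on the process locale. It does depend on the table matching the wiki's own month names character for character.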
--- Comment #4 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com ---
Sure. There is no traceback for me to provide, though, since the code does work; it just ignores some threads.
Run1: archivebot.py -l turkish Archive/config
    Fetching template transclusions...
    Getting references to [[Sablon:Archive/config]] via API...
    Processing [[tr:Kullanici mesaj:??????]]
    3 Threads found on [[tr:Kullanici mesaj:??????]]
    Looking for: {{Archive/config}} in [[tr:Kullanici mesaj:??????]]
    Processing 3 threads
    There are only 0 Threads. Skipping

Run2: archivebot.py Archive/config
    Fetching template transclusions...
    Getting references to [[Sablon:Archive/config]] via API...
    Processing [[tr:Kullanici mesaj:??????]]
    3 Threads found on [[tr:Kullanici mesaj:??????]]
    Looking for: {{Archive/config}} in [[tr:Kullanici mesaj:??????]]
    Processing 3 threads
    There are only 0 Threads. Skipping
Note that the Turkish character ı is displayed as i in the CMD window (I run the code on Windows). The ?????? relate to my user talk page http://tr.wikipedia.org/wiki/Kullan%C4%B1c%C4%B1%5C_mesaj:%E3%81%A8%E3%81%82..., but CMD cannot display Unicode.
--- Comment #5 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com ---
Oh, when I initially ran the bot without -l turkish, it ignored all threads. Since it has already archived 3 of the 6 original threads, it still reports 0 Threads, as it cannot see the ones with the month name "Mayıs".
--- Comment #6 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com ---
Looked into this a bit.
I've managed to isolate the problem to ~line 237, where all the txt2timestamp calls are. It seems that all of them are raising ValueErrors.
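Comment #6 traces the failure to txt2timestamp raising ValueError. The sketch below is a hypothetical reconstruction, not the actual archivebot.py code. It assumes that txt2timestamp wraps time.strptime and that the caller tries several formats in turn (the FORMATS tuple is invented for illustration). Under Python's default "C" LC_TIME locale, %B only accepts English month names, so a Turkish name makes every attempt fail:

```python
import time

# Hypothetical format list; the real archivebot.py tries its own set of
# formats around line 237, which are not reproduced here.
FORMATS = ("%H:%M, %d %B %Y", "%d %B %Y %H:%M")


def txt2timestamp(txt, fmt):
    """Parse txt with fmt; return a time.struct_time, or None on ValueError."""
    try:
        return time.strptime(txt, fmt)
    except ValueError:
        return None


def parse_any(txt):
    """Try each format in turn; None means every attempt raised ValueError."""
    for fmt in FORMATS:
        result = txt2timestamp(txt, fmt)
        if result is not None:
            return result
    return None


# With the default "C" LC_TIME locale, %B matches English month names only.
english = parse_any("14:05, 5 May 2012")
turkish = parse_any("14:05, 5 Mayıs 2012")
print(english.tm_mon)  # 5
print(turkish)         # None
```

A caller that skips threads whose timestamp cannot be parsed would then find no datable threads, which is consistent with the "There are only 0 Threads" output above. Switching LC_TIME to a Turkish locale could change what %B accepts, but locale names and their availability are platform-dependent.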
--- Comment #7 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com ---
Tried this:

    import unicodedata

    # At ~line 237: decompose the timestamp text (NFD), then drop the
    # combining marks (Unicode category 'Mn') such as cedilla or diaeresis.
    _TM = ''.join((c for c in unicodedata.normalize('NFD', TM.group(0)) if unicodedata.category(c) != 'Mn'))

and then call txt2timestamp with _TM instead of TM.group(0).
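The workaround in comment #7 can be tried in isolation. This standalone sketch wraps the same NFD-and-drop-'Mn' expression in a helper and applies it to the month names from the original report. The helper name strip_marks is introduced here for illustration only.

```python
import unicodedata


def strip_marks(text):
    """Decompose text (NFD) and drop combining marks (category 'Mn')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


for name in ["Şubat", "Mayıs", "Ağustos", "Eylül", "Kasım", "Aralık"]:
    print(name, "->", strip_marks(name))
# Şubat -> Subat
# Mayıs -> Mayıs
# Ağustos -> Agustos
# Eylül -> Eylul
# Kasım -> Kasım
# Aralık -> Aralık
```

Ş, ğ and ü decompose into a base letter plus a combining mark, so Şubat, Ağustos and Eylül become plain ASCII. The dotless ı (U+0131) has no canonical decomposition, so Mayıs, Kasım and Aralık pass through unchanged. Stripping marks alone therefore does not turn those three names into ASCII.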
--- Comment #8 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com ---
https://gerrit.wikimedia.org/r/#/c/84204/
Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com changed:
           What    |Removed |Added
----------------------------------------------------------------------------
           See Also|        |https://sourceforge.net/p/pywikipediabot/bugs/1482
Mpaa mpaa.wiki@gmail.com changed:
           What    |Removed |Added
----------------------------------------------------------------------------
             Status|NEW     |RESOLVED
                 CC|        |mpaa.wiki@gmail.com
         Resolution|---     |FIXED
--- Comment #9 from Mpaa mpaa.wiki@gmail.com ---
Fixed by the above patch.