[Pywikipedia-bugs] [Maniphest] [Commented On] T94688: reflinks.py doesn't save changes to articles

19 Apr 2015


      XZise added a comment.
Ah, I don't think that's needed, because I think I know what is happening:
>>> from __future__ import unicode_literals
>>> import re
>>> re.sub('(?is)A', '', 'Ö'.encode('latin1'))
'\xd6'
>>> re.sub('(?is)A', '', 'ÖA'.encode('latin1'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xzise/.pyenv/versions/2.7.8/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd6 in position 0: ordinal not in range(128)
The error only appears when the call actually replaces something. In my previous examples nothing was replaced, so it worked. But when there is a match, it tries to combine the unicode replacement string with the bytes input, which triggers an implicit ASCII decode that fails on non-ASCII bytes. You can verify this by editing the line where the error happens (according to your previous errors, that is line 647 of "core/scripts/reflinks.py"). Currently it looks like this:
linkedpagetext = self.NON_HTML.sub('', linkedpagetext)
But it should work when it looks like this:
linkedpagetext = self.NON_HTML.sub(str(''), linkedpagetext)
I need to figure out if `linkedpagetext` is also `bytes` in Python 3 but that fix will work at least in Python 2.
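As a rough sketch of a variant that does not depend on the Python version, the replacement could be written as a bytes literal, assuming `linkedpagetext` really is `bytes` at that point. The pattern below is a hypothetical stand-in (the real `NON_HTML` pattern is not shown in this thread), and `strip_non_html` is an illustrative helper name, not something that exists in reflinks.py:

```python
import re

# Hypothetical stand-in for reflinks.py's NON_HTML; the real pattern is
# not quoted in this thread. Note that it is a *bytes* pattern.
NON_HTML = re.compile(br'(?is)<script[^>]*>.*?</script>|<!--.*?-->')


def strip_non_html(raw):
    """Remove NON_HTML matches from a byte string."""
    # b'' is a byte string on Python 2 (even with unicode_literals) and on
    # Python 3, so pattern, replacement and input all have the same type and
    # no implicit ASCII decode of `raw` is attempted.
    return NON_HTML.sub(b'', raw)


# 0xd6 is 'Ö' in latin1, the same byte that triggered the traceback above.
page = b'\xd6<script>x</script>A'
cleaned = strip_non_html(page)
```

Whether this is preferable to `str('')` depends on what type `linkedpagetext` has under Python 3; if it turns out to be text there, the pattern and replacement would both need to be text instead.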
TASK DETAIL
  https://phabricator.wikimedia.org/T94688
To: XZise
Cc: Ricordisamoa, jayvdb, XZise, Aklapper, Rubin16, pywikipedia-bugs