[Pywikibot-commits] [Gerrit] Change title whitelist to title blacklist - change (pywikibot/core)

21 Aug 2013

Xqt has submitted this change and it was merged.
Change subject: Change title whitelist to title blacklist
......................................................................
Change title whitelist to title blacklist

Titles with characters outside the BMP [1] (>\uFFFF) are no longer
detected as illegal. See this thread: [2]

[1] https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane
[2] http://thread.gmane.org/gmane.comp.python.pywikipediabot.general/13197/

This list of characters was generated by using the old re and by
enumerating characters:
import re
m = re.compile(u'''[^ %!"$&'()*,\-.\/0-9:;=?@A-Z\\^_`a-z~\u0080-\uFFFF+]''')
for x in range(0,0x80):
    if m.match(unichr(x)):
        print "%x" % x,
0 1 2 3 4 5 6 7 8 9 a b c d e f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 23 3c 3e 5b 5d 7b 7c 7d 7f
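
The enumeration above can be repeated under Python 3 as a rough sanity check. This is only a sketch, not part of the change: it assumes the old pattern is read as a raw regular expression (so that `\\` inside the class denotes a literal backslash), and the names OLD_WHITELIST, NEW_BLACKLIST and illegal_code_points are made up for illustration.

```python
import re

# Old whitelist: any character NOT in the class counts as illegal.
# Assumption: the pattern is taken as a raw regex, so "\\" is a literal
# backslash and "\u0080-\uFFFF" is resolved by the re module itself.
OLD_WHITELIST = re.compile(
    r'''[^ %!"$&'()*,\-.\/0-9:;=?@A-Z\\^_`a-z~\u0080-\uFFFF+]''')

# New blacklist, copied from the diff: only the listed characters are illegal.
NEW_BLACKLIST = re.compile(
    r'''[\x00-\x1f\x23\x3c\x3e\x5b\x5d\x7b\x7c\x7d\x7f]''')


def illegal_code_points(pattern, start=0, stop=0x80):
    """Return the code points in [start, stop) that the pattern flags."""
    return [x for x in range(start, stop) if pattern.match(chr(x))]


old_ascii = illegal_code_points(OLD_WHITELIST)
new_ascii = illegal_code_points(NEW_BLACKLIST)

# Same hex listing as in the commit message.
print(" ".join("%x" % x for x in old_ascii))

# Neither pattern flags anything between U+0080 and U+FFFF.
old_bmp = illegal_code_points(OLD_WHITELIST, 0x80, 0x10000)
new_bmp = illegal_code_points(NEW_BLACKLIST, 0x80, 0x10000)

# A character outside the BMP (U+1F600) is rejected by the old whitelist
# but accepted by the new blacklist.
astral = "\U0001F600"
old_flags_astral = OLD_WHITELIST.search(astral) is not None
new_flags_astral = NEW_BLACKLIST.search(astral) is not None
```

Under that reading, both patterns flag the same code points below 0x80 and none in the rest of the BMP, so the only behavioural difference is for characters above \uFFFF.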
Change-Id: I02c26be9ad814ce11d9adf2f997d3d1e05764fd1
---
M pywikibot/page.py
1 file changed, 2 insertions(+), 2 deletions(-)
Approvals:
  Xqt: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/pywikibot/page.py b/pywikibot/page.py
index e51977c..58debb7 100644
--- a/pywikibot/page.py
+++ b/pywikibot/page.py
@@ -2853,8 +2853,8 @@
"""
     illegal_titles_pattern = re.compile(
-        # Matching titles will be held as illegal.
-            u'''[^ %!"$&'()*,\-.\/0-9:;=?@A-Z\\^_`a-z~\u0080-\uFFFF+]'''
+            # Matching titles will be held as illegal.
+            ur'''[\x00-\x1f\x23\x3c\x3e\x5b\x5d\x7b\x7c\x7d\x7f]'''
             # URL percent encoding sequences interfere with the ability
             # to round-trip titles -- you can't link to them consistently.
             u'|%[0-9A-Fa-f]{2}'
-- 
To view, visit https://gerrit.wikimedia.org/r/78525
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I02c26be9ad814ce11d9adf2f997d3d1e05764fd1
Gerrit-PatchSet: 2
Gerrit-Project: pywikibot/core
Gerrit-Branch: master
Gerrit-Owner: Merlijn van Deen <valhallasw@arctus.nl>
Gerrit-Reviewer: Ladsgroup <ladsgroup@gmail.com>
Gerrit-Reviewer: Legoktm <legoktm.wikipedia@gmail.com>
Gerrit-Reviewer: Merlijn van Deen <valhallasw@arctus.nl>
Gerrit-Reviewer: Xqt <info@gno.de>
Gerrit-Reviewer: jenkins-bot


    
