Possibly off-topic.
Here is a script that replaces normal whitespace with one of the whitespace characters supported by UTF-8 (others are           ​  ).
I have made a few vandalism tests here: http://en.wikipedia.org/wiki/User:Tei/lalaland
What do you guys think? Could this be a problem? You can break links like [[Mr Thonson]] by replacing them with [[Mr Thonson]], where the space in the second link is one of these Unicode spaces.
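# Replace each ASCII space with U+2000 (EN QUAD), written as raw UTF-8 bytes.
while (my $line = <DATA>) {
    foreach my $ch (split //, $line) {
        if ($ch eq " ") {
            print pack("C3", 0xE2, 0x80, 0x80);  # "C" is unsigned; "c" is signed and warns above 127
        } else {
            print $ch;
        }
    }
}
__DATA__
Text to be vandalized goes here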
On Thu, Oct 2, 2008 at 6:33 PM, Tei oscar.vives@gmail.com wrote:
> What do you guys think? Could this be a problem? You can break links like [[Mr Thonson]] by replacing the space with another Unicode space.
We don't want to ban all Unicode whitespace. Some of it is useful, which is why it's in Unicode. :) For the specific case of titles, see bug 1414:
Comment #3 from Brion Vibber, 2005-04-25 06:10:15 UTC:
It might make sense to explicitly disallow the Zl and Zp chars (line separator and paragraph separator), and normalize all the Zs chars to spaces (well, underscores) in title processing.
At first glance this seems like a trivial and uncontroversial change, so I'm curious why it wasn't done 3 1/2 years ago.
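The normalization proposed in that bug comment could look roughly like the sketch below. It is only an illustration in Python, not MediaWiki's actual title code: the function name `normalize_title` is hypothetical, and the categories come from Python's `unicodedata` module. Note that U+200B (zero width space) is category Cf rather than Zs, so it would need separate handling.

```python
import unicodedata

def normalize_title(title: str) -> str:
    """Hypothetical helper: reject Zl/Zp characters, fold every Zs character to "_"."""
    out = []
    for ch in title:
        cat = unicodedata.category(ch)
        if cat in ("Zl", "Zp"):
            # Line and paragraph separators are refused outright.
            raise ValueError(f"disallowed separator U+{ord(ch):04X} in title")
        # Any space separator (ASCII space, NBSP, EN QUAD, ...) becomes an underscore.
        out.append("_" if cat == "Zs" else ch)
    return "".join(out)

print(normalize_title("Mr\u2000Thonson"))  # EN QUAD between the words
print(normalize_title("Mr Thonson"))       # plain ASCII space
```

With this kind of folding both spellings of the link end up as the same title, so the vandalized link would no longer break.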
On the other hand, some browsers apparently convert esoteric whitespace literals back to \u0020 in the <textarea> anyway, whether the original change was malicious or not.
http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_42#P...
-_-
—C.W.
On Friday 03 October 2008 02:23:37 Aryeh Gregor wrote:
> We don't want to ban all Unicode whitespace. Some of it is useful, which is why it's in Unicode. :)
Thinking a bit about it, why not? Upon saving, convert all spaces to the ASCII space. If someone legitimately needs another space, he can and should use an HTML entity. Someone who uses another space simply creates confusion for other editors, who have no way to distinguish it from the ordinary space.
On 04.10.2008, 23:27 Nikola wrote:
> Upon saving, convert all spaces to the ASCII space. If someone legitimately needs another space, he can and should use an HTML entity.
Just not nbsp - it's widely used on Russian Wikipedia, and no-one wants to replace it with an entity.
On Sat, Oct 4, 2008 at 3:27 PM, Nikola Smolenski smolensk@eunet.yu wrote:
> Thinking a bit about it, why not? Upon saving, convert all spaces to the ASCII space. If someone legitimately needs another space, he can and should use an HTML entity. Someone who uses another space simply creates confusion for other editors.
You could say that about a lot of Unicode characters: "it simply creates confusion", "should use the HTML entity".
My keyboard mapping types the non-breaking space just fine (I press greek-space) and I find it pretty useful.
If you were going to do any conversion, I'd suggest it be TO the correct HTML entity. But I think it would be far better to not convert at all and instead give the editing and diff views some kind of ability to colorize interesting characters.
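Both options from this message can be sketched in a few lines of Python. This is an illustration only: the helper names `spaces_to_entities` and `mark_interesting` are hypothetical, and the choice of categories (Zs, Zl, Zp, Cf) as "interesting" is an assumption rather than anything agreed in the thread.

```python
import unicodedata

def spaces_to_entities(text: str) -> str:
    """Rewrite every non-ASCII space separator (Zs) as a numeric HTML entity."""
    return "".join(
        f"&#x{ord(ch):X};" if ch != " " and unicodedata.category(ch) == "Zs" else ch
        for ch in text
    )

def mark_interesting(text: str) -> str:
    """Make invisible characters visible, e.g. for an edit or diff view."""
    def visible(ch: str) -> str:
        # Assumed set of "interesting" categories: separators and format controls.
        if ch != " " and unicodedata.category(ch) in ("Zs", "Zl", "Zp", "Cf"):
            return f"[U+{ord(ch):04X}]"
        return ch
    return "".join(visible(ch) for ch in text)

print(spaces_to_entities("10\u00a0km"))     # non-breaking space becomes an entity
print(mark_interesting("Mr\u2000Thonson"))  # EN QUAD is shown as a marker
```

The second function leaves the stored text untouched and only changes how it is displayed, which matches the "do not convert, just highlight" suggestion.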
Aryeh Gregor wrote:
> We don't want to ban all Unicode whitespace. Some of it is useful, which is why it's in Unicode. :) For the specific case of titles, see bug 1414:
On the English Wikipedia we've (actually, I did) set the TitleBlacklist extension to block those. We also block the bidirectional override characters which can be even more problematic. (Nothing as fun as an invisible character that makes all following text render right to left.)
http://en.wikipedia.org/wiki/MediaWiki:Titleblacklist
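As a rough illustration of the kind of check described here, the Python sketch below flags titles containing the bidirectional embedding and override controls U+202A through U+202E. It is not the actual Titleblacklist entry; the name `has_bidi_controls` and the exact character range are assumptions made for the example.

```python
import re

# U+202A..U+202E: LRE, RLE, PDF, LRO, RLO (embedding and override controls).
BIDI_CONTROLS = re.compile("[\u202a-\u202e]")

def has_bidi_controls(title: str) -> bool:
    """Return True if the title contains any of the controls listed above."""
    return BIDI_CONTROLS.search(title) is not None

print(has_bidi_controls("Mr Thonson"))         # ordinary title
print(has_bidi_controls("Mr \u202eThonson"))   # contains RIGHT-TO-LEFT OVERRIDE
```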
What about the direction-reversing characters in the text? Okay, these are probably needed on the Arabic/Hebrew wikis, but can't they be a bit confusing in the article source?
Marco
2008/10/5 Ilmari Karonen nospam@vyznev.net
> On the English Wikipedia we've (actually, I did) set the TitleBlacklist extension to block those. We also block the bidirectional override characters which can be even more problematic. (Nothing as fun as an invisible character that makes all following text render right to left.)
> http://en.wikipedia.org/wiki/MediaWiki:Titleblacklist
> -- Ilmari Karonen
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Sun, Oct 5, 2008 at 1:54 AM, Marco Schuster marco@harddisk.is-a-geek.org wrote:
> What about the direction-reverse stuff in the texts? Okay, these are probably needed in Arab/Hebrew wikis, but can't they be a bit confusing in the article sources?
Not as confusing, in my experience, as having punctuation marks and things in totally the wrong places when you're editing in the text box. (But RTL editors who are less tech-savvy would quite likely disagree with that, I'm guessing: I wouldn't be at all surprised if they were specifically banned on the RTL wikis.)
Aryeh Gregor wrote:
> Not as confusing, in my experience, as having punctuation marks and things in totally the wrong places when you're editing in the text box.
Yes, and page text doesn't (usually) end up in places like logs and recent changes. See for example (warning, ugly URL follows):
http://en.wikipedia.org/w/index.php?title=%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%...
Ilmari Karonen wrote:
> Yes, and page text doesn't (usually) end up in places like logs and recent changes. See for example (warning, ugly URL follows):
Actually, make that:
http://en.wikipedia.org/wiki/%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%E...
so it works right for non-admins too. Sorry.
On Mon, Oct 6, 2008 at 10:00 PM, Ilmari Karonen nospam@vyznev.net wrote:
> Actually, make that:
> http://en.wikipedia.org/wiki/%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%E...
> so it works right for non-admins too. Sorry.
I have created an online hex viewer that may prove handy for looking into byte-level problems in our texts.
http://zerror.com/bin/wex/?url=http://en.wikipedia.org/wiki/MediaWiki:Titleb...
-- ℱin del ℳensaje.
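For readers without access to that tool, the idea can be approximated locally. The Python sketch below is not the viewer linked above; `hex_dump` is a hypothetical helper that prints the UTF-8 bytes of a string so that unusual whitespace stands out.

```python
def hex_dump(text: str, width: int = 16) -> str:
    """Return a classic offset / hex / ASCII dump of the UTF-8 encoding of text."""
    data = text.encode("utf-8")
    pad = width * 3 - 1  # width of a full row of "xx " groups, minus the last space
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable and non-ASCII bytes are shown as dots.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {ascii_part}")
    return "\n".join(lines)

print(hex_dump("Mr\u2000Thonson"))
```

In the dump of this sample string the bytes `e2 80 80` appear where an ordinary space would show `20`, which is the same three-byte sequence the Perl script at the top of the thread emits.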
On 10/7/08, Tei oscar.vives@gmail.com wrote:
> I have created an online hex viewer that may prove handy for looking into byte-level problems in our texts.
> http://zerror.com/bin/wex/?url=http://en.wikipedia.org/wiki/MediaWiki:Titleb...
Just curious, can you limit this to [[homoglyph]]ic characters and make a JavaScript gadget for it?
—C.W.
Marco Schuster wrote:
> What about the direction-reverse stuff in the texts? Okay, these are probably needed in Arab/Hebrew wikis, but can't they be a bit confusing in the article sources?
Probably, but on Latin-script Wikipedias they're sometimes necessary to keep the direction auto-detection from screwing up. For example, I tried to do this once:
'''Person''' (Arabic: [arabic script here]; 1935-1970)
Since the only thing immediately after the Arabic script is a semicolon and digits, they're interpreted as part of the r-to-l text block, so it's rendered something like:
Person (Arabic: 1935-1970 ;[arabic script here)
which is clearly not what was intended. =] The other workaround is to gratuitously add in some Latin script characters we wouldn't usually use, like "b. 1935; d. 1970".
-Mark
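The message does not name the exact character used, so the following Python sketch assumes U+200E (LEFT-TO-RIGHT MARK) as one possible directional mark. It places the mark directly after the Arabic text so that the semicolon and digits sit next to a strong left-to-right character instead. The helper `wrap_rtl` and the placeholder Arabic word are hypothetical.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK (assumed choice of directional mark)

def wrap_rtl(rtl_text: str) -> str:
    """Append an LRM so the neutral characters that follow are not pulled into the RTL run."""
    return rtl_text + LRM

arabic = "\u0645\u062b\u0627\u0644"  # placeholder Arabic word, stands in for the real name
line = f"'''Person''' (Arabic: {wrap_rtl(arabic)}; 1935-1970)"

# Bidi classes of the characters involved, as reported by unicodedata:
for ch in (arabic[0], LRM, ";", "1"):
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))
```

The printed classes show why the original line misbehaves: the Arabic letter is strongly right-to-left, the semicolon is neutral, and the digits are weak, so without a strong left-to-right character in between they attach to the Arabic run.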
wikitech-l@lists.wikimedia.org