Seeing as 1.5 is UTF-8 for all languages (yay), can we convert -- and ---
to en and em dashes? I know there were some problems with wiki table
syntax last time this was attempted, but that's easy to eliminate if you
require spaces around the hyphen-sequence. In that case, as a rule, the
resulting display would have spaces as well (except numbers... more
later). The spaces around dashes are a mini-dispute of their own over at
[[Wikipedia talk:Manual of Style (dashes)]], but I think that people
would be happy just to have an easy, standard way to enter dashes.

Last time this came up, someone mentioned other languages that might
forbid a space around dashes. I know that in French it's common to use
dashes at the beginning of lines in dialogue. This problem has an easy
solution: French keyboards also have a dash key, unless my memory is
failing me completely. American Mac keyboards have option-hyphen. On
Windows, Word inserts dashes all over the place. People who aren't happy
with the " --- " and " -- " solution would be free to enter Unicode
dashes however they wish, as they can today on UTF-8 wikipedias.
(Stubborn space-conscious anglophones might also resort to this
method... so be it.)

I've written a patch that I think is fairly well placed, since it's
adjacent to the existing code that inserts non-breaking spaces next to
guillemets. This method would make a lot of people happy, and it
promotes compliance with the Manual of Style as far as possible.
Here's how it works:
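1. Replace any ' -- ' with the UTF-8 bytes for a non-breaking space, an
en dash (U+2013) and a normal space.
2. Replace any '--' between two digits with an en dash alone, no spaces.
3. Replace any ' --- ' with the UTF-8 bytes for a non-breaking space, an
em dash (U+2014) and a normal space.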
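
To try the three rules outside the parser, here is a minimal standalone
sketch. The function name convertDashes is made up for illustration and is
not part of the patch; the patterns and byte sequences are the ones the
patch uses, minus the /i flag, which has no effect on patterns that contain
no letters.

```php
<?php
// Standalone sketch of the three dash rules; this is not the MediaWiki parser.
// convertDashes() is a hypothetical helper name used only for illustration.
function convertDashes(string $text): string
{
    $rules = array(
        // ' -- ' becomes: non-breaking space, en dash (U+2013), normal space
        '/ -- /' => "\xC2\xA0\xE2\x80\x93 ",
        // digit--digit becomes: en dash with no surrounding spaces
        '/([0-9])--([0-9])/' => "\\1\xE2\x80\x93\\2",
        // ' --- ' becomes: non-breaking space, em dash (U+2014), normal space
        '/ --- /' => "\xC2\xA0\xE2\x80\x94 ",
    );
    // With array arguments, preg_replace applies the patterns in order.
    return preg_replace(array_keys($rules), array_values($rules), $text);
}

// Spaced hyphens and digit ranges are converted.
echo convertDashes("pages 10--20 -- see the notes"), "\n";
// Table-row markup has no space before the hyphens, so it is left alone.
echo convertDashes("|---"), "\n";
```

Because every prose pattern requires a space on both sides of the hyphens,
a table row marker such as "|---" never matches, which is the point of
requiring the spaces in the first place.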

The patch against Parser.php follows.

Nathan

Index: Parser.php
===================================================================
RCS file: /cvsroot/wikipedia/phase3/includes/Parser.php,v
retrieving revision 1.383
diff --unified=3 -w -B -r1.383 Parser.php
--- Parser.php 6 Feb 2005 16:13:05 -0000 1.383
+++ Parser.php 9 Feb 2005 21:15:50 -0000
@@ -185,6 +185,9 @@
'/<br *>/i' => '<br />',
'/<center *>/i' => '<div class="center">',
'/<\\/center *>/i' => '</div>',
+ '/ -- /i' => "\xC2\xA0\xE2\x80\x93 ", # nbsp, en dash, normal space
+ '/([0-9])--([0-9])/i' => "\\1\xE2\x80\x93\\2", # en dash only
+ '/ --- /i' => "\xC2\xA0\xE2\x80\x94 " # nbsp, em dash, normal space
);
$text = preg_replace( array_keys($fixtags), array_values($fixtags),
$text );
$text = Sanitizer::normalizeCharReferences( $text );
@@ -195,7 +198,10 @@
# french spaces, Guillemet-right
'/(\\302\\253) /i' => '\\1 ',
'/<center *>/i' => '<div class="center">',
- '/<\\/center *>/i' => '</div>'
+ '/<\\/center *>/i' => '</div>',
+ '/ -- /i' => "\xC2\xA0\xE2\x80\x93 ", # nbsp, en dash, normal space
+ '/([0-9])--([0-9])/i' => "\\1\xE2\x80\x93\\2", # en dash only
+ '/ --- /i' => "\xC2\xA0\xE2\x80\x94 " # nbsp, em dash, normal space
);
$text = preg_replace( array_keys($fixtags), array_values($fixtags),
$text );
}