Apologies if this has been brought up before, but I am curious about the status of automatic replacement of dashes in the wikitext (in particular -- and ---) with HTML-entity versions that many users seem to like so much (ndash and mdash). I noticed that a while back (February 2004) such a conversion was implemented, but apparently reverted due to undesirable breakage. Any way it could be reinstated? I may look into it once I get familiar with the code.
Also, there's some ongoing disagreement about "correct" quote usage: several users like to replace standard ASCII apostrophes with their lsquo/rsquo HTML counterparts, which seriously uglifies the wikitext. The same is true of normal double-quotes (ldquo/rdquo). I realize it may be quite difficult to implement some form of quote-matching that could automatically convert these from ASCII, especially given our other uses for single-quotes; would it be feasible to use multiple ASCII double-quotes to clue in the parser about the desired behavior? E.g., "foo" for normal ASCII double-quotes, ""foo"" for HTML lsquo/rsquo, and """foo""" for ldquo/rdquo. I can see where this might run into problems in situations involving nested quotation, but it seems adequate for the majority of quoting.
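A rough sketch of how the proposed ""foo"" / """foo""" markup could be mapped to entities. This is Python for illustration only, not MediaWiki's PHP, and the function name convert_quotes is hypothetical:

```python
import re

def convert_quotes(text: str) -> str:
    """Map the proposed doubled/tripled ASCII quotes onto HTML entities.

    Hypothetical helper for illustration; not part of MediaWiki.
    """
    # Handle the triple form first so the double rule does not consume it.
    text = re.sub(r'"""(.+?)"""', r'&ldquo;\1&rdquo;', text)
    text = re.sub(r'""(.+?)""', r'&lsquo;\1&rsquo;', text)
    return text

print(convert_quotes('"foo" ""bar"" """baz"""'))
```

Single ASCII double-quotes pass through untouched. Nested quotation (a ""...."" pair inside a """...""" pair) is ambiguous under this scheme, which is the limitation already noted above.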
Given the regularity of conflict over this issue (not to mention some users, who shall remain nameless, on a mission to replace all occurrences of what they perceive as bad quoting, wikitext readability be damned), it seems like something worth implementing. I like correct and nicely-readable HTML-entity-style quotes as much as the next guy, as long as it doesn't hamper editing. It's frustrating to see so much heated debate over a matter that could, at least in theory, be solved by such an improvement.
Eric
Eric Pierce wrote:
Also, there's some ongoing disagreement about "correct" quote usage: several users like to replace standard ASCII apostrophes with their lsquo/rsquo HTML counterparts, which seriously uglifies the wikitext.
Of course this has been discussed before, but the reason nothing has been done about it is that the issue almost automatically solves itself when the English Wikipedia (and all other remaining Latin-1 wikis) finally switches to UTF-8. Once that is done, we can introduce some code that will replace all HTML entities with their real Unicode characters upon saving. Then people can type &mdash; and &lsquo; as much as they want; they will get the character they want, but they won't uglify the wikitext. Thus, introducing an extra wiki syntax like ""this"" now would be quite redundant later.
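A minimal sketch of the "replace entities on save" idea, in Python rather than MediaWiki's PHP. The function name entities_to_unicode and the KEEP set are assumptions made for this example; which entities a real implementation would leave escaped is a design decision not settled in this thread:

```python
import re
from html.entities import name2codepoint

# Assumption: entities with markup meaning (or invisible ones) stay escaped.
KEEP = {"amp", "lt", "gt", "quot", "nbsp"}

def entities_to_unicode(text: str) -> str:
    """Replace named HTML entities with the Unicode characters they stand for."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        # Leave protected or unknown entity names exactly as typed.
        if name in KEEP or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)
```

With this in place an editor could type the entity once, and the stored wikitext would contain the actual dash or curly quote.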
The reason the patch for "--" was reverted is that some people apparently like to use several dashes in the wiki syntax for table rows, and this broke it. Myself, I don't really get it; one dash is way enough...
Timwi
On Sat, 07 Aug 2004 at 04:57 +0000, Timwi wrote:
The reason the patch for "--" was reverted is that some people apparently like to use several dashes in the wiki syntax for table rows, and this broke it. Myself, I don't really get it; one dash is way enough...
If you mean:
{|
|-----
it comes from the Python bot; people then learn this syntax from the numerous examples in the code. I used this syntax for a few months before noticing it wasn't necessary. It also improves the readability of tables a bit.
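To see why a row separator written as |----- would be affected, here is a small Python sketch of a blanket dash replacement of the kind discussed in this thread. The name naive_dashes is made up for illustration, and the real code was PHP inside MediaWiki:

```python
def naive_dashes(text: str) -> str:
    """Blanket replacement: every '---' and then every '--' becomes an entity."""
    text = text.replace("---", "&mdash;")
    text = text.replace("--", "&ndash;")
    return text

# The minimal row separator survives, the long form does not.
print(naive_dashes("|-"))
print(naive_dashes("|-----"))
```

The one-dash form contains no double dash, so it passes through unchanged, while the five-dash form is rewritten into entities and is no longer recognisable as a table row.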
phe
Eric Pierce wrote:
Apologies if this has been brought up before, but I am curious about the status of automatic replacement of dashes in the wikitext (in particular -- and ---) with HTML-entity versions that many users seem to like so much (ndash and mdash). I noticed that a while back (February 2004) such a conversion was implemented, but apparently reverted due to undesirable breakage. Any way it could be reinstated? I may look into it once I get familiar with the code.
That conversion consisted of two lines of code:
$text = str_replace( "---", "&mdash;", $text );
$text = str_replace( "--", "&ndash;", $text );
These were inserted near the start of the wikitext->HTML transformation. The biggest problem in my mind was that it breaks links. Perhaps the way to avoid this would be to act on HTML instead of wikitext, and to only replace dashes which occur outside HTML tags.
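A sketch of the "act on HTML, outside tags only" idea, written in Python for illustration rather than as a patch to the PHP parser. The name dashes_outside_tags and the simple tag pattern are assumptions; a pattern this simple would not cope with a literal > inside an attribute value:

```python
import re

# Anything from '<' to the next '>' is treated as a tag (simplifying assumption).
TAG = re.compile(r"(<[^>]*>)")

def dashes_outside_tags(html: str) -> str:
    """Replace '---' and '--' in text nodes, leaving tags and attributes alone."""
    parts = TAG.split(html)
    for i, part in enumerate(parts):
        # With one capturing group, even indices are text and odd indices are tags.
        if i % 2 == 0:
            parts[i] = part.replace("---", "&mdash;").replace("--", "&ndash;")
    return "".join(parts)

print(dashes_outside_tags('<a href="/wiki/A--B">A--B</a> x---y'))
```

The link target inside the href attribute keeps its double dash, while the visible link text and the surrounding prose are converted.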
The current status is that I have other things to do and nobody else has volunteered.
-- Tim Starling
On Saturday 07 August 2004 08:10, Tim Starling wrote:
That conversion consisted of two lines of code:
$text = str_replace( "---", "&mdash;", $text );
$text = str_replace( "--", "&ndash;", $text );
These were inserted near the start of the wikitext->HTML transformation. The biggest problem in my mind was that it breaks links. Perhaps the way to avoid this would be to act on HTML instead of wikitext, and to only replace dashes which occur outside HTML tags.
How about only doing this replacement if the dashes are immediately preceded and followed by whitespace/newline? Simple rule, easy to understand, easy to implement, doesn't break URLs.
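The whitespace rule could be sketched like this, in Python for illustration only. The function name dashes_between_whitespace is hypothetical, and the sketch takes "preceded and followed by whitespace" literally, so dashes at the very start or end of the text are left alone:

```python
import re

def dashes_between_whitespace(text: str) -> str:
    """Convert '--' and '---' only when whitespace sits on both sides."""
    # Lookarounds check the neighbours without consuming them.
    text = re.sub(r"(?<=\s)---(?=\s)", "&mdash;", text)
    text = re.sub(r"(?<=\s)--(?=\s)", "&ndash;", text)
    return text

print(dashes_between_whitespace("one -- two --- three"))
print(dashes_between_whitespace("http://example.org/a--b"))
```

URLs and table rows such as |----- are untouched because the dashes there have no whitespace on both sides. The same is true of any dash typed without surrounding spaces.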
-- Jan Hidders
Jan Hidders wrote:
How about only doing this replacement if the dashes are immediately preceded and followed by whitespace/newline? Simple rule, easy to understand, easy to implement, doesn't break URLs.
Yes, simple rule, but one that would probably discriminate against many languages.
Timwi
Timwi wrote:
Yes, simple rule, but one that would probably discriminate against many languages.
Timwi
Sorry to bother you, but could you enlighten me on how it would discriminate against some languages? This is a chance for me to acquire some appreciation of how different some languages are.
-John
John Ky wrote:
Timwi wrote:
Yes, simple rule, but one that would probably discriminate against many languages.
Sorry to bother you, but could you enlighten me on how it would discriminate against some languages? This is a chance for me to acquire some appreciation of how different some languages are.
I don't know that many languages myself; I'm only being as open-minded as possible. There are no grounds on which to base the assumption that dashes must necessarily be surrounded by spaces in all languages. Even in English, many people don't put spaces around em dashes.
Timwi
On Tue, 10 Aug 2004 02:29:46 +0100, Timwi timwi@gmx.net wrote:
I don't know that many languages myself; I'm only being as open-minded as possible. There are no grounds on which to base the assumption that dashes must necessarily be surrounded by spaces in all languages. Even in English, many people don't put spaces around em dashes.
IIRC, the Manual of Style even says that date intervals should NOT have surrounding whitespace when using em dashes.
-Bill Clark
wikitech-l@lists.wikimedia.org