Hi, Many must have informally requested it: would it be possible and desirable to have -- be interpreted as em-dash (—)? It's a very useful symbol, particularly in the context of writing an encyclopaedia, but is not readily available on the keyboard. Many editors already seem to naturally use -- to try and imitate a proper em-dash.
What are the issues involved? Anyone know of other uses of -- that should not be interpreted this way? Is there any argument that it should become some other kind of dash instead?
Also, I realise there are bots that occasionally go around replacing -- with other dashes, but that does seem a bit kludgy.
In the meantime, on en I have created {{--}} which surprisingly no one had thought of creating.
Steve
Steve Bennett wrote:
Hi, Many must have informally requested it: would it be possible and desirable to have -- be interpreted as em-dash (—)? It's a very useful symbol, particularly in the context of writing an encyclopaedia, but is not readily available on the keyboard. Many editors already seem to naturally use -- to try and imitate a proper em-dash.
What are the issues involved? Anyone know of other uses of -- that should not be interpreted this way? Is there any argument that it should become some other kind of dash instead?
Also, I realise there are bots that occasionally go around replacing -- with other dashes, but that does seem a bit kludgy.
In the meantime, on en I have created {{--}} which surprisingly no one had thought of creating.
Steve
_______________________________________________
Wikitech-l mailing list
Wikitech-l@wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l
The only example I can think of a use of -- that should not map to emdash is in programming languages like C, where x-- is used to mean "take the value of x and then decrement it"
-- Neil
On 6/22/06, Neil Harris neil@tonal.clara.co.uk wrote:
The only example I can think of a use of -- that should not map to emdash is in programming languages like C, where x-- is used to mean "take the value of x and then decrement it"
Heh. Actually in that case, it's likely to be indented with a space, which effectively does <nowiki> anyway. Other uses?
Steve
Steve Bennett wrote:
On 6/22/06, Neil Harris neil@tonal.clara.co.uk wrote:
The only example I can think of a use of -- that should not map to emdash is in programming languages like C, where x-- is used to mean "take the value of x and then decrement it"
Heh. Actually in that case, it's likely to be indented with a space, which effectively does <nowiki> anyway. Other uses?
That is not true. Space-indented text is still subject to some parsing, e.g. ''' for bold.
On 6/22/06, Timwi timwi@gmx.net wrote:
That is not true. Space-indented text is still subject to some parsing, e.g. ''' for bold.
Let's be honest here: space indented text is *weird*. What's the historical context behind it? Why is it monospaced, coloured, surrounded with a border, and why doesn't it wrap? It's such an odd layout choice for something so easy to do by accident.
(it would all sort of make sense except for the lack of wrapping...maybe)
Steve
Steve Bennett wrote:
On 6/22/06, Timwi timwi@gmx.net wrote:
That is not true. Space-indented text is still subject to some parsing, e.g. ''' for bold.
Let's be honest here: space indented text is *weird*. What's the historical context behind it? Why is it monospaced, coloured, surrounded with a border, and why doesn't it wrap? It's such an odd layout choice for something so easy to do by accident.
(it would all sort of make sense except for the lack of wrapping...maybe)
This "weirdness" comes from the fact that this particular wiki markup is not really defined in terms of what visible output it produces, but only in terms of what HTML it produces -- namely <pre>. (Note that this is not the same as wiki <pre>.) The reason HTML <pre> disables word-wrapping is so that you can do things like this:
http://en.wikipedia.org/wiki/Family_tree_of_the_Greek_gods
Imagine this would word-wrap. It would destroy the whole thing :)
Timwi
On 6/23/06, Timwi timwi@gmx.net wrote:
This "weirdness" comes from the fact that this particular wiki markup is not really defined in terms of what visible output it produces, but only in terms of what HTML it produces -- namely <pre>. (Note that this is not the same as wiki <pre>.) The reason HTML <pre> disables word-wrapping is so that you can do things like this:
http://en.wikipedia.org/wiki/Family_tree_of_the_Greek_gods
Imagine this would word-wrap. It would destroy the whole thing :)
Ok, so indented space is hardcoded as meaning <PRE> and <PRE> doesn't line wrap.
Pity. Even if indented space just guaranteed <TT> or something (which does line wrap?) then you could always turn off line wrapping by some other means on the rare cases (like that one) where it was really wanted.
Steve
Steve Bennett schrieb:
Hi, Many must have informally requested it: would it be possible and desirable to have -- be interpreted as em-dash (—)? It's a very useful symbol, particularly in the context of writing an encyclopaedia, but is not readily available on the keyboard. Many editors already seem to naturally use -- to try and imitate a proper em-dash.
I filed a bug and attached a patch for that: http://bugzilla.wikimedia.org/show_bug.cgi?id=6402
Leon
On 6/22/06, Steve Bennett stevage@gmail.com wrote:
Hi, Many must have informally requested it: would it be possible and desirable to have -- be interpreted as em-dash (—)? It's a very useful symbol, particularly in the context of writing an encyclopaedia, but is not readily available on the keyboard. Many editors already seem to naturally use -- to try and imitate a proper em-dash.
What are the issues involved? Anyone know of other uses of -- that should not be interpreted this way? Is there any argument that it should become some other kind of dash instead?
Also, I realise there are bots that occasionally go around replacing -- with other dashes, but that does seem a bit kludgy.
In the meantime, on en I have created {{--}} which surprisingly no one had thought of creating.
This is something that was in the 1.5 development tree before release but pulled out just before release due to issues with doing dumb -- to emdash replacement in the not-a-parser. One case I remember it breaking on was image names which contained --, links around -- ([[foo|bar]]--[[bar|foo]] didn't work etc.)
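The failure Ævar describes is easy to reproduce with a blind string substitution. The Python sketch below is hypothetical (it is not the code that was in the 1.5 tree, and the function name is made up). It only shows that a plain replace rewrites file names, template names and comment delimiters along with prose.

```python
def naive_dashes(wikitext: str) -> str:
    """Blindly turn every "--" into an em dash (U+2014), markup included."""
    return wikitext.replace("--", "\u2014")


samples = [
    "A pause--like this.",           # prose: the intended case
    "[[Image:Foo--bar.jpg|thumb]]",  # now names a different file
    "{{--}}",                        # now calls a different template
    "<!-- hidden note -->",          # HTML comment delimiters are destroyed
]
for s in samples:
    print(f"{s!r} -> {naive_dashes(s)!r}")
```

Only the first sample comes out as intended; the other three change what the markup refers to.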
On 6/22/06, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
This is something that was in the 1.5 development tree before release but pulled out just before release due to issues with doing dumb -- to emdash replacement in the not-a-parser. One case I remember it breaking on was image names which contained --, links around -- ([[foo|bar]]--[[bar|foo]] didn't work etc.)
OIC, the substitution was taking place too early? Yeah, you wouldn't want {{--}} to be treated as if you'd typed {{—}}...
So is it permanently dead?
Steve
On 6/22/06, Steve Bennett stevage@gmail.com wrote:
On 6/22/06, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
This is something that was in the 1.5 development tree before release but pulled out just before release due to issues with doing dumb -- to emdash replacement in the not-a-parser. One case I remember it breaking on was image names which contained --, links around -- ([[foo|bar]]--[[bar|foo]] didn't work etc.)
OIC, the substitution was taking place too early? Yeah, you wouldn't want {{--}} to be treated as if you'd typed {{—}}...
So is it permanently dead?
No, like I said the currently available implementation has flaws, and there might also be other issues to work out besides that, such as locale issues. But I don't think anyone (Brion for one doesn't, IIRC) objects to the idea of having a working implementation of that in the parser.
On 6/22/06, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
So is it permanently dead?
No, like I said the currently available implementation has flaws, and there might also be other issues to work out besides that, such as locale issues. But I don't think anyone (Brion for one doesn't, IIRC) objects to the idea of having a working implementation of that in the parser.
Ok, great! Out of curiosity, how many developers are there that implement "general" features like this? It's often difficult to get any kind of picture of the process followed between "hey I've got a great idea" and a line in some release notes some time later.
Steve
On 22/06/06, Steve Bennett stevage@gmail.com wrote:
Ok, great! Out of curiosity, how many developers are there that implement "general" features like this? It's often difficult to get any kind of picture of the process followed between "hey I've got a great idea" and a line in some release notes some time later.
There are ~10 active committers at present, 2 of which are SoC committers. The fairly regular updaters are Brion, Tim, myself, Niklas Laxstrom (i18n)...Rotem Liss, a new committer, is fairly active, I'd say - Avar's dropped off a bit (work commitments?) plus a few names I've missed. The others with commit access put in a few modifications now and again. ;)
The answer to the unasked question ("is someone who cares reading this?") is yes; although I'd like to humbly point out that feature requests are so much easier to track when filed in the correct location.
Rob Church
On 6/22/06, Rob Church robchur@gmail.com wrote:
The answer to the unasked question ("is someone who cares reading this?") is yes; although I'd like to humbly point out that feature requests are so much easier to track when filed in the correct location.
That's fine, once I figure that a) the feature is desirable and b) there is some chance of it getting implemented, I'm more than happy to lodge a formal feature request for it.
Steve
Heh, I see in this case it's been requested at least 3 times before. Most of the discussion is on http://bugzilla.wikimedia.org/show_bug.cgi?id=1485 and is quite interesting - issues about when to convert to em-dash vs en-dash. The discussion seems to boil down to: everyone agrees that -- to em-dash is intuitive, but how do you then implement a syntax for en-dash for the few people who want it?
Steve
On 6/22/06, Steve Bennett stevage@gmail.com wrote:
On 6/22/06, Rob Church robchur@gmail.com wrote:
The answer to the unasked question ("is someone who cares reading this?") is yes; although I'd like to humbly point out that feature requests are so much easier to track when filed in the correct location.
That's fine, once I figure that a) the feature is desirable and b) there is some chance of it getting implemented, I'm more than happy to lodge a formal feature request for it.
Steve
Steve Bennett wrote:
Heh, I see in this case it's been requested at least 3 times before. Most of the discussion is on http://bugzilla.wikimedia.org/show_bug.cgi?id=1485 and is quite interesting - issues about when to convert to em-dash vs en-dash. The discussion seems to boil down to: everyone agrees that -- to em-dash is intuitive, but how do you then implement a syntax for en-dash for the few people who want it?
Luckily we've already got one: &ndash;
Damn typography snobs... ;)
-- brion vibber (brion @ pobox.com)
On Thu, Jun 22, 2006 at 08:17:42AM -0700, Brion Vibber wrote:
Steve Bennett wrote:
Heh, I see in this case it's been requested at least 3 times before. Most of the discussion is on http://bugzilla.wikimedia.org/show_bug.cgi?id=1485 and is quite interesting - issues about when to convert to em-dash vs en-dash. The discussion seems to boil down to: everyone agrees that -- to em-dash is intuitive, but how do you then implement a syntax for en-dash for the few people who want it?
Luckily we've already got one: &ndash;
Damn typography snobs... ;)
That may look like a joke, but I agree -- endashes are rare enough in comparison that it seems that if one of the two is going to be somewhat unintuitive, it should be the endash, and thus the HTML entity strikes me as a good answer to the problem.
Would requiring spaces on either side of the double dash before converting it into an emdash improve the parsing behavior any? It should at least solve the image name problem, since spaces in image names should (in my honest opinion) be considered a no-no in any case. Then again, I'm not in charge, of course.
I suppose I should go look at the bugzilla discussion now. I seem to recall, last time I looked at it, that the endash people were advocating for -- being translated to endashes and --- to emdashes, which seems counterintuitive for both, since endashes are supposed to be shorter than - and emdashes have been represented as -- for so long and in so many contexts that using three dashes will just confuse the heck out of many. Maybe I'm wrong, but I don't think so.
I'll shut up now and go read bugzilla discussion. Sorry about the ramble, just thinking "out loud".
Chad Perrin wrote:
On Thu, Jun 22, 2006 at 08:17:42AM -0700, Brion Vibber wrote:
Luckily we've already got one: &ndash;
Damn typography snobs... ;)
That may look like a joke, but I agree -- endashes are rare enough in comparison that it seems that if one of the two is going to be somewhat unintuitive, it should be the endash, and thus the HTML entity strikes me as a good answer to the problem.
Sure, I'd consider the endash to be entirely ignorable from a perspective of what common people actually expect to be able to use, type, and see.
Would requiring spaces on either side of the double dash before converting it into an emdash improve the parsing behavior any?
No; the conversion simply needs to be smart enough not to damage markup, but to instead operate only in plain text areas. (Additionally it probably ought to exclude itself from areas like <pre>, <tt>, and <code>.)
It should at least solve the image name problem, since spaces in image names should (in my honest opinion) be considered a no-no in any case. Then again, I'm not in charge, of course.
On the contrary, spaces are highly recommended in image names. Names should be descriptive; "Lake McFoobar viewed from Mount Baz lookout point.jpg" is strongly preferred to, say, "DSC000247.JPG".
Note that images' names are going to be more and more decoupled from their physical filenames in the future; the file extension won't have to be maintained as part of the wiki name for it and it will be possible to change names after uploading.
-- brion vibber (brion @ pobox.com)
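Brion's requirement that the conversion touch only plain-text areas, and stay out of <pre>, <tt> and <code>, might be sketched with a regular expression as follows. This is a hypothetical illustration, not MediaWiki's parser: the function name and the list of protected regions (links, templates, HTML tags and comments, and the contents of <pre>/<tt>/<code>/<nowiki>) are assumptions.

```python
import re

# Regions treated as markup rather than prose in this sketch.
PROTECTED = re.compile(
    r"\[\[.*?\]\]"                                # [[links]] and [[Image:...]]
    r"|\{\{.*?\}\}"                               # {{templates}}
    r"|<(pre|tt|code|nowiki)\b[^>]*>.*?</\1\s*>"  # literal/code regions
    r"|<[^>]+>",                                  # other tags, incl. <!-- comments -->
    re.DOTALL | re.IGNORECASE,
)


def smart_dashes(wikitext: str) -> str:
    """Replace "--" with an em dash (U+2014) only outside protected regions."""
    out = []
    pos = 0
    for m in PROTECTED.finditer(wikitext):
        # Plain text between protected regions gets the substitution.
        out.append(wikitext[pos:m.start()].replace("--", "\u2014"))
        # The protected region itself is copied through untouched.
        out.append(m.group(0))
        pos = m.end()
    out.append(wikitext[pos:].replace("--", "\u2014"))
    return "".join(out)


print(smart_dashes("[[foo|bar]]--[[bar|foo]] and {{--}} and <code>x--</code>"))
```

Non-greedy matching means nested templates such as {{a|{{b}}}} or unclosed tags are not handled reliably; a real implementation would hook into the parser's own tokenization instead.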
On Thu, Jun 22, 2006 at 11:37:16AM -0700, Brion Vibber wrote:
Chad Perrin wrote:
Would requiring spaces on either side of the double dash before converting it into an emdash improve the parsing behavior any?
No; the conversion simply needs to be smart enough not to damage markup, but to instead operate only in plain text areas. (Additionally it probably ought to exclude itself from areas like <pre>, <tt>, and <code>.)
That's a shame. It seemed like such a good idea, too. . . .
It should at least solve the image name problem, since spaces in image names should (in my honest opinion) be considered a no-no in any case. Then again, I'm not in charge, of course.
On the contrary, spaces are highly recommended in image names. Names should be descriptive; "Lake McFoobar viewed from Mount Baz lookout point.jpg" is strongly preferred to, say, "DSC000247.JPG".
Note that images' names are going to be more and more decoupled from their physical filenames in the future; the file extension won't have to be maintained as part of the wiki name for it and it will be possible to change names after uploading.
Ahh, right, I'd forgotten the discussion about decoupling displayed image names from image filenames. Mea culpa.
Chad Perrin wrote:
I suppose I should go look at the bugzilla discussion now. I seem to recall, last time I looked at it, that the endash people were advocating for -- being translated to endashes and --- to emdashes, which seems counterintuitive for both, since endashes are supposed to be shorter than - and emdashes have been represented as -- for so long and in so many contexts that using three dashes will just confuse the heck out of many. Maybe I'm wrong, but I don't think so.
Wot? En dashes are, by definition, one en long. Hyphens are usually shorter.
Compare:
- hyphen
– en dash
— em dash
The difference may be minor if you're reading this in a monospaced font, but both dashes should still be noticeably longer than the hyphen.
Using -- for en and --- for em dashes seems reasonable to me (that's how TeX does it, IIRC), though I'd be willing to live with &ndash; and --. However, I do believe other languages (German?) tend to use the en dash more than English.
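The TeX-style convention and the em-dash-first alternative can be compared side by side in a short Python sketch (both function names are hypothetical). The one subtlety in the TeX mapping is ordering: "---" has to be replaced before "--", otherwise an em dash would come out as an en dash followed by a hyphen.

```python
def tex_dashes(text: str) -> str:
    """TeX-style input: "---" is an em dash (U+2014), "--" an en dash (U+2013)."""
    # Longest run first; replacing "--" first would turn "---" into
    # an en dash followed by a stray hyphen.
    return text.replace("---", "\u2014").replace("--", "\u2013")


def em_first_dashes(text: str) -> str:
    """Alternative from the thread: "--" is an em dash, and the rarer en dash
    is typed as the HTML entity &ndash; (decoded here only for comparison)."""
    return text.replace("&ndash;", "\u2013").replace("--", "\u2014")


for src in ["1900--1910", "wait---what", "wait--what", "1900&ndash;1910"]:
    print(f"{src!r}: TeX {tex_dashes(src)!r}, em-first {em_first_dashes(src)!r}")
```

Feeding "1900--1910" through both shows the practical disagreement: TeX-style input yields a date range with an en dash, while the em-first mapping yields an em dash that someone would then have to correct.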
How fast this thread grew!
Now, hasn't anybody thought of doing it in JavaScript on the user side? In fact, I recently wrote a replace-all JavaScript feature (it's really simple code anyway).
So I'm writing an article. I don't want to use the — insert link, so I type -- wherever I want a —. Before saving, since I know I used this 'trick', I click a link that converts every -- to — and then save.
No need to deal with the parse code :)
"Ilmari Karonen" wrote:
Chad Perrin wrote:
I suppose I should go look at the bugzilla discussion now. I seem to recall, last time I looked at it, that the endash people were advocating for -- being translated to endashes and --- to emdashes, which seems counterintuitive for both, since endashes are supposed to be shorter than - and emdashes have been represented as -- for so long and in so many contexts that using three dashes will just confuse the heck out of many. Maybe I'm wrong, but I don't think so.
Wot? En dashes are, by definition, one en long. Hyphens are usually shorter.
Compare:
- hyphen
– en dash
— em dash
The difference may be minor if you're reading this in a monospaced font, but both dashes should still be noticeably longer than the hyphen.
Using -- for en and --- for em dashes seems reasonable to me (that's how TeX does it, IIRC), though I'd be willing to live with &ndash; and --. However, I do believe other languages (German?) tend to use the en dash more than English.
-- Ilmari Karonen
Now, hasn't anybody thought of doing it in JavaScript on the user side?
This already works in the Russian Wikipedia.
"Wikificator" is a JavaScript tool currently used in the Russian Wikipedia. It is embedded in the source code of the edit page, so any visitor can use it just by clicking a link below the edit box. The script applies certain transformations (mainly typographic ones, such as dashes and quotation marks) to the text highlighted in the edit box, if any, and otherwise to the whole article. It is very convenient and does a great job of improving the look of the Russian Wikipedia's articles. The script can be modified only by our sysops, since it sits in the MediaWiki namespace.
Note that the Russian typographical conventions are quite different to the Anglo-American ones.
The script: http://ru.wikipedia.org/wiki/MediaWiki:Wikificator.js
Description: http://ru.wikipedia.org/wiki/%D0%92%D0%B8%D0%BA%D0%B8%D0%BF%D0%B5%D0%B4%D0%B...
Example
Before
Text text "text in quotation marks "nested" text" - mdash.
After
Text text «text in quotation marks „nested“ text» — mdash.
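A rough Python approximation of the two transformations in the example above: a spaced hyphen becomes a spaced em dash, and straight double quotes become «outer» and „inner“ quotes. This is not the Wikificator code. The function name and the rule for telling opening from closing quotes (a quote at the start, after whitespace, or after "(" or "[" opens; anything else closes) are assumptions made for the sketch.

```python
def russian_typography(text: str) -> str:
    """Approximate two Wikificator-style fixes (hypothetical sketch):
    a spaced hyphen becomes a spaced em dash, and straight double quotes
    become «outer» and „inner“ quotes depending on nesting depth."""
    # " - " between words becomes an em dash with the spaces kept.
    text = text.replace(" - ", " \u2014 ")

    out = []
    depth = 0  # how many quotations are currently open
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        # Assumed heuristic: a quote at the start, after whitespace,
        # or after an opening bracket opens a quotation.
        if prev.isspace() or prev in "([":
            out.append("\u00ab" if depth == 0 else "\u201e")  # « or „
            depth += 1
        else:
            depth = max(depth - 1, 0)
            out.append("\u00bb" if depth == 0 else "\u201c")  # » or “
    return "".join(out)


before = 'Text text "text in quotation marks "nested" text" - mdash.'
print(russian_typography(before))
```

Deciding open versus close from the preceding character is the fragile part; a quote directly after punctuation, or an unbalanced quote, will be guessed wrong, which is one reason such a script is run by the editor on demand rather than silently by the parser.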
-- Alexander Sigachov
On Thu, Jun 22, 2006 at 11:16:56PM +0300, Ilmari Karonen wrote:
Chad Perrin wrote:
I suppose I should go look at the bugzilla discussion now. I seem to recall, last time I looked at it, that the endash people were advocating for -- being translated to endashes and --- to emdashes, which seems counterintuitive for both, since endashes are supposed to be shorter than - and emdashes have been represented as -- for so long and in so many contexts that using three dashes will just confuse the heck out of many. Maybe I'm wrong, but I don't think so.
Wot? En dashes are, by definition, one en long. Hyphens are usually shorter.
Compare:
- hyphen
– en dash
— em dash
You could be right about that. I haven't dealt with endashes much, so may just be misremembering them. I still like -- for emdashes, though, since (among other reasons) it's pretty nearly the correct size for an emdash anyway (and because it's what pretty much everybody uses when true emdashes aren't available in other contexts). I also like the fact that, in the absence of text being parsed and transformed to render true emdashes, -- is recognizable as such to the casual reader (or editor who is viewing the content source, in the case of MediaWiki). Frankly, I tend to expect that if we used -- and --- for endash and emdash, respectively, we'd get thousands of endashes where emdashes were intended, and that it's a problem that would never entirely go away as new users/editors join the Wikimedia community all the time.
So far, though, there seem to be problems with every suggestion put forward -- in some cases, unacceptable problems in an otherwise excellent solution. It will probably be a while before something workable is implemented.
On Thu, Jun 22, 2006 at 05:22:46PM -0600, Chad Perrin wrote:
is viewing the content source, in the case of MediaWiki). Frankly, I tend to expect that if we used -- and --- for endash and emdash, respectively, we'd get thousands of endashes where emdashes were intended, and that it's a problem that would never entirely go away as new users/editors join the Wikimedia community all the time.
I will read everything before writing anything. I will read everything before writing anything. I will read everything before writing anything. I will read everything before writing anything. I will read everything before writing anything. I will read everything before writing anything. I will read everything before writing anything.
:-)
Cheers, -- jr 'Bart' a
On Thu, Jun 22, 2006 at 11:16:56PM +0300, Ilmari Karonen wrote:
Using -- for en and --- for em dashes seems reasonable to me (that's how TeX does it, IIRC), though I'd be willing to live with &ndash; and --.
I'd like to vote strongly against that first mapping, for the (hopefully) perfectly sound reason that "--" for mdash *is what people already do* (see my hopeless case commentary about *bold* and _italics_, weeks ago :-).
Cheers, -- jra
On 6/22/06, Chad Perrin perrin@apotheon.com wrote:
Would requiring spaces on either side of the double dash before converting it into an emdash improve the parsing behavior any? It should at least solve the image name problem, since spaces in image names should (in my honest opinion) be considered a no-no in any case. Then again, I'm not in charge, of course.
I actually don't like this solution, even though it seems neat. The trouble is, it's just not "intuitive" in the sense that no one would expect markup to behave differently whether it has spaces around it or not. Someone is likely to see "foo--boo" get rendered as an en-dash and think "damn, how do I get an em-dash?" Nothing else -- with the exception of space indentation itself -- in mediawiki gets rendered differently depending on spaces surrounding it.
Or maybe I'm confusing Wikipedia and MediaWiki here - your solution is perhaps not bad for MediaWiki, switchable by individual site admins. But for Wikipedia it's a bad idea.
I suppose I should go look at the bugzilla discussion now. I seem to recall, last time I looked at it, that the endash people were advocating for -- being translated to endashes and --- to emdashes, which seems
That was one or two TeX fans...
counterintuitive for both, since endashes are supposed to be shorter than - and emdashes have been represented as -- for so long and in so many contexts that using three dashes will just confuse the heck out of many. Maybe I'm wrong, but I don't think so.
It's "intuitive" *after the fact*. "Oh, I get how it works now - that's cute!" Em-dash from -- is blissfully ignorantly intuitive - people will get it right without even realising that they were doing anything. Many editors probably don't even realise that -- is not a good way of doing an em-dash at the moment.
Steve
On Thu, Jun 22, 2006 at 11:21:29PM +0200, Steve Bennett wrote:
On 6/22/06, Chad Perrin perrin@apotheon.com wrote:
Would requiring spaces on either side of the double dash before converting it into an emdash improve the parsing behavior any? It should at least solve the image name problem, since spaces in image names should (in my honest opinion) be considered a no-no in any case. Then again, I'm not in charge, of course.
I actually don't like this solution, even though it seems neat. The trouble is, it's just not "intuitive" in the sense that no one would expect markup to behave differently whether it has spaces around it or not. Someone is likely to see "foo--boo" get rendered as an en-dash and think "damn, how do I get an em-dash?" Nothing else -- with the exception of space indentation itself -- in mediawiki gets rendered differently depending on spaces surrounding it.
Rendering foo--boo as an endash would be inappropriate, even if the editor intended an endash, anyway. Endashes are not meant to be used as hyphens, and should have a space on either side of them.
Or maybe I'm confusing Wikipedia and MediaWiki here - your solution is perhaps not bad for MediaWiki, switchable by individual site admins. But for Wikipedia it's a bad idea.
I'm not sure I see how something good for MediaWiki is bad for Wikipedia, in this case. Could you elaborate on the difference as it relates to this issue?
counterintuitive for both, since endashes are supposed to be shorter than - and emdashes have been represented as -- for so long and in so many contexts that using three dashes will just confuse the heck out of many. Maybe I'm wrong, but I don't think so.
It's "intuitive" *after the fact*. "Oh, I get how it works now - that's cute!" Em-dash from -- is blissfully ignorantly intuitive - people will get it right without even realising that they were doing anything. Many editors probably don't even realise that -- is not a good way of doing an em-dash at the moment.
That's sort of the problem: people will keep doing emdashes with -- and never realize something's wrong, in many cases, I suspect.
Of course, considering the minuscule differences between hyphens and endashes, I'm not sure endashes really need to be addressed for general visual formatting purposes. I'm sure there are some strict endash fans who will disagree vehemently with me, though.
On 6/23/06, Chad Perrin perrin@apotheon.com wrote:
Rendering foo--boo as an endash would be inappropriate, even if the editor intended an endash, anyway. Endashes are not meant to be used as hyphens, and should have a space on either side of them.
Ok, well, imagine foo=1900 and boo=1910.
Or maybe I'm confusing Wikipedia and MediaWiki here - your solution is perhaps not bad for MediaWiki, switchable by individual site admins. But for Wikipedia it's a bad idea.
I'm not sure I see how something good for MediaWiki is bad for Wikipedia, in this case. Could you elaborate on the difference as it relates to this issue?
I'm thinking flexibility in the software is generally a good thing, as different sites have different needs. Wikipedia, on the other hand, has one fairly well established set of needs.
Of course, considering the minuscule differences between hyphens and endashes, I'm not sure endashes really need to be addressed for general visual formatting purposes. I'm sure there are some strict endash fans who will disagree vehemently with me, though.
Yeah I'd like to hear more about where en-dashes should be used. If it's just for the date range thing, then probably a bot could look for people mistakenly using emdashes or hyphens in those cases. A much less frequent occurrence than people using -- for emdashes currently.
Steve
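A bot of the kind Steve describes could start from a pattern like the one below. It is a hypothetical sketch: the regular expression, the function name, and the choice of an unspaced en dash for year ranges are assumptions, and the pattern can still misfire on things like ISBN or phone-number fragments, so a human should review each change.

```python
import re

# Four-digit year, then a hyphen, "--", or an em dash (optionally spaced),
# then another four-digit year. Word boundaries keep it from matching
# inside longer digit runs such as "12345-6789".
YEAR_RANGE = re.compile(r"\b(\d{4})\s*(?:--|-|\u2014)\s*(\d{4})\b")


def fix_year_ranges(text: str) -> str:
    """Rewrite year ranges to use an unspaced en dash (U+2013)."""
    return YEAR_RANGE.sub(lambda m: f"{m.group(1)}\u2013{m.group(2)}", text)


print(fix_year_ranges("He reigned 1900-1910, then 1912 \u2014 1915."))
```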
On Fri, Jun 23, 2006 at 08:10:43AM +0200, Steve Bennett wrote:
On 6/23/06, Chad Perrin perrin@apotheon.com wrote:
Rendering foo--boo as an endash would be inappropriate, even if the editor intended an endash, anyway. Endashes are not meant to be used as hyphens, and should have a space on either side of them.
Ok, well, imagine foo=1900 and boo=1910.
Wow, that was an interesting typo. I meant that to say "bar".
On 6/23/06, Chad Perrin perrin@apotheon.com wrote:
Rendering foo--boo as an endash would be inappropriate, even if the editor intended an endash, anyway. Endashes are not meant to be used as hyphens, and should have a space on either side of them.
Ok, well, imagine foo=1900 and boo=1910.
Wow, that was an interesting typo. I meant that to say "bar".
You copied it from me:
Someone is likely to see "foo--boo" get rendered as an en-dash and think "damn, how do I get an em-dash?"
(I'm a bit of a trendsetter, I use foo/boo/moo rather than foo/bar/baz...)
Steve
On Fri, Jun 23, 2006 at 09:10:20AM +0200, Steve Bennett wrote:
On 6/23/06, Chad Perrin perrin@apotheon.com wrote:
Rendering foo--boo as an endash would be inappropriate, even if the editor intended an endash, anyway. Endashes are not meant to be used as hyphens, and should have a space on either side of them.
Ok, well, imagine foo=1900 and boo=1910.
Wow, that was an interesting typo. I meant that to say "bar".
You copied it from me:
Someone is likely to see "foo--boo" get rendered as an en-dash and think "damn, how do I get an em-dash?"
(I'm a bit of a trendsetter, I use foo/boo/moo rather than foo/bar/baz...)
Okay. I'll just agree, and let my confusion over the metasyntactic variable progression in this thread wash over me.
I will not be confused
Confusion is the mind-killer
Confusion is the little death
Where the confusion has gone there will be nothing
Only I will remain
(apologies to Frank Herbert)
On Fri, Jun 23, 2006 at 08:10:43AM +0200, Steve Bennett wrote:
Yeah I'd like to hear more about where en-dashes should be used. If it's just for the date range thing, then probably a bot could look for people mistakenly using emdashes or hyphens in those cases. A much less frequent occurrence than people using -- for emdashes currently.
Captain Obvious strikes:
http://www.google.com/search?q=en+dash+usage
Cheers, -- jra
On Thu, Jun 22, 2006 at 12:21:28PM -0600, Chad Perrin wrote:
Would requiring spaces on either side of the double dash before converting it into an emdash improve the parsing behavior any?
Please don't.
<snob type=typography> Em dashes are properly set in English text without spacing on either side, though the ASCIIography of this usage is much less picky.
If someone decides that it needs to be " -- " that's mapped, at least take the spaces out when setting the glyph? </snob>
Cheers, -- jra
On Thu, Jun 22, 2006 at 10:33:27PM -0400, Jay R. Ashworth wrote:
On Thu, Jun 22, 2006 at 12:21:28PM -0600, Chad Perrin wrote:
Would requiring spaces on either side of the double dash before converting it into an emdash improve the parsing behavior any?
Please don't.
<snob type=typography> Em dashes are properly set in English text without spacing on either side, though the ASCIIography of this usage is much less picky.
If someone decides that it needs to be " -- " that's mapped, at least take the spaces out when setting the glyph?
</snob>
. . . except that the notion that it's strictly "foo--bar" instead of "foo -- bar" is a relatively recent phenomenon, and in times past whether or not there was a space was largely a matter of taste, regional custom, and (even farther back) what leads the typesetter had on-hand when he was laying out the page for pressing.
On Thu, Jun 22, 2006 at 09:24:51PM -0600, Chad Perrin wrote:
On Thu, Jun 22, 2006 at 10:33:27PM -0400, Jay R. Ashworth wrote:
On Thu, Jun 22, 2006 at 12:21:28PM -0600, Chad Perrin wrote:
Would requiring spaces on either side of the double dash before converting it into an emdash improve the parsing behavior any?
Please don't.
<snob type=typography> Em dashes are properly set in English text without spacing on either side, though the ASCIIography of this usage is much less picky.
If someone decides that it needs to be " -- " that's mapped, at least take the spaces out when setting the glyph?
</snob>
. . . except that the notion that it's strictly "foo--bar" instead of "foo -- bar" is a relatively recent phenomenon, and in times past whether or not there was a space was largely a matter of taste, regional custom, and (even farther back) what leads the typesetter had on-hand when he was laying out the page for pressing.
Ok, then you're more of a typography snob than me. :-)
I will admit, having seen it both ways, that in typesetting it tends to look better to me with little spacing (a shim's worth), but in ASCII text, I tend to put in the spaces, as well.
Clearly, if this is done, it needs to be considered whether surrounding spaces should be eaten by the parser.
Cheers, -- jra
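A minimal Python sketch of the two choices discussed above, assuming a plain regex substitution rather than MediaWiki's actual parser: `require_spaces` decides whether only a whitespace-flanked `--` is converted (which leaves C's `x--` alone), and `eat_spaces` decides whether that whitespace survives next to the glyph. The function and flag names are hypothetical, and the sketch ignores links, URLs, and code blocks entirely.

```python
import re

EM_DASH = "\u2014"

# Bare rule: every double hyphen is converted, so C code such as "x--" is mangled.
BARE = re.compile(r"--")
# Spaced rule: only "--" with whitespace on both sides; the whitespace is
# captured so the caller can decide whether to keep it.
SPACED = re.compile(r"(\s+)--(\s+)")


def convert_dashes(text: str, require_spaces: bool = True, eat_spaces: bool = False) -> str:
    """Hypothetical sketch: replace ASCII '--' with an em dash under two policies.

    eat_spaces only has an effect together with the spaced rule.
    """
    if not require_spaces:
        return BARE.sub(EM_DASH, text)
    if eat_spaces:
        # "foo -- bar" becomes foo, em dash, bar with no spaces (closed-up style).
        return SPACED.sub(EM_DASH, text)
    # Keep whatever whitespace the author typed on each side of the dash.
    return SPACED.sub(lambda m: m.group(1) + EM_DASH + m.group(2), text)
```

With the default `require_spaces=True`, `convert_dashes("x-- and foo -- bar")` leaves `x--` alone and converts only the spaced dash; the trade-off is that a closed-up `foo--bar` typed by an editor is also left unconverted.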
On Thu, Jun 22, 2006 at 11:44:22PM -0400, Jay R. Ashworth wrote:
On Thu, Jun 22, 2006 at 09:24:51PM -0600, Chad Perrin wrote:
On Thu, Jun 22, 2006 at 10:33:27PM -0400, Jay R. Ashworth wrote:
On Thu, Jun 22, 2006 at 12:21:28PM -0600, Chad Perrin wrote:
Would requiring spaces on either side of the double dash before converting it into an emdash improve the parsing behavior any?
Please don't.
<snob type=typography> Em dashes are properly set in English text without spacing on either side, though the ASCIIography of this usage is much less picky.
If someone decides that it needs to be " -- " that's mapped, at least take the spaces out when setting the glyph?
</snob>
. . . except that the notion that it's strictly "foo--bar" instead of "foo -- bar" is a relatively recent phenomenon, and in times past whether or not there was a space was largely a matter of taste, regional custom, and (even farther back) what leads the typesetter had on-hand when he was laying out the page for pressing.
Ok, then you're more of a typography snob than me. :-)
Err, thanks, I guess. I tend to be willing to overlook either, though no spacing bothers my eyes -- kinda like looking at Python source code.
I will admit, having seen it both ways, that in typesetting it tends to look better to me with little spacing (a shim's worth), but in ASCII text, I tend to put in the spaces, as well.
I don't recall for sure how it all started out. It may be that, before the availability of leads was an issue, the emdash was one of those things that was supposed to have a half-space rather than a full space or no space, but the limited supply of leads screwed with the "standard" until everything kinda went to hell; then a new "standard" arose -- though in this case it was two standards. Similarly, in handwriting, there was supposed to be a space after a period, but no specific two-space standard was in place. Then, with the advent of print text, they went to two spaces because it was easier to read. Eventually, because of the limited supply of leads, typesetters went to 1.5 spaces (they didn't want to use up all their full-size space leads on spacing between sentences), which made folks like the Microsoft Word developers decide that one space or two was six of one and half a dozen of the other, and they made one space the standard operating format for their software. Despite that, it's still more natural looking and easier to read with two spaces after a period than with one.
Meh. Yeah, I'm kind of particular about all this stuff, I guess. Frankly, I'd like 1.5 spaces after periods, but in the absence of that capability on my ThinkPad, I'll just use two spaces and be happy it doesn't have to be one. Same with two hyphens, with a space on either side, instead of an emdash with a half-space on either side (which is what I think was the more "proper" use originally in print).
Clearly, if this is done, it needs to be considered whether surrounding spaces should be eaten by the parser.
Agreed. I vote "no".
A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on Usenet and in e-mail?
At least we agree on posting order.
On Fri, Jun 23, 2006 at 12:44:09AM -0600, Chad Perrin wrote:
Clearly, if this is done, it needs to be considered whether surrounding spaces should be eaten by the parser.
Agreed. I vote "no".
Oh; you're voting "no, the parser should *not* automagically replace '--' and/or ' -- ' with —", then?
Might this, perhaps, be just a subst type thing that happens once when a page is saved?
Cheers, -- jra
On Fri, Jun 23, 2006 at 11:02:06AM -0400, Jay R. Ashworth wrote:
On Fri, Jun 23, 2006 at 12:44:09AM -0600, Chad Perrin wrote:
Clearly, if this is done, it needs to be considered whether surrounding spaces should be eaten by the parser.
Agreed. I vote "no".
Oh; you're voting "no, the parser should *not* automagically replace '--' and/or ' -- ' with —", then?
Err, no, I'm voting "No, the parser should not eat surrounding spaces." I just don't like the look of emdashes without any spacing at all.
. . . though it looks like the parser automagically replacing it does present some technical issues all its own, and perhaps I should also say that I vote "no" for surrounding spaces being eaten by other text-replacement techniques that might be used to turn -- into an emdash (such as the suggested JavaScript method, which wouldn't work for everyone, but would at least circumvent the technical issues with having the parser do it).
On 6/23/06, Chad Perrin perrin@apotheon.com wrote:
(such as the suggested JavaScript method, which wouldn't work for everyone, but would at least circumvent the technical issues with having the parser do it).
A dev can correct me if I'm wrong, but I don't think there would be any substantial difference between coding it to work on the JavaScript side vs. the PHP side. Either implementation would face the same problems: you have to make sure it doesn't substitute anything in wikilinks, URLs, SGML attributes/values, or preferably in any of the default monospaced things (<tt>, <code>, <pre>, space-initial lines), plus perhaps some other stuff. If you can write that in JavaScript, you can write it in PHP instead and stick it in somewhere as Yet Another Parser Pass™. That would give much more consistency, as well as eliminating a large part of the objections to current solutions, namely that they look ugly or confusing in wikitext.
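The skip list above can be sketched as a single pass: match the protected spans first, copy them verbatim, and substitute only in the text between them. This is a hypothetical Python illustration, not MediaWiki's PHP; the protected-span patterns are simplified assumptions (no nesting, no templates, no `<nowiki>`, and a stray `<` could be mistaken for a tag), and it converts every bare `--` so the effect of the protection is easy to see.

```python
import re

EM_DASH = "\u2014"

# Simplified, assumed patterns for spans whose text must never be rewritten.
PROTECTED = re.compile(
    r"""
      <(pre|code|tt)\b[^>]*>.*?</\1\s*>   # monospaced elements, including their bodies
    | \[\[.*?\]\]                        # wikilinks
    | \[?https?://[^\s\]]+               # bare or bracketed external URLs
    | <[^>]*>                            # any other tag, with its attributes and values
    | ^[ ][^\n]*$                        # space-initial (preformatted) lines
    """,
    re.IGNORECASE | re.DOTALL | re.MULTILINE | re.VERBOSE,
)


def safe_dash_replace(wikitext: str) -> str:
    """Hypothetical sketch: turn '--' into an em dash outside protected spans."""
    out = []
    pos = 0
    for m in PROTECTED.finditer(wikitext):
        # Rewrite the unprotected stretch before this span...
        out.append(wikitext[pos:m.start()].replace("--", EM_DASH))
        # ...then copy the protected span through untouched.
        out.append(m.group(0))
        pos = m.end()
    out.append(wikitext[pos:].replace("--", EM_DASH))
    return "".join(out)
```

Whether this runs client-side or as an extra parser pass, the hard part is the `PROTECTED` list, which is the point made above.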
On 6/23/06, Timwi timwi@gmx.net wrote:
Steve Bennett wrote:
Let's be honest here: space indented text is *weird*. What's the historical context behind it? Why is it monospaced, coloured, surrounded with a border, and why doesn't it wrap? It's such an odd layout choice for something so easy to do by accident.
This "weirdness" comes from the fact that this particular wiki markup is not really defined in terms of what visible output it produces, but only in terms of what HTML it produces -- namely <pre>.
I think his actual question was: "Why was an initial space, which a newbie would commonly add unwittingly, chosen to have such drastic and confusing effects?" The answer to that is slightly complicated.
The first point to realize is that HTML specifies that all leading spaces are stripped by default, and other whitespace is compressed into a single space (this is so HTML files can be indented and spaced out to be readable; otherwise authors would have to squash tags together to keep that spacing from affecting the rendered document). So the most obvious choice would be to just translate the spaces literally and have them vanish when the browser renders them; most blogging tools, Web forums, etc. do this, and MediaWiki is not an exception (try it yourself and see). (SMF forums are the one exception I know of offhand; they translate spaces that would otherwise be compressed into "non-breaking spaces", possibly familiar to you as &nbsp;.)
So what to do with leading spaces? Well, one common and important use for leading spaces is to make some kind of chart or diagram, to indent elements so they fit together properly. HTML provides the <pre> tag for that: it overrides the default space-compressing behavior and PREserves all spaces, line breaks, etc. exactly as entered. So, if you knew HTML, you could make such a chart with <pre>, but what if you didn't? Well, with this markup, just try entering it and presto, it will work, because your leading spaces clue the parser into the fact that you want it to display with spacing preserved.
So that explains the monospacing and space-preservation. What about the lack of line wrapping? Well, same principle. If you're making a chart, you definitely don't want a line to be chopped up into pieces just because someone's using a small screen; it would totally ruin the relational intent. Finally, the gray background and the border are just a minor stylistic choice, made not by the parser but by our stylesheet. You can change that for yourself by adding something like "pre { background: transparent; border: none; }" to your stylesheet.
Of course, it's true that some people get confused when they try to manually indent their paragraphs, which doesn't work. But that's easily fixable by a more experienced editor, so it's not a big deal.
Anyway, I hope that short essay explained things. ;)
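A hypothetical Python sketch of the behaviour described in that explanation: consecutive lines that start with a space are grouped into one `<pre>` block, while other lines have their whitespace collapsed the way a browser would. It assumes, as a simplification, that only the first leading space is dropped and that everything is HTML-escaped; the real MediaWiki parser handles many more cases.

```python
import html


def space_lines_to_pre(wikitext: str) -> str:
    """Hypothetical sketch: group space-initial lines into <pre> blocks."""
    out, pre = [], []

    def flush():
        # Emit any pending preformatted lines as a single <pre> block.
        if pre:
            out.append("<pre>" + "\n".join(pre) + "</pre>")
            pre.clear()

    for line in wikitext.split("\n"):
        if line.startswith(" "):
            # Drop only the first space; the rest of the indentation is preserved.
            pre.append(html.escape(line[1:]))
        else:
            flush()
            # Outside <pre>, runs of whitespace would collapse in the browser anyway.
            out.append(html.escape(" ".join(line.split())))
    flush()
    return "\n".join(out)
```

A small box drawn with leading spaces survives intact inside the `<pre>`, while the doubled space in ordinary text disappears.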
On 6/23/06, Simetrical Simetrical+wikitech@gmail.com wrote:
So what to do with leading spaces? Well, one common and important use for leading spaces is to make some kind of chart or diagram, to indent elements so they fit together properly. HTML provides the <pre> tag for that: it overrides the default space-compressing behavior and PREserves all spaces, line breaks, etc. exactly as entered. So, if you knew HTML, you could make such a chart with <pre>, but what if you didn't? Well, with this markup, just try entering it and presto, it will work, because your leading spaces clue the parser into the fact that you want it to display with spacing preserved.
Thanks for this explanation. I don't quite follow the logical step from "one common and important use" to "the way this should work", but, hey. In practice, in Wikipedia, this "common and important" use is basically never used - we certainly don't want ASCII art in our articles. A much more "common" use would actually be quoting other material: grey box, border and indent, yes; monospace and no word-wrap, no. As I say, on the rare occasions that someone actually wants ASCII art, they could always use <PRE> manually.
But yeah, that's just the way things work out. I seem to recall we had a discussion about a syntax element to make quoting easier recently, but nothing much came out of it.
Steve
On 6/23/06, Steve Bennett stevage@gmail.com wrote:
Thanks for this explanation. I don't quite follow the logical step that was made between "one common and important use" and "the way this should work", but, hey. In practice, in Wikipedia, this "common and important" use is basically never used - we certainly don't want ASCII art in our articles.
True, in general. The *idea* was to make things a bit less confusing, which I think probably failed. YMMV.
Steve Bennett wrote:
Thanks for this explanation. I don't quite follow the logical step that was made between "one common and important use" and "the way this should work", but, hey. In practice, in Wikipedia, this "common and important" use is basically never used - we certainly don't want ASCII art in our articles.
We do, however, quite often want source code in our articles.
On 6/25/06, Ilmari Karonen nospam@vyznev.net wrote:
Steve Bennett wrote:
Thanks for this explanation. I don't quite follow the logical step that was made between "one common and important use" and "the way this should work", but, hey. In practice, in Wikipedia, this "common and important" use is basically never used - we certainly don't want ASCII art in our articles.
We do, however, quite often want source code in our articles.
And it would probably be appropriate to have a good way of doing that. Source code doesn't necessarily demand lack of line wrapping. Perhaps a proper <source> tag would be the way to go.
Steve
On Sun, Jun 25, 2006 at 07:22:53PM +0200, Steve Bennett wrote:
We do, however, quite often want source code in our articles.
And it would probably be appropriate to have a good way of doing that. Source code doesn't necessarily demand lack of line wrapping.
It doesn't?
Certainly there are languages in which wrapping a line breaks the syntax; the Bourne Shell is only the first one that comes to mind...
Cheers, -- jra
On Sun, Jun 25, 2006 at 01:32:53PM -0400, Jay R. Ashworth wrote:
On Sun, Jun 25, 2006 at 07:22:53PM +0200, Steve Bennett wrote:
We do, however, quite often want source code in our articles.
And it would probably be appropriate to have a good way of doing that. Source code doesn't necessarily demand lack of line wrapping.
It doesn't?
Certainly there are languages in which wrapping a line breaks the syntax; the Bourne Shell is only the first one that comes to mind...
If by "wrapping a line" you mean inserting a linebreak, yes, you're absolutely correct. Others include Logo, BASIC, and Python (though Logo at least provides for the ability to break a line on the page without breaking a semantic line of code via a work-around syntax).
On the other hand, the words "doesn't necessarily" seem technically accurate, since there are many languages that don't care what linebreaks you have in a file, with the possible exception of linebreaks in the midst of comments and partway through keywords, function names, et cetera. I suppose the question that needs to be asked at this point is whether the necessities of some languages suggest that we should impose the same limits on presentation of other languages that do not share those necessities.
While we're at it, we might consider whether to allow line wrapping, but provide a line-wrap indicator syntax to show that what appears on a separate line of the page is in fact meant to be part of the same semantic line of code. I'm thinking of examples such as the plus-syntax used on websites such as perlmonks.org. An example of this usage might look something like this:
    while (<>) {
        if ( ( $scalar_foo <= $scalar_bar ) && ( $scalar_bar <= $array_qux[2] )
    +) {
            print;
        }
    }
. . . where the plus character is typically some differentiating color, such as red. That may or may not be appropriate for something like MediaWiki -- I only bring it up as something to consider.
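The perlmonks-style display can be sketched as a pair of functions: one hard-wraps long lines at a fixed width and prefixes each continuation row with a marker, the other rejoins them. The width, the `+` marker, and both function names are assumptions for illustration. The round trip only works if no real line of code starts with the marker, and anyone copying the displayed text gets the markers too, which is the copy-and-paste cost of putting the marker into the content itself.

```python
MARKER = "+"


def wrap_with_markers(code: str, width: int = 40) -> str:
    """Hypothetical sketch: hard-wrap long lines, flagging continuations with MARKER."""
    if width <= len(MARKER):
        raise ValueError("width must leave room for content after the marker")
    out = []
    for line in code.split("\n"):
        out.append(line[:width])
        rest = line[width:]
        while rest:
            # Each continuation chunk gets its own row, prefixed by the marker.
            step = width - len(MARKER)
            out.append(MARKER + rest[:step])
            rest = rest[step:]
    return "\n".join(out)


def unwrap_markers(display: str) -> str:
    """Rejoin rows that start with MARKER onto the previous logical line."""
    lines = []
    for row in display.split("\n"):
        if row.startswith(MARKER) and lines:
            lines[-1] += row[len(MARKER):]
        else:
            lines.append(row)
    return "\n".join(lines)
```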
On Sun, Jun 25, 2006 at 12:26:36PM -0600, Chad Perrin wrote:
And it would probably be appropriate to have a good way of doing that. Source code doesn't necessarily demand lack of line wrapping.
It doesn't?
Certainly there are languages in which wrapping a line breaks the syntax; the Bourne Shell is only the first one that comes to mind...
If by "wrapping a line" you mean inserting a linebreak, yes, you're absolutely correct. Others include Logo, BASIC, and Python (though Logo at least provides for the ability to break a line on the page without breaking a semantic line of code via a work-around syntax).
On the other hand, the words "doesn't necessarily" seem technically accurate, since there are many languages that don't care what linebreaks you have in a file, with the possible exception of linebreaks in the midst of comments and partway through keywords, function names, et cetera. I suppose the question that needs to be asked at this point is whether the necessities of some languages suggest that we should impose the same limits on presentation of other languages that do not share those necessities.
Certainly, you might add an attribute like "linebreak=ok", though I'm not sure how the HTML markup would have to go there...
While we're at it, we might consider whether to allow line wrapping, but provide a line-wrap indicator syntax to show that what appears on a separate line of the page is in fact meant to be part of the same semantic line of code. I'm thinking of examples such as the plus-syntax used on websites such as perlmonks.org. An example of this usage might look something like this:
    while (<>) {
        if ( ( $scalar_foo <= $scalar_bar ) && ( $scalar_bar <= $array_qux[2] )
    +) {
            print;
        }
    }
. . . where the plus character is typically some differentiating color, such as red. That may or may not be appropriate for something like MediaWiki -- I only bring it up as something to consider.
Mutt does that with wrapped URLs in my mail. It's a pain because you can't then paste the copied text in without having to fix it.
On one-line URLs, this is practical.
If it's a couple hundred lines of code, maybe not so much.
Cheers, -- jra
On Sun, Jun 25, 2006 at 02:44:10PM -0400, Jay R. Ashworth wrote:
On Sun, Jun 25, 2006 at 12:26:36PM -0600, Chad Perrin wrote:
While we're at it, we might consider whether to allow line wrapping, but provide a line-wrap indicator syntax to show that what appears on a separate line of the page is in fact meant to be part of the same semantic line of code. I'm thinking of examples such as the plus-syntax used on websites such as perlmonks.org. An example of this usage might look something like this:
    while (<>) {
        if ( ( $scalar_foo <= $scalar_bar ) && ( $scalar_bar <= $array_qux[2] )
    +) {
            print;
        }
    }
. . . where the plus character is typically some differentiating color, such as red. That may or may not be appropriate for something like MediaWiki -- I only bring it up as something to consider.
Mutt does that with wrapped URLs in my mail. It's a pain because you can't then paste the copied text in without having to fix it.
On one-line URLs, this is practical.
If it's a couple hundred lines of code, maybe not so much.
I, too, use mutt, and have the same issue from time to time. It's easily solved with XHTML and CSS, though: use block elements and styles to position the plus signs (or whatever) in a separate block from the stuff you want to wrap, or use inline element styles to attach some visual cue like a left-side or right-side thin and understated border, or use a left-aligned background image on wrapped lines, or turn the block of code with line-wraps into a link to a plaintext file without linewraps, or . . .
One of the benefits of rendered markup with styles is that it allows you to use something other than content characters within the same block of content as visual presentation, thus allowing for visually associated characteristics that are not associated as closely according to what your mouse will highlight for copying and pasting.
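The styling approach can be sketched as HTML generation in which the wrap cue lives in CSS rather than in the text: each logical line becomes its own block, the browser soft-wraps it, and a hanging indent sets wrapped rows apart. The class names and CSS are assumptions, and this is one possible approach rather than anything MediaWiki provides.

```python
import html

# Assumed stylesheet: pre-wrap keeps indentation but lets the browser wrap long
# lines; padding plus a negative text-indent pushes wrapped rows to the right of
# the first row, so continuations are visible without any marker characters.
CSS = """
.code .line {
  font-family: monospace;
  white-space: pre-wrap;
  min-height: 1.2em;
  padding-left: 2em;
  text-indent: -2em;
}
"""


def code_to_html(code: str) -> str:
    """Hypothetical sketch: one <div class="line"> per logical source line.

    The markup adds no marker characters, so the text a reader selects is just
    the code; the wrap cue is purely visual.
    """
    rows = ['<div class="line">' + html.escape(line) + "</div>" for line in code.split("\n")]
    return '<div class="code">\n' + "\n".join(rows) + "\n</div>"
```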
On Sun, Jun 25, 2006 at 12:51:33PM -0600, Chad Perrin wrote:
Mutt does that with wrapped URLs in my mail. It's a pain because you can't then paste the copied text in without having to fix it.
On one-line URLs, this is practical.
If it's a couple hundred lines of code, maybe not so much.
I, too, use mutt, and have the same issue from time to time. It's easily solved with XHTML and CSS, though: use block elements and styles to position the plus signs (or whatever) in a separate block from the stuff you want to wrap, or use inline element styles to attach some visual cue like a left-side or right-side thin and understated border, or use a left-aligned background image on wrapped lines, or turn the block of code with line-wraps into a link to a plaintext file without linewraps, or . . .
One of the benefits of rendered markup with styles is that it allows you to use something other than content characters within the same block of content as visual presentation, thus allowing for visually associated characteristics that are not associated as closely according to what your mouse will highlight for copying and pasting.
Excellent point, though I was thrown a bit there at first, wondering how exactly you were going to get Mutt to issue XHTML. :-)
Cheers, -- jra
On Sun, Jun 25, 2006 at 03:32:04PM -0400, Jay R. Ashworth wrote:
On Sun, Jun 25, 2006 at 12:51:33PM -0600, Chad Perrin wrote:
One of the benefits of rendered markup with styles is that it allows you to use something other than content characters within the same block of content as visual presentation, thus allowing for visually associated characteristics that are not associated as closely according to what your mouse will highlight for copying and pasting.
Excellent point, though I was thrown a bit there at first, wondering how exactly you were going to get Mutt to issue XHTML. :-)
Er, no. Heh. It should be obvious on a second look that I wasn't talking about mutt when referring to using XHTML for styling.
On 6/25/06, Jay R. Ashworth jra@baylink.com wrote:
On Sun, Jun 25, 2006 at 07:22:53PM +0200, Steve Bennett wrote:
We do, however, quite often want source code in our articles.
And it would probably be appropriate to have a good way of doing that. Source code doesn't necessarily demand lack of line wrapping.
It doesn't?
Certainly there are languages in which wrapping a line breaks the syntax; the Bourne Shell is only the first one that comes to mind...
I vaguely recall that this conversation started with the comment that -- should not be converted to em-dash for C source code.
Anyway, any time you don't want wrapping, there's <nobr> for that. But in any case, I don't think the syntax/meaning of space indentation is likely to change, so this is a bit moot.
Steve
Steve Bennett wrote:
But in any case, I don't think the syntax/meaning of space indentation is likely to change, so this is a bit moot.
Indeed. To put this in historical perspective, I believe the use of indentation for source code (and preformatted text in general) can be traced back to Cunningham's original WikiWikiWeb. He may have got the idea from POD, which was introduced about six months earlier with Perl 5. I'm not sure where the idea came from before that -- I suppose it's possible it was invented by Larry Wall, but it's also possible that he got the idea from some other markup language.
On 6/23/06, Jay R. Ashworth jra@baylink.com wrote:
On Fri, Jun 23, 2006 at 12:44:09AM -0600, Chad Perrin wrote:
Clearly, if this is done, it needs to be considered whether surrounding spaces should be eaten by the parser.
Agreed. I vote "no".
Oh; you're voting "no, the parser should *not* automagically replace '--' and/or ' -- ' with —", then?
Might this, perhaps, be just a subst type thing that happens once when a page is saved?
No other wiki markup used in article space works that way.[1]
Steve [1] except subst:
On Sat, Jun 24, 2006 at 01:46:41AM +1000, Brianna Laugher wrote:
No other wiki markup used in article space works that way.[1]
Steve [1] except subst:
and ~~~ / ~~~~ / ~~~~~. Oh wait, article space. *mutters* Technicality!
I've never tried it: *do* the tilde subs work in article namespace?
Cheers, -- jra
On 23/06/06, Jay R. Ashworth jra@baylink.com wrote:
I've never tried it: *do* the tilde subs work in article namespace?
Yes, the pre-save transform is indiscriminate. Wikipedia is communism, and equal rights and all that, after all.
Rob Church
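A hypothetical Python sketch of a save-time expansion of the three tilde forms. The user-link and timestamp formats are simplified assumptions rather than MediaWiki's exact output; the point is that the substitution runs once, when the edit is saved, and takes no notice of the namespace.

```python
import re
from datetime import datetime, timezone


def pre_save_transform(text: str, user: str, now: datetime) -> str:
    """Hypothetical sketch: expand ~~~, ~~~~ and ~~~~~ once, when an edit is saved.

    The stored wikitext then contains the expansion rather than the tildes.
    """
    sig = f"[[User:{user}|{user}]]"
    stamp = now.strftime("%H:%M, %d %B %Y (UTC)")  # simplified timestamp format

    def expand(m):
        # The pattern is greedy, so a run of five tildes is never read as three or four.
        return {3: sig, 4: f"{sig} {stamp}", 5: stamp}[len(m.group(0))]

    return re.sub(r"~{3,5}", expand, text)


if __name__ == "__main__":
    saved_at = datetime(2006, 6, 23, 16, 5, tzinfo=timezone.utc)
    print(pre_save_transform("Thanks. ~~~~", "Example", saved_at))
```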
On 6/23/06, Jay R. Ashworth jra@baylink.com wrote:
On Sat, Jun 24, 2006 at 01:46:41AM +1000, Brianna Laugher wrote:
No other wiki markup used in article space works that way.[1]
Steve [1] except subst:
and ~~~ / ~~~~ / ~~~~~. Oh wait, article space. *mutters* Technicality!
I've never tried it: *do* the tilde subs work in article namespace?
Yeah, I've occasionally cleaned up other people's signatures. I've almost done it myself, but not yet.
~~~~
Steve Bennett-4 wrote:
On 6/23/06, Jay R. Ashworth jra@baylink.com wrote:
On Fri, Jun 23, 2006 at 12:44:09AM -0600, Chad Perrin wrote: Might this, perhaps, be just a subst type thing that happens once when a page is saved?
No other wiki markup used in article space works that way.[1] [1] except subst:
Don't forget the good old pipe-trick...
HTH HAND
On 6/28/06, Phil Boswell phil.boswell@gmail.com wrote:
No other wiki markup used in article space works that way.[1] [1] except subst:
Don't forget the good old pipe-trick...
The pipe-trick doesn't cause any magic substitution to happen at page-save time?
Steve
Steve Bennett-4 wrote:
On 6/28/06, Phil Boswell phil.boswell@gmail.com wrote:
No other wiki markup used in article space works that way.[1] [1] except subst:
Don't forget the good old pipe-trick...
The pipe-trick doesn't cause any magic substitution to happen at page-save time?
Yes, it does...sorry if my phrasing might have misled.
[[namespace:title (disambig)|]] will be transformed to [[namespace:title (disambig)|title]] at page-save time.
There are some strange edge-cases however: http://en.wikipedia.org/wiki/User:Phil_Boswell/Sandbox/Sandbox_%28with_brack...
I don't know when it happened, but the "Show Changes" button will now show this happening...there was a time early on when it didn't (it now also shows {{subst:...}} working).
HTH HAND
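The save-time rewrite Phil describes can be sketched as a regex over links with an empty display part. This hypothetical Python version only strips a leading namespace prefix (naively, anything before the first colon) and a trailing parenthetical; the real pre-save transform covers further cases, including the odd edge cases linked above, that are not modelled here.

```python
import re

# [[target|]]: a link whose display part was left empty.
EMPTY_PIPE = re.compile(r"\[\[([^\[\]|]+)\|\]\]")


def pipe_label(target: str) -> str:
    """Derive display text: drop a 'Namespace:' prefix and a trailing '(qualifier)'."""
    label = target.split(":", 1)[-1]  # naive: treats any text before ':' as a namespace
    label = re.sub(r"\s*\([^()]*\)$", "", label)
    return label.strip()


def pipe_trick(text: str) -> str:
    """Hypothetical sketch: fill in empty-piped links once, at save time."""
    return EMPTY_PIPE.sub(lambda m: f"[[{m.group(1)}|{pipe_label(m.group(1))}]]", text)
```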
On 6/28/06, Phil Boswell phil.boswell@gmail.com wrote:
Yes, it does...sorry if my phrasing might have misled.
[[namespace:title (disambig)|]] will be transformed to [[namespace:title (disambig)|title]] at page-save time.
There are some strange edge-cases however: http://en.wikipedia.org/wiki/User:Phil_Boswell/Sandbox/Sandbox_%28with_brack...
I don't know when it happened, but the "Show Changes" button will now show this happening...there was a time early on when it didn't (it now also shows {{subst:...}} working).
Fascinating, I had no idea it worked like that. That "with brackets" case is really freaky.
For spectators: if you type [[|foo]], for some reason the software replaces it with [[foo (with brackets)|foo]]. I would love to hear the reason for that!
Steve
Steve Bennett-4 wrote:
Fascinating, I had no idea it worked like that. That "with brackets" case is really freaky.
For spectators: if you type [[|foo]], for some reason the software replaces it with [[foo (with brackets)|foo]]. I would love to hear the reason for that!
For the avoidance of confusion, this only happens if the link is in an article whose title ends with "...(in brackets)".
In other words, if you are editing an article whose title ends with a parenthesised word or phrase, any links which begin with a pipe character will be expanded to include that word or phrase, together with parentheses, in the **title** part of the link.
Freaky indeed, and it took a couple of tries before I believed what I was seeing.
HTH HAND
On 6/28/06, Phil Boswell phil.boswell@gmail.com wrote:
For the avoidance of confusion, this only happens if the link is in an article whose title ends with "...(in brackets)".
In other words, if you are editing an article whose title ends with a parenthesised word or phrase, any links which begin with a pipe character will be expanded to include that word or phrase, together with parentheses, in the **title** part of the link.
Freaky indeed, and it took a couple of tries before I believed what I was seeing.
Ah, I see.
So at page "foo (boo)", if you create link [[|moo]] it replaces it with [[moo (boo)|moo]]. Useful indeed!
Steve
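The reverse case is the mirror image: the parenthetical comes from the title of the page being edited rather than from the link. Another hypothetical Python sketch with deliberately simple patterns; leaving `[[|label]]` untouched on pages without a qualifier is an assumption, not documented behaviour.

```python
import re

# [[|label]]: a link whose target part was left empty.
LEADING_PIPE = re.compile(r"\[\[\|([^\[\]|]+)\]\]")
# A trailing "(qualifier)" on the current page title.
QUALIFIER = re.compile(r"\s*(\([^()]*\))$")


def reverse_pipe_trick(text: str, page_title: str) -> str:
    """Hypothetical sketch: expand [[|label]] using the current page's qualifier."""
    found = QUALIFIER.search(page_title)
    if not found:
        return text  # assumption: without a qualifier, leave such links alone
    qualifier = found.group(1)
    return LEADING_PIPE.sub(
        lambda link: f"[[{link.group(1)} {qualifier}|{link.group(1)}]]", text
    )
```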
On 6/23/06, Jay R. Ashworth jra@baylink.com wrote:
On Thu, Jun 22, 2006 at 12:21:28PM -0600, Chad Perrin wrote:
Would requiring spaces on either side of the double dash before converting it into an emdash improve the parsing behavior any?
Please don't.
<snob type=typography> Em dashes are properly set in English text without spacing on either side, though the ASCIIography of this usage is much less picky.
Can you provide a reference for this? Also can you be sure this is the only style and that it doesn't vary by style guide, by publisher, by country, by newspaper vs novels, etc. The reason I ask is that I've been studying casually how they are typeset in books as part of my thinking about an XML format for e-texts. I have seen so many cases both with and without spaces that I've been pondering whether it would best be handled as a style issue.
Andrew Dunbar (hippietrail)
If someone decides that it needs to be " -- " that's mapped, at least take the spaces out when setting the glyph?
</snob>
Cheers,
-- jra
Jay R. Ashworth jra@baylink.com Designer Baylink RFC 2100 Ashworth & Associates The Things I Think '87 e24 St Petersburg FL USA http://baylink.pitas.com +1 727 647 1274
On Sun, Jun 25, 2006 at 08:47:37PM +0200, Andrew Dunbar wrote:
On 6/23/06, Jay R. Ashworth jra@baylink.com wrote:
On Thu, Jun 22, 2006 at 12:21:28PM -0600, Chad Perrin wrote:
Would requiring spaces on either side of the double dash before converting it into an emdash improve the parsing behavior any?
Please don't.
<snob type=typography> Em dashes are properly set in English text without spacing on either side, though the ASCIIography of this usage is much less picky.
Can you provide a reference for this? Also can you be sure this is the only style and that it doesn't vary by style guide, by publisher, by country, by newspaper vs novels, etc. The reason I ask is that I've been studying casually how they are typeset in books as part of my thinking about an XML format for e-texts. I have seen so many cases both with and without spaces that I've been pondering whether it would best be handled as a style issue.
I think it most definitely is best handled as a style issue, or perhaps as a per-document or per-content-block formatting issue. I still have my preference for spaces, though. They make it look less like an oversized hyphen.
On Sun, Jun 25, 2006 at 08:47:37PM +0200, Andrew Dunbar wrote:
<snob type=typography> Em dashes are properly set in English text without spacing on either side, though the ASCIIography of this usage is much less picky.
Can you provide a reference for this? Also can you be sure this is the only style and that it doesn't vary by style guide, by publisher, by country, by newspaper vs novels, etc.
A formal reference?
No; not off hand.
Just personal experience from 35 years or so of reading American typesetting, and at least 15 years of paying professional attention thereto.
The reason I ask is that I've been studying casually how they are typeset in books as part of my thinking about an XML format for e-texts. I have seen so many cases both with and without spaces that I've been pondering whether it would best be handled as a style issue.
You've seen commercial typesetting (that is, typesetting done by typographers, not computer scientists :-) that put full spaces around em-dashes?
I'd be interested in references myself, if you could lay hands on any.
It's a topic I'm close to, because my instinct is to want to render standard typesetting as closely as possible in ASCII, but I just can't bring myself to set ASCII em dashes--you know, these things--without the surrounding spaces -- even though that's how I see them set in type.
Cheers, -- jra
On 6/25/06, Jay R. Ashworth jra@baylink.com wrote:
It's a topic I'm close to, because my instinct is to want to render standard typesetting as closely as possible in ASCII, but I just can't bring myself to set ASCII em dashes--you know, these things--without the surrounding spaces -- even though that's how I see them set in type.
Weird, I tend to use hyphens with spaces - like this - to achieve that effect.
Steve
On Sun, Jun 25, 2006 at 09:40:26PM +0200, Steve Bennett wrote:
On 6/25/06, Jay R. Ashworth jra@baylink.com wrote:
It's a topic I'm close to, because my instinct is to want to render standard typesetting as closely as possible in ASCII, but I just can't bring myself to set ASCII em dashes--you know, these things--without the surrounding spaces -- even though that's how I see them set in type.
Weird, I tend to use hyphens with spaces - like this - to achieve that effect.
I see that, too. A good corpus to check would probably be a couple of Usenet groups, selected for their length of posting.
Cheers, -- jra
On Sun, Jun 25, 2006 at 03:49:51PM -0400, Jay R. Ashworth wrote:
On Sun, Jun 25, 2006 at 09:40:26PM +0200, Steve Bennett wrote:
On 6/25/06, Jay R. Ashworth jra@baylink.com wrote:
It's a topic I'm close to, because my instinct is to want to render standard typesetting as closely as possible in ASCII, but I just can't bring myself to set ASCII em dashes--you know, these things--without the surrounding spaces -- even though that's how I see them set in type.
Weird, I tend to use hyphens with spaces - like this - to achieve that effect.
I see that, too. A good corpus to check would probably be a couple of Usenet groups, selected for their length of posting.
That's not a bad approximation of using endashes, really.
On 6/25/06, Jay R. Ashworth jra@baylink.com wrote:
On Sun, Jun 25, 2006 at 08:47:37PM +0200, Andrew Dunbar wrote:
<snob type=typography> Em dashes are properly set in English text without spacing on either side, though the ASCIIography of this usage is much less picky.
Can you provide a reference for this? Also, can you be sure this is the only style, and that it doesn't vary by style guide, by publisher, by country, by newspaper vs. novels, etc.?
A formal reference?
No; not off hand.
Just personal experience from 35 years or so of reading American typesetting, and at least 15 years of paying professional attention thereto.
I ask for a reference because I've seen people say that serial commas (also known as Harvard commas or Oxford commas) are correct or incorrect, whereas in that case it does turn out to depend on region, publisher, etc., but people wrongly assume there is a global rule.
The reason I ask is that I've been studying casually how they are typeset in books as part of my thinking about an XML format for e-texts. I have seen so many cases both with and without spaces that I've been pondering whether it would best be handled as a style issue.
You've seen commercial typesetting (that is, typesetting done by typographers, not computer scientists :-) that put full spaces around em-dashes?
I can't say for sure whether they were full spaces or half spaces. I do have books from several countries, several eras, and several languages nearby, though, so I'll try to look through some in the coming days for you.
I'd be interested in references myself, if you could lay hands on any.
It's a topic I'm close to, because my instinct is to want to render standard typesetting as closely as possible in ASCII, but I just can't bring myself to set ASCII em dashes--you know, these things--without the surrounding spaces -- even though that's how I see them set in type.
I'm interested in typesetting variations like these so that I can normalize e-books to be parsable by machine, etc., and then perhaps come up with a simple XML DTD that takes as many of the typing variations found in plaintext files as possible and turns them into a single standard markup or declaration, leaving presentation to styles where it belongs.
Andrew Dunbar (hippietrail)
Cheers,
-- jra
Jay R. Ashworth jra@baylink.com
Designer Baylink RFC 2100
Ashworth & Associates The Things I Think '87 e24
St Petersburg FL USA http://baylink.pitas.com +1 727 647 1274
A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on Usenet and in e-mail?
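Andrew's normalization idea above can be illustrated with a short Python sketch. It rests on assumptions not found in the thread: the element name `<dash/>` is hypothetical (no DTD is named), and the only ASCII forms folded together are the three that appear on this thread: unspaced `--`, spaced `--`, and Steve's spaced single hyphen. Whether the result is rendered spaced or unspaced, em or en, would then be left to a stylesheet.

```python
import re

# Hypothetical markup element: the thread names no DTD, so "<dash/>" is
# only an illustrative choice. How it is rendered (spaced or unspaced,
# em or en) would be left to a stylesheet.
DASH_MARKUP = "<dash/>"

# One pattern for the ASCII dash imitations seen on this thread:
#   "things--without"  (two or three hyphens, unspaced)
#   "spaces -- even"   (two or three hyphens, spaced)
#   "spaces - like"    (a single hyphen with a space on each side)
# The lookbehind/lookahead require a word character on both sides, so
# "-- jra" after a comma, "x--;" and ordinary hyphenation such as
# "well-known" are left alone.
_DASH = re.compile(r"(?<=\w)(?:[ \t]*-{2,3}[ \t]*| - )(?=\w)")


def normalize_dashes(text: str) -> str:
    """Replace every ASCII dash imitation in `text` with DASH_MARKUP."""
    return _DASH.sub(DASH_MARKUP, text)


if __name__ == "__main__":
    print(normalize_dashes("these things--without the surrounding spaces -- even"))
    print(normalize_dashes("hyphens with spaces - like this - to"))
```

A spaced hyphen between digits, as in `3 - 2`, would also be rewritten by this pattern, which is the kind of false positive raised elsewhere in this thread about letting a machine interpret dashes.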
On Sun, Jun 25, 2006 at 10:02:40PM +0200, Andrew Dunbar wrote:
On 6/25/06, Jay R. Ashworth jra@baylink.com wrote:
On Sun, Jun 25, 2006 at 08:47:37PM +0200, Andrew Dunbar wrote:
<snob type=typography> Em dashes are properly set in English text without spacing on either side, though the ASCIIography of this usage is much less picky.
Can you provide a reference for this? Also, can you be sure this is the only style, and that it doesn't vary by style guide, by publisher, by country, by newspaper vs. novels, etc.?
A formal reference?
No; not off hand.
Just personal experience from 35 years or so of reading American typesetting, and at least 15 years of paying professional attention thereto.
I ask for a reference because I've seen people say that serial commas (also known as Harvard commas or Oxford commas) are correct or incorrect, whereas in that case it does turn out to depend on region, publisher, etc., but people wrongly assume there is a global rule.
Serial commas are historically more "proper", but that has been lost with time. I prefer them due to greater precision and communication of intended meaning (particularly in cases where a series may contain sub-series), and because they tend to better represent the conceptual flow of the sentence as well as the verbal flow when it is pronounced aloud. In other words, use of serial commas enhances clarity.
On Sun, Jun 25, 2006 at 03:36:01PM -0400, Jay R. Ashworth wrote:
On Sun, Jun 25, 2006 at 08:47:37PM +0200, Andrew Dunbar wrote:
The reason I ask is that I've been studying casually how they are typeset in books as part of my thinking about an XML format for e-texts. I have seen so many cases both with and without spaces that I've been pondering whether it would best be handled as a style issue.
You've seen commercial typesetting (that is, typesetting done by typographers, not computer scientists :-) that put full spaces around em-dashes?
I'd be interested in references myself, if you could lay hands on any.
For my own part, the only references I have on-hand at the moment are:
* one professional writer's reference (AP style and media law guide)
* a bunch of books that are professionally printed
* periodicals and news sources that are professionally printed
In my experience, spaces tend to be more common in literary works, and lack of spaces tends to be the norm in news media and the like -- though news media, et cetera, tends to frown on emdashes (or endashes for that matter) at least as much as it frowns on parentheses. This is borne out by the instruction set forth in the AP book.
Chad Perrin wrote:
Damn typography snobs... ;)
That may look like a joke, but I agree -- endashes are rare enough in comparison that it seems that if one of the two is going to be somewhat unintuitive, it should be the endash, and thus the HTML entity strikes me as a good answer to the problem.
Huh? I'd suggest reading [[Dash (punctuation)]] to see that en dashes aren't that rare. But in any case this is quite an English-centric point of view. In German, for example (and I guess that's not the only language where this is valid, but it's the language I know well), em dashes are even rarer than en dashes are in English. So it's not surprising that users here have been used for years to writing -- to tell word processors to produce an en dash.
So please be careful when using "rare" and "unintuitive". Besides, I prefer using JS for inserting such special characters, and I think having -- or --- interpreted automatically by a machine would be quite unreliable.
Jürgen
On 6/23/06, Jürgen Herz jherz@myrealbox.com wrote:
Huh? I'd suggest reading [[Dash (punctuation)]] to see that en dashes aren't that rare. But in any case this is quite an English-centric point of view. In German, for example (and I guess that's not the only language where this is valid, but it's the language I know well), em dashes are even rarer than en dashes are in English.
Interesting point. It reinforces my belief that this ought to be an optional feature. Particularly for wikis that contain a lot of unescaped C code...:)
So please be careful when using "rare" and "unintuitive". Besides, I prefer using JS for inserting such special characters, and I think having -- or --- interpreted automatically by a machine would be quite unreliable.
I'm definitely guilty of only thinking from an English mindset. Fwiw, in French, I believe emdashes are even more common, used where we might use parentheses or commas. Dunno about endashes.
Steve
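Steve's suggestion that the conversion be optional, together with Jürgen's point that German writers already type -- for an en dash, can be illustrated with a small Python sketch. This is not MediaWiki's parser: the function name, the on/off flag, and the list of protected tags are all hypothetical. The switch is per wiki, the target character is a parameter (em dash, U+2014, by default; en dash, U+2013, for a wiki like German), and anything inside `<pre>`, `<code>` or `<nowiki>` is skipped so C snippets such as `x--` are left intact.

```python
import re

EM_DASH = "\u2014"
EN_DASH = "\u2013"

# Hypothetical list of regions left untouched; a real wiki parser would have
# its own notion of protected markup. Tags are matched in lowercase only.
_PROTECTED = re.compile(r"<(pre|code|nowiki)>.*?</\1>", re.DOTALL)

# "--" directly between two word characters, with at most one space or tab
# on each side. The writer's own spacing is kept, so both "things--without"
# and "spaces -- even" keep their original style. C idioms such as "x--;"
# or "while (n--)" have no word character after the hyphens and are not
# converted; "---" is not converted either.
_DOUBLE_HYPHEN = re.compile(r"(?<=\w)([ \t]?)--([ \t]?)(?=\w)")


def _convert(segment: str, dash: str) -> str:
    """Replace each qualifying "--" in an unprotected segment with `dash`."""
    return _DOUBLE_HYPHEN.sub(lambda m: m.group(1) + dash + m.group(2), segment)


def convert_double_hyphens(text: str, enabled: bool, dash: str = EM_DASH) -> str:
    """Optionally turn "--" into `dash`, skipping <pre>, <code> and <nowiki> spans."""
    if not enabled:  # per-wiki switch: the feature is opt-in
        return text
    pieces = []
    pos = 0
    for m in _PROTECTED.finditer(text):
        pieces.append(_convert(text[pos:m.start()], dash))
        pieces.append(m.group(0))  # protected span copied verbatim
        pos = m.end()
    pieces.append(_convert(text[pos:], dash))
    return "".join(pieces)
```

Prose about code that happens to put words on both sides of the hyphens (for example "decrement i--then loop") would still be converted, so a rule like this narrows the risk Jürgen describes rather than removing it.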