Hello,
David Gerard wrote:
http://lists.wikimedia.org/mailman/listinfo/wikitext-l
Wikitext-l was formed from a recent discussion on wikitech-l about the need to sanely reimplement the current parser, which is a Horrible Mess and pretty much impossible to reimplement in another language.
The MediaWiki parser definition is literally "whatever the PHP parser does." Some of what it does is arguably very wrong, pathological, magical or just a Stupid Parser Trick. So the list has been formed to come up with a grammar that defines all the useful parts of the present parser, and so can be used by anyone to implement a MediaWiki wikitext parser. This will be useful for other software, for WYSIWYG editing extensions ... all manner of things.
Some of what some people would think of as a "stupid parser trick" is in fact important, e.g. L'''uomo'', which renders as L'<i>uomo</i> (necessary for French and Italian).
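For reference, here is a minimal sketch of the kind of rule that makes L'''uomo'' come out as L'<i>uomo</i>. It assumes the heuristic is roughly: split on runs of apostrophes, count bold (''') and italic ('') runs, and if both counts are odd, demote one ''' to a literal ' plus '', preferring one that follows a one-letter word like "L". The names rebalance and render are hypothetical; this is not MediaWiki's code, it ignores runs of 4 or 5+, and it falls back to the first ''' rather than reproducing every tie-break the real parser may use.

```python
import re


def rebalance(text: str) -> list[str]:
    """Split on runs of two or more apostrophes; odd indices are markup runs."""
    parts = re.split(r"(''+)", text)
    runs = parts[1::2]
    bold = runs.count("'''")
    italic = runs.count("''")
    # Odd bold AND odd italic: one ''' was probably meant as ' followed by ''.
    if bold % 2 == 1 and italic % 2 == 1:
        candidates = [i for i in range(1, len(parts), 2) if parts[i] == "'''"]

        def after_single_letter(i: int) -> bool:
            # True if the text before this run ends in a one-letter word, e.g. "L".
            before = parts[i - 1]
            return (
                len(before) >= 1
                and before[-1].isalpha()
                and (len(before) == 1 or not before[-2].isalpha())
            )

        # Simplification: otherwise just take the first ''' run.
        chosen = next((i for i in candidates if after_single_letter(i)), candidates[0])
        parts[chosen - 1] += "'"  # first apostrophe becomes literal text
        parts[chosen] = "''"      # the remaining two become italic markup
    return parts


def render(parts: list[str]) -> str:
    """Naively toggle <i>/<b>; other run lengths pass through as plain text."""
    tags = {"''": "i", "'''": "b"}
    out: list[str] = []
    open_tags: list[str] = []
    for i, piece in enumerate(parts):
        tag = tags.get(piece) if i % 2 == 1 else None
        if tag is None:
            out.append(piece)
        elif tag in open_tags:
            out.append(f"</{tag}>")
            open_tags.remove(tag)
        else:
            out.append(f"<{tag}>")
            open_tags.append(tag)
    # Close anything left open (no attempt to repair misnesting).
    out.extend(f"</{t}>" for t in reversed(open_tags))
    return "".join(out)


print(render(rebalance("L'''uomo''")))  # L'<i>uomo</i>
```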
Actually, the proper French apostrophe should be ’ (Unicode: U+2019, HTML: &#8217;), not '. On the French Wikisource, we systematically use bots to replace ' with ’ in all articles and titles (keeping redirects). So actually, ''' should be ’'' in proper French typography.
The issue is that ’ is not on the standard French keyboard, and it does not exist in Latin-1 (nor does œ, for oe). There is also the problem of broken software, such as copy-pasting into an editor that is not Unicode-compliant. That's why it is so rarely used.
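As an illustration of the bot replacement described above, a minimal sketch, assuming the rule is simply "a straight apostrophe with a letter immediately on each side is an elision and becomes U+2019". The pattern name ELISION and the function typographic_apostrophes are hypothetical, not the actual Wikisource bot. It deliberately does not attempt the L'''uomo'' to L’''uomo'' rewrite (that needs the bold/italic analysis), and a real bot would presumably also have to skip URLs, templates and the like.

```python
import re

# A straight apostrophe with a letter immediately on each side.
# [^\W\d_] means "a word character that is neither a digit nor '_'",
# i.e. a (Unicode) letter.
ELISION = re.compile(r"(?<=[^\W\d_])'(?=[^\W\d_])")


def typographic_apostrophes(wikitext: str) -> str:
    """Replace elision apostrophes (l'idée, aujourd'hui) with U+2019.

    The '' and ''' markup runs are never matched: each apostrophe in a run
    touches another apostrophe, so it never has a letter on both sides.
    """
    return ELISION.sub("\u2019", wikitext)


print(typographic_apostrophes("''Aujourd'hui'', l'idée"))  # ''Aujourd’hui'', l’idée
```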
- d.
Regards,
Yann
On 11/18/07, Yann Forget yann@forget-me.net wrote:
Actually, the proper French apostrophe should be ’ (Unicode: U+2019, HTML: &#8217;), not '. On the French Wikisource, we systematically use bots to replace ' with ’ in all articles and titles (keeping redirects). So actually, ''' should be ’'' in proper French typography.
Ah, I wondered about that - the two articles I looked at both had that weird apostrophe rather than the straight one.
So, so much for this special l''''idée''' case, eh? :) (Well, not really: obviously other languages will probably still want it when writing about French, and there are probably other languages in similar situations...)
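For cases like l''''idée''', a minimal sketch of how a grammar might normalise apostrophe runs before deciding on bold and italics. It assumes the commonly described behaviour: a run of exactly four is one literal ' followed by ''' (bold), and a run of six or more is literal text except for the last five. The function split_runs is hypothetical; runs of 2, 3 and 5 are passed through as italic, bold and bold+italic markup.

```python
import re


def split_runs(text: str) -> list[tuple[str, str]]:
    """Tokenize wikitext into ("text", ...) and ("markup", ...) pieces."""
    tokens: list[tuple[str, str]] = []

    def add(kind: str, value: str) -> None:
        # Merge consecutive text pieces so "l" + "'" becomes "l'".
        if tokens and kind == "text" and tokens[-1][0] == "text":
            tokens[-1] = ("text", tokens[-1][1] + value)
        elif value:
            tokens.append((kind, value))

    # The capture group keeps each run of two or more apostrophes.
    for piece in re.split(r"(''+)", text):
        if not piece.startswith("''"):
            add("text", piece)
        elif len(piece) == 4:
            # Four apostrophes: one literal ' followed by bold markup.
            add("text", "'")
            add("markup", "'''")
        elif len(piece) > 5:
            # Six or more: everything except the last five is literal text.
            add("text", "'" * (len(piece) - 5))
            add("markup", "'''''")
        else:
            add("markup", piece)  # 2, 3 or 5: italic, bold, bold+italic
    return tokens


print(split_runs("l''''idée'''"))
```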
The issue is that ’ is not in the standard French keyboard, and it does not exist in Latin1 (like œ for oe). There is also the problem with
I didn't know that. I actually use a hybrid French/Dvorak keyboard that I invented myself; I guess it would be more authentic with the correct apostrophe, then... hmm.
Steve
On 18/11/2007, Yann Forget yann@forget-me.net wrote:
The issue is that ’ is not on the standard French keyboard, and it does not exist in Latin-1 (nor does œ, for oe). There is also the problem of broken software, such as copy-pasting into an editor that is not Unicode-compliant. That's why it is so rarely used.
U+2019 is the correct apostrophe for English and all European languages, not just French. We use it on the English Wiktionary but it has met with great resistance on the English Wikipedia. The straight apostrophe was invented with the typewriter and dominates the computer and Internet world due to the legacy of ASCII.
I've been thinking about whether a new parser could handle all apostrophe issues at once, including converting '' to italic, ''' to bold, and ' to the correct curved apostrophe or opening or closing single quote mark, and possibly even handle the case of Neapolitan.
Andrew Dunbar (hippietrail)
http://www.non-violence.org/ | Collaborative site on non-violence
http://www.forget-me.net/ | Alternatives on the Net
http://fr.wikisource.org/ | The free library
http://wikilivres.info | Free documents
Wikitext-l mailing list
Wikitext-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitext-l
On 18/11/2007, Andrew Dunbar hippytrail@gmail.com wrote:
I've been thinking about whether a new parser could handle all apostrophe issues at once, including converting '' to italic, ''' to bold, and ' to the correct curved apostrophe or opening or closing single quote mark, and possibly even handle the case of Neapolitan.
Machine-generated smart quotes are frequently problematic (I've yet to see a quote smartener that correctly handles 'cos in text), so I'd be reluctant to suggest putting that in place; but rendering the French apostrophe should be feasible, if that's actually typographically more correct in a sea of straight single quote apostrophes.
- d.
On 11/19/07, Andrew Dunbar hippytrail@gmail.com wrote:
U+2019 is the correct apostrophe for English and all European languages, not just French. We use it on the English Wiktionary but it has met with great resistance on the English Wikipedia. The straight apostrophe was invented with the typewriter and dominates the computer and Internet world due to the legacy of ASCII.
I've been thinking about whether a new parser could handle all apostrophe issues at once, including converting '' to italic, ''' to bold, and ' to the correct curved apostrophe or opening or closing single quote mark, and possibly even handle the case of Neapolitan.
Interesting. I was midway through asking for more information when I realised Wikipedia of course has a good article on the apostrophe :) (see the #Unicode section in particular)
So the main three are:
- U+0027: "typewriter apostrophe", the one on US keyboards. I notice that in the DOS font (or Windows command shell), it's actually curved.
- U+2019: "typographic apostrophe", on French keyboards? This one is always curved to the left (like a 9). According to [[Quotation mark glyphs]], this is also the correct symbol for the right single quote.
- U+2018: left single quote (curved like a 6). Not to be confused with the dubious backquote (`).
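To check the characters listed above (and Yann's point that ’ and œ are missing from Latin-1), a small sketch using only Python's standard library. It prints each character's Unicode name, an HTML reference (the named HTML 4 entity where Python's table has one, otherwise a numeric reference), and whether it can be encoded in Latin-1 and in Windows-1252. The helpers encodable and describe are just illustrative names.

```python
import unicodedata
from html.entities import codepoint2name


def encodable(ch: str, encoding: str) -> bool:
    """True if the character has a byte representation in the encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def describe(ch: str) -> str:
    cp = ord(ch)
    # Named HTML 4 entity if Python knows one, otherwise a numeric reference.
    html = f"&{codepoint2name[cp]};" if cp in codepoint2name else f"&#{cp};"
    return (
        f"U+{cp:04X} {unicodedata.name(ch)}, HTML {html}, "
        f"Latin-1: {encodable(ch, 'latin-1')}, cp1252: {encodable(ch, 'cp1252')}"
    )


# Straight apostrophe, right and left single quotes, backquote, and œ.
for ch in ["'", "\u2019", "\u2018", "`", "\u0153"]:
    print(describe(ch))
```

Both ’ and œ turn out to be absent from Latin-1 but present in Windows-1252, which may be part of why they survive in some tools and get mangled in others.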
To render single quotation marks correctly, we could (tongue in cheek) introduce a new syntactic operator, ', as follows:
'foo' -> ‘foo’
''foo'' -> <i>foo</i>
'''foo''' -> <b>foo</b> or <i>‘foo’</i>
''''foo'''' -> <b>‘foo’</b>
''''foo''' -> ’<b>foo</b>
F''''oo''' -> F’<b>oo</b>
Oh joy...
It would be kind of nice to be able to have proper quotes, but can anyone think of a good mechanism for doing so?
Steve
On 19/11/2007, Steve Bennett stevagewp@gmail.com wrote:
'foo' -> ‘foo’
''foo'' -> <i>foo</i>
'''foo''' -> <b>foo</b> or <i>‘foo’</i>
''''foo'''' -> <b>‘foo’</b>
''''foo''' -> ’<b>foo</b>
F''''oo''' -> F’<b>oo</b>
Don't forget:
'foo -> ’foo
- this being the case that every other smart quoting mechanism I've ever tried falls over on.
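To show why 'foo trips up smart quoting, here is a minimal sketch of a typical context-only heuristic (an assumption about how many quote smarteners work, not any particular product): an apostrophe at the start of the text or after whitespace or an opening bracket becomes ‘, anything else becomes ’. With context alone it cannot tell a quotation opening from a leading elision such as 'cos.

```python
def naive_smarten(text: str) -> str:
    """Context-only smart quoting.

    A straight ' at the start of the text, or after whitespace or an opening
    bracket, becomes an opening quote (U+2018); any other ' becomes U+2019,
    which doubles as closing quote and apostrophe.
    """
    out = []
    for i, ch in enumerate(text):
        if ch != "'":
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        out.append("\u2018" if prev.isspace() or prev in "([{" else "\u2019")
    return "".join(out)


print(naive_smarten("'foo' isn't 'cos"))  # ‘foo’ isn’t ‘cos  (should be ’cos)
```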
Oh joy...
Indeed! %-D
It would be kind of nice to be able to have proper quotes, but can anyone think of a good mechanism for doing so?
Potential shiny thing to be saved for later if at all possible ;-)
- d.
On 19/11/2007, David Gerard dgerard@gmail.com wrote:
It would be kind of nice to be able to have proper quotes, but can anyone think of a good mechanism for doing so?
Potential shiny thing to be saved for later if at all possible ;-)
To do smart quotes well we would need per-language rules, per-language lists of exceptions that don't follow the rules, and specific markup to let editors override the per-language rules, for example when talking about one language in a wiki of another language.
Changing plain quotes to curved quotes should not be done by the parser, but the parser could at least track which quotes are not part of '' or '''. It could also attach hints to each one that could be used by a later part of the rendering software to actually convert straight quotes to curved quotes and apostrophes.
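A minimal sketch of that split, assuming the parser only records each lone apostrophe (one not adjacent to another ') together with some context, and a separate rendering step applies per-language rules plus an exception list. Everything here (StrayQuote, find_stray_quotes, curl, ELISIONS and its word lists) is hypothetical and illustrative; a real rule set would need far more cases and the override markup mentioned above.

```python
import re
from dataclasses import dataclass


@dataclass
class StrayQuote:
    """Hypothetical hint for a lone ' that is not part of '' or ''' markup."""
    index: int      # position in the source text
    before: str     # preceding character ("" at start of text)
    after: str      # following character ("" at end of text)
    next_word: str  # letters immediately after the ', lower-cased


def find_stray_quotes(text: str) -> list[StrayQuote]:
    """Parser side: record single apostrophes, skipping runs of two or more."""
    hints = []
    for m in re.finditer(r"(?<!')'(?!')", text):
        i = m.start()
        word = re.match(r"[^\W\d_]*", text[i + 1:]).group(0)
        before = text[i - 1:i] if i > 0 else ""
        hints.append(StrayQuote(i, before, text[i + 1:i + 2], word.lower()))
    return hints


# Hypothetical per-language exception lists: words spelled with a leading
# elision apostrophe, where ' is an apostrophe rather than an opening quote.
ELISIONS: dict[str, set[str]] = {
    "en": {"cos", "em", "tis", "twas"},
    "fr": set(),
}


def curl(text: str, lang: str) -> str:
    """Renderer side: turn each stray ' into U+2018 or U+2019 using the hints."""
    chars = list(text)
    exceptions = ELISIONS.get(lang, set())
    for h in find_stray_quotes(text):
        opens = (h.before == "" or h.before.isspace()) and h.after.isalpha()
        if opens and h.next_word not in exceptions:
            chars[h.index] = "\u2018"  # opening single quote
        else:
            chars[h.index] = "\u2019"  # apostrophe or closing single quote
    return "".join(chars)


print(curl("'foo' and 'cos", "en"))  # ‘foo’ and ’cos
print(curl("''l'idée''", "fr"))      # ''l’idée''
```

Keeping the hints separate from the conversion means the parser's output stays the same for every language; only the final rendering step needs to know which wiki it is running on.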
Andrew Dunbar (hippietrail)