I've written up an account of how the current parser treats apostrophes here: http://www.mediawiki.org/wiki/Markup_spec/BNF/Inline_text#Determining_the_be...
All I've done is read the code of doAllQuotes() and translate it from a procedural style (first replace blah, then iterate through...) into a more declarative style (four apostrophes end up getting rendered as X if the following is the case...).
The most interesting case is this one:
Take ''''four''' apostrophes and then throw '''''five unclosed apostrophes at them.
Normally, four apostrophes is treated as apostrophe followed by bold. But when the parser finds unbalanced bold *and* italics on the line, it goes looking for a bold to split. The first bold, which is now preceded by an apostrophe, is seen as a good candidate because it seems to be a single letter followed by a bold (as in the l'''idee'' case). So that bold gets split *again*. Meaning that the four apostrophes end up getting rendered as two apostrophes followed by italics.
I suspect this was not planned behaviour.
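A minimal sketch of the two passes described above, in Python rather than the parser's PHP. It is reconstructed from the description in this message and is not a copy of doAllQuotes(); the function name rebalance_quotes and all variable names are hypothetical, and the fallback order used when no single-letter candidate exists is an assumption. It only reclassifies apostrophe runs and does not emit HTML.

```python
import re


def rebalance_quotes(line):
    """Return [text, markup, text, ...] after the rebalancing pass (sketch)."""
    # Odd indices hold apostrophe runs, even indices hold the text between them.
    parts = re.split(r"(''+)", line)

    # Pass 1: four apostrophes become a literal apostrophe plus bold;
    # runs longer than five keep only the last five as markup.
    for i in range(1, len(parts), 2):
        n = len(parts[i])
        if n == 4:
            parts[i - 1] += "'"
            parts[i] = "'''"
        elif n > 5:
            parts[i - 1] += "'" * (n - 5)
            parts[i] = "'''''"

    runs = [len(parts[i]) for i in range(1, len(parts), 2)]
    num_bold = sum(1 for n in runs if n in (3, 5))
    num_italic = sum(1 for n in runs if n in (2, 5))

    # Pass 2: with an odd number of both, split one bold into
    # a literal apostrophe followed by italics.
    if num_bold % 2 == 1 and num_italic % 2 == 1:
        first_single = first_multi = first_space = None
        for i in range(1, len(parts), 2):
            if len(parts[i]) != 3:
                continue
            before = parts[i - 1]
            last, second_last = before[-1:], before[-2:-1]
            if last == " ":
                if first_space is None:
                    first_space = i
            elif second_last == " ":
                # Looks like a single-letter word, as in l'''idee''.
                if first_single is None:
                    first_single = i
            elif first_multi is None:
                first_multi = i
        # Assumed preference order: single letter, multi-letter, space.
        candidates = [c for c in (first_single, first_multi, first_space)
                      if c is not None]
        if candidates:
            target = candidates[0]
            parts[target] = "''"
            parts[target - 1] += "'"
    return parts


parts = rebalance_quotes(
    "Take ''''four''' apostrophes and then throw "
    "'''''five unclosed apostrophes at them."
)
print(parts[:2])
```

In this sketch the first pass leaves "Take '" followed by a bold marker, and the second pass then treats that lone apostrophe as a single-letter word, so the first two elements end up as "Take ''" and an italics marker.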
Steve
-----Original Message----- From: wikitext-l-bounces@lists.wikimedia.org [mailto:wikitext-l-bounces@lists.wikimedia.org] On Behalf Of Steve Bennett Sent: 27 November 2007 04:06 To: Wikitext-l Subject: [Wikitext-l] Determining the behaviour of apostrophes
I've written up an account of how the current parser treats apostrophes here: http://www.mediawiki.org/wiki/Markup_spec/BNF/Inline_text#Determining_the_behaviour_of_apostrophes
All I've done is read the code of doAllQuotes() and translate it from a procedural style (first replace blah, then iterate through...) into a more declarative style (four apostrophes end up getting rendered as X if the following is the case...).
The most interesting case is this one:
Take ''''four''' apostrophes and then throw '''''five unclosed apostrophes at them.
Normally, four apostrophes is treated as apostrophe followed by bold. But when the parser finds unbalanced bold *and* italics on the line, it goes looking for a bold to split. The first bold, which is now preceded by an apostrophe, is seen as a good candidate because it seems to be a single letter followed by a bold (as in the l'''idee'' case). So that bold gets split *again*. Meaning that the four apostrophes end up getting rendered as two apostrophes followed by italics.
I suspect this was not planned behaviour.
Steve
Had an attempt to solve the bold/italics ambiguity last week. Managed to get my handwritten parser to pass the bold/italics tests in parserTests.txt (excluding anything with link markup as links parsing is still unimplemented).
The code is still missing the search for a single letter preceding a bold to split at. Seems none of the tests exercise that particular bit of code. --
For Take ''''four''' apostrophes and then throw '''''five unclosed apostrophes at them.
I get <p>Take '<b>four</b> apostrophes and then throw ' five unclosed apostrophes at them.</p>
The ''''' becoming <b><i> and then getting balanced to '<i/>, with the <i/> being removed (causes problems in IE).
-- There is one test in parserTests.txt
!!test
Mixing markup for italics and bold
!! options
!! input
'''bold''''''bold''bolditalics'''''
!! result
<p><b>bold</b><b>bold<i>bolditalics</i></b>
</p>
!! End
That fails, due to having 6 apostrophes all being interpreted as markup.
On 11/28/07, Jared Williams jared.williams1@ntlworld.com wrote:
The code is still missing the search for a single letter preceding a bold to split at. Seems none of the tests exercise that particular bit of code.
That's a relief. Now that I understand this rule, I think it's a complete load of bollocks, and should be removed from any notion of "correct" treatment of wikitext. Mismatched apostrophe groupings should be considered erroneous input whose rendering is undefined.
Why?
For starters, as discussed, the French wikipedia doesn't even use this construct. Worse, it only works *once* per paragraph. Look at how this renders:
* L'''amour'' is great the first time. But l'''amour'' fails the second time.
You guessed it, bold from the first ''' to the second ''', and italics from the first '' to the second ''. And why would it be any different?
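A small sketch of why the second occurrence cancels the first, assuming (as described earlier in the thread) that the split only happens when the line has an odd number of both bold and italic markers. The function name split_rule_fires is hypothetical, and this counts markers only; it does not render anything.

```python
import re


def split_rule_fires(line):
    """True if the line has an odd number of both bold and italic markers."""
    bold = italic = 0
    for run in re.findall(r"''+", line):
        n = len(run)
        if n == 4:
            n = 3   # literal apostrophe + bold
        elif n > 5:
            n = 5   # extra apostrophes are literal text
        if n in (3, 5):
            bold += 1
        if n in (2, 5):
            italic += 1
    return bold % 2 == 1 and italic % 2 == 1


once = "L'''amour'' is great the first time."
twice = once + " But l'''amour'' fails the second time."
print(split_rule_fires(once), split_rule_fires(twice))
```

With one l'''amour'' the counts are 1 bold and 1 italic, so the rule applies. With two they are 2 and 2, both even, so the line looks balanced and nothing is split.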
The treatment of 4 apostrophes is much less offensive. This renders correctly:
* L''''amour''' is bold the first time. And l''''amour''' is still bold the second time.
The 4 apostrophes -> apostrophe, bold rule is at least consistent, though it's still not intuitive that this: ''''blah'''' puts the first apostrophe in normal text, while the second one is bold. Hard to believe the user really wants that...
Of course, the only time 4 apostrophes ever renders as anything *other* than apostrophe followed by bold is when the crazy rule above is invoked, turning it into two apostrophes followed by italics.
Steve (rambly late at night)
That's a relief. Now that I understand this rule, I think it's a complete load of bollocks, and should be removed from any notion of "correct" treatment of wikitext. Mismatched apostrophe groupings should be considered erroneous input whose rendering is undefined.
I agree, the rule should probably be removed as an ugly hack that doesn't really work anyway. However, we need to define some kind of rendering for mismatched apostrophes, since there seems to be a consensus that refusing to save "invalid" wikitext is a very bad idea. Is it best to render them all as literal apostrophes, perhaps?
-----Original Message----- From: wikitext-l-bounces@lists.wikimedia.org [mailto:wikitext-l-bounces@lists.wikimedia.org] On Behalf Of Steve Bennett Sent: 27 November 2007 15:05 To: Wikitext-l Subject: Re: [Wikitext-l] Determining the behaviour of apostrophes
On 11/28/07, Jared Williams jared.williams1@ntlworld.com wrote:
The code is still missing the search for a single letter preceding a bold to split at. Seems none of the tests exercise that particular bit of code.
That's a relief. Now that I understand this rule, I think it's a complete load of bollocks, and should be removed from any notion of "correct" treatment of wikitext. Mismatched apostrophe groupings should be considered erroneous input whose rendering is undefined.
Yeah. It does seem a lot more trouble than it's worth.
For instance, introducing an escape character (say \, for example) that guarantees the following character is text.
* L'''amour'' is great the first time. But l'''amour'' fails the second time.
Jared
For instance, introducing an escape character (say \, for example) that guarantees the following character is text.
Escape characters can be confusing for users not used to them (which is most), because you end up having to escape the escape character when trying to use it literally.
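A rough sketch of what such an escape pass might look like, including the "escape the escape" case. Nothing like this exists in the parser discussed here; the function name mark_literals and the choice to keep a trailing lone backslash as an ordinary character are assumptions made purely for illustration.

```python
def mark_literals(text, escape="\\"):
    """Return (char, is_literal) pairs; the char after an escape is literal."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            # Drop the escape itself and mark the next character as plain text.
            out.append((text[i + 1], True))
            i += 2
        else:
            # Ordinary character (or a trailing lone escape, kept as-is).
            out.append((ch, False))
            i += 1
    return out


# l\'''amour'' : the first apostrophe is forced to be text,
# leaving '' ... '' to be read as italics by a later pass.
print(mark_literals(r"l\'''amour''")[:4])

# A literal backslash has to be written as two backslashes.
print(mark_literals("a\\\\b"))
```

The second call shows the cost raised above: to get one backslash in the output the user has to type two.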
-----Original Message----- From: wikitext-l-bounces@lists.wikimedia.org [mailto:wikitext-l-bounces@lists.wikimedia.org] On Behalf Of Thomas Dalton Sent: 27 November 2007 16:39 To: Wikitext-l Subject: Re: [Wikitext-l] Determining the behaviour of apostrophes
For instance, introducing an escape character (say \, for example) that guarantees the following character is text.
Escape characters can be confusing for users not used to them (which is most), because you end up having to escape the escape character when trying to use it literally.
It was just an off-the-top-of-my-head idea, but I still think it's easier than explaining why
* L'''amour'' is great the first time. But l'''amour'' fails the second time.
Won't work as expected.
Jared