-----Original Message-----
From: wikitext-l-bounces@lists.wikimedia.org [mailto:wikitext-l-bounces@lists.wikimedia.org] On Behalf Of Steve Bennett
Sent: 27 November 2007 04:06
To: Wikitext-l
Subject: [Wikitext-l] Determining the behaviour of apostrophes
I've written up an account of how the current parser treats apostrophes here: http://www.mediawiki.org/wiki/Markup_spec/BNF/Inline_text#Determining_the_behaviour_of_apostrophes
All I've done is read the code of doAllQuotes() and translate it from a procedural style (first replace blah, then iterate through...) into a more declarative style (four apostrophes end up getting rendered as X if the following is the case...).
The most interesting case is this one:
Take ''''four''' apostrophes and then throw '''''five unclosed apostrophes at them.
Normally, four apostrophes are treated as an apostrophe followed by bold. But when the parser finds unbalanced bold *and* italics on the line, it goes looking for a bold to split. The first bold, which is now preceded by an apostrophe, looks like a good candidate because it seems to be a single letter followed by a bold (the apostrophe itself counts as the "letter", since a space comes before it), as in the l'''idee'' case. So that bold gets split *again*, meaning that the four apostrophes end up rendered as two apostrophes followed by italics.
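To make that sequence concrete, here is a minimal Python sketch of the per-line pre-pass as described above for doAllQuotes(). It is not a copy of the PHP: the function name preprocess_quotes is hypothetical, and the fallback order when no single-letter candidate exists (first multi-letter word, then first bold after a space) is an assumption about the original code.

```python
import re


def preprocess_quotes(line: str) -> list[str]:
    """Hypothetical sketch of the apostrophe pre-pass (not MediaWiki's code).

    Even indices of the result are plain text; odd indices are apostrophe
    runs of length 2 (italics), 3 (bold) or 5 (bold + italics).
    """
    # A capturing group makes re.split keep the runs at odd indices.
    parts = re.split(r"('{2,})", line)
    num_bold = num_italics = 0
    for i in range(1, len(parts), 2):
        run = parts[i]
        if len(run) == 4:
            # Four apostrophes: one literal apostrophe, then bold.
            parts[i - 1] += "'"
            parts[i] = "'''"
        elif len(run) > 5:
            # More than five: everything except the last five is text.
            parts[i - 1] += "'" * (len(run) - 5)
            parts[i] = "'''''"
        n = len(parts[i])
        if n == 2:
            num_italics += 1
        elif n == 3:
            num_bold += 1
        elif n == 5:
            num_italics += 1
            num_bold += 1

    # Unbalanced bold AND italics: reinterpret one bold as "'" + italics.
    if num_bold % 2 == 1 and num_italics % 2 == 1:
        first_single = first_multi = first_space = -1
        for i in range(1, len(parts), 2):
            if len(parts[i]) != 3:
                continue
            prev = parts[i - 1]
            x1 = prev[-1:]    # character just before the bold
            x2 = prev[-2:-1]  # character before that
            if x1 == " ":
                if first_space == -1:
                    first_space = i
            elif x2 == " ":
                # "Single-letter word" before the bold, as in l'''idee''.
                # A lone literal apostrophe after a space also matches here.
                if first_single == -1:
                    first_single = i
            elif first_multi == -1:
                first_multi = i
        # Assumed preference order: single letter, multi-letter, space.
        for candidate in (first_single, first_multi, first_space):
            if candidate > -1:
                parts[candidate] = "''"
                parts[candidate - 1] += "'"
                break
    return parts
```

Run on the example sentence, the sketch leaves the first text chunk as "Take ''" and turns the following run into "''", i.e. two literal apostrophes followed by an italic opener. That is the double split described above: the four-to-three rule moves one apostrophe into the text, and the single-letter heuristic then moves a second one.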
I suspect this was not planned behaviour.
Steve
Had an attempt to solve the bold/italics ambiguity last week. Managed to get my handwritten parser to pass the bold/italics tests in parserTests.txt (excluding anything with link markup, since link parsing is still unimplemented).
The code is still missing the search for a single-letter word preceding a bold to split at. It seems none of the tests exercise that particular bit of code.
--
For the input Take ''''four''' apostrophes and then throw '''''five unclosed apostrophes at them.
I get <p>Take '<b>four</b> apostrophes and then throw ' five unclosed apostrophes at them.</p>
The ''''' becomes <b><i>, which then gets balanced to '<i/>, and the empty <i/> is removed (it causes problems in IE).
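The message doesn't say how the empty element is dropped. One straightforward way to do that kind of clean-up, sketched here as a hypothetical post-pass (strip_empty_inline is an illustrative name, not a MediaWiki function), is to delete empty <b>/<i> elements repeatedly until nothing changes, so that nested empties such as <b><i></i></b> also disappear.

```python
import re

# Matches an empty <b></b> / <i></i> pair, or a self-closing <b/> / <i/>.
EMPTY_INLINE = re.compile(r"<(b|i)></\1>|<[bi]\s*/>")


def strip_empty_inline(html: str) -> str:
    """Remove empty bold/italic elements, repeating until the text is stable."""
    while True:
        cleaned = EMPTY_INLINE.sub("", html)
        if cleaned == html:
            return cleaned
        html = cleaned
```

Repeating the substitution is the simplest way to handle nesting: removing the inner <i></i> leaves an empty <b></b>, which the next pass catches.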
--
There is one test in parserTests.txt:
!! test
Mixing markup for italics and bold
!! options
!! input
'''bold''''''bold''bolditalics'''''
!! result
<p><b>bold</b><b>bold<i>bolditalics</i></b>
</p>
!! end
That fails, because the expected result has all six apostrophes interpreted as markup (closing bold, then opening bold).
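For comparison, here is a minimal Python sketch of the run-length rule used by the apostrophe pre-pass, as I read MediaWiki's doQuotes() (an assumption; the 2007 code may differ in detail, and normalize_run is a hypothetical name). Under this rule a run of more than five apostrophes keeps only its last five as markup, so six becomes one literal apostrophe plus a bold-italic run, not the ''' + ''' reading the expected result uses.

```python
def normalize_run(run: str) -> tuple[str, str]:
    """Split an apostrophe run into (literal text, markup), per the assumed rules."""
    n = len(run)
    if n == 4:
        # Four: one literal apostrophe, then bold.
        return "'", "'''"
    if n > 5:
        # More than five: all but the last five are literal.
        return "'" * (n - 5), "'''''"
    # Two, three or five apostrophes are kept as markup unchanged.
    return "", run
```

A parser following this rule cannot produce <b>bold</b><b>bold... for that input without special-casing six apostrophes.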