On 11/28/07, David Gerard <dgerard@gmail.com> wrote:
Steve Bennett has been writing a parser grammar, and investigating how the present parser *actually* works.
Turns out the apostrophe-italic combination only works once a para. Is this expected?
To clarify, this behaviour (converting exactly one occurrence of three apostrophes to apostrophe+italics if the paragraph as a whole has mismatched italics/bold) is pretty evident from looking at the code:
    # If there is a single-letter word, use it!
    if ($firstsingleletterword > -1) {
        $arr [ $firstsingleletterword ] = "''";
        $arr [ $firstsingleletterword-1 ] .= "'";
    }
So, the writer of this code (Magnus?) definitely knows about this limitation. The question is really:
1) Does anyone really use this construct? We've heard that the French use a curved apostrophe instead of the straight one in this situation. It's hard to believe anyone relies on it as it's so flaky: once per paragraph only? Eep.
2) Can it either be removed from the current parser or not implemented in the spec/future parser?
It's particularly noxious as there is no way to parse it in any reasonable fashion. Four apostrophes is always apostrophe+bold (parseable), except that this rule means that if at the end of the paragraph you encounter other unclosed italics and bold, you have to go back to the start and convert one of these new "apostrophe+bold" sequences into "apostrophe+apostrophe+italics" (nightmare).
I should also point out that whenever this situation (bold and italics both unbalanced) arises, the parser always attempts to recover by converting one bold into apostrophe+italics, not only when there is a single-letter word. The single-letter word is just the candidate it picks first.
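
To make the behaviour concrete, here is a rough sketch of the recovery rule as described above. It is written in Python rather than the parser's PHP and is not the actual MediaWiki code: the function name, the helper and the "first bold" fallback are my own simplifications, and five-apostrophe runs are only counted, not otherwise handled.

```python
import re

def rebalance_quotes(paragraph):
    """Return the paragraph split into text / apostrophe-run pieces,
    after applying the 'one bold becomes apostrophe+italics' recovery."""
    # Odd indexes hold runs of two or more apostrophes, even indexes hold text.
    parts = re.split(r"('{2,})", paragraph)
    runs = range(1, len(parts), 2)

    # Four apostrophes: one literal apostrophe followed by bold markup.
    for i in runs:
        if len(parts[i]) == 4:
            parts[i - 1] += "'"
            parts[i] = "'''"

    num_italic = sum(len(parts[i]) in (2, 5) for i in runs)
    num_bold = sum(len(parts[i]) in (3, 5) for i in runs)

    def after_single_letter_word(before):
        # Assumption: "single-letter word" means one non-space character
        # preceded by a space or by the start of the text.
        return before[-1:].strip() != "" and before[-2:-1] in ("", " ")

    # Only when BOTH counts are odd is exactly one bold reinterpreted.
    if num_bold % 2 == 1 and num_italic % 2 == 1:
        bolds = [i for i in runs if len(parts[i]) == 3]
        preferred = [i for i in bolds if after_single_letter_word(parts[i - 1])]
        # Simplified fallback (assumption): otherwise take the first bold.
        candidates = preferred or bolds
        if candidates:
            i = candidates[0]
            parts[i - 1] += "'"   # the first apostrophe becomes literal text
            parts[i] = "''"       # the remaining two become italics markup
    return parts

# One occurrence: the bold is reinterpreted as l' + italics.
print(rebalance_quotes("l'''homme''"))
# -> ["l'", "''", 'homme', "''", '']

# Two occurrences: both counts are even, so nothing is converted at all.
print(rebalance_quotes("l'''homme'' et l'''arbre''"))
# -> ['l', "'''", 'homme', "''", ' et l', "'''", 'arbre', "''", '']
```

Note that the decision depends on counts taken over the whole paragraph, which is exactly why a grammar can't settle the meaning of any one three-apostrophe run until it has seen the end of the paragraph.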
Steve (not subscribed to foundation-l)