New subject: Determining the behaviour of apostrophes

30 Nov 2007


      ...
Earlier: "... Whether or not you think it's a waste of time, there's no
excuse for broadcasting every parser bug you find to three mailing
lists.  There's no shortage of parser bugs, and no need to act surprised
when you find one ... If we want to talk about the parser grammar
effort, we all know which list to subscribe to ..."
Peter Blaise responds: Oh?  Which one?  I do not know, and you do not
mention it in your post, so, help me out here, please - which list?  If
you're gonna type something, why not make it unambiguously accurate and
complete, anyway?  Otherwise, what's the point?
Additionally, I personally find cross posting very important.  Of course
anyone NOT interested can just scroll on or delete - there's no such
thing as too much information in my book (on topic - I'm not talking
about spam or off topic posts).  Parser behavior = wiki tech in my book.
I very often find spirited discussions ensue because of cross-posted
ideas - it tends to freshen otherwise stale meeting places.
More to the point here, wiki markup parser wise, my point is twofold:
One, I would prefer NOT to have my wiki end users get error messages
when they edit.  I'd prefer that any editing just go in and be saved,
and later we'll deal with formatting surprises.  I'm a firm believer in
separating the tasks of content creation and content presentation.
Someone adding content to a wiki should never be delayed by presentation
formatting error messages.  Let the text land however it lands, clean it
up later.
Two, we tend to discover how things work in spite of erroneous,
presumptive, naive instructions.  I already have begun to discard the
"rule" that bold happened between three apostrophes.  Instead, I've
discovered a hierarchy of toggles.  Three apostrophes toggle bold to the
other state.  Two apostrophes toggle italics to the other state.  The
parser makes its decisions on how to interpret duplicate punctuation at
the END of any code that matches its look-up-table, or at the first
"word barrier" transition.  Or does it?  Cut and paste this into any
sandbox page and explore:
'1text = apostrophe one text; no duplicate punctuation, no wiki markup.
''2text = italics two text; duplicate punctuation matched wiki markup,
and the parser toggles the state of the matching function, here,
italics.
'''3text = bold three text; duplicate punctuation matched wiki markup,
and the parser toggles the state of the matching function, here, bold.
''''4text = bold apostrophe four text; duplicate punctuation matched
wiki markup, and the parser toggles the state of the matching function,
here, bold (3 apostrophes) was the superior interpretable state before
the 4th apostrophe, so bold toggles (on or off), and the final
apostrophe is interpreted as mere punctuation or text.  Alternatively,
four apostrophes could be considered as two toggles of the italics
function.  But, since the first word barrier occurs only after the 4th
apostrophe, and there is no text between the apostrophes, that
interpretation would not have any visible effect on the display.  It
probably makes sense to have the parser continue interpreting up to the
third apostrophe before making a decision, and consider it a call for a
bold-toggle, rather than consider the first two apostrophes as an
italics-toggle, and then start looking for a subsequent wiki markup
instruction.  Otherwise, the parser would never find bold (3
apostrophes) if it always gave precedence to interpreting the first 2
apostrophes as italics.  The parser seems to read left to right, and
interprets according to (what we hope are) discoverable hierarchies:
word transitions, or, matching duplicate punctuation to wiki markup
code, whichever it finds first, such as knowing that three apostrophes
toggles bold.
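
The point about looking ahead to the third apostrophe can be sketched in
a few lines of Python.  This is only a model of the look-up-table idea
described above, not the actual parser code.  The names MARKUP and
decompose are made up for illustration, and the rule that each toggle is
used at most once per run of apostrophes is an assumption.

```python
# Hypothetical look-up table: run width -> toggle name (an assumption,
# modelled on the behaviour described in this post).
MARKUP = {3: "bold", 2: "italics"}


def decompose(run_length, widths):
    """Consume a run of apostrophes, trying the candidate widths in order.

    Returns (toggles, leftover), where leftover is the number of
    apostrophes that end up as plain text.
    """
    toggles = []
    remaining = run_length
    for width in widths:
        if remaining >= width:
            toggles.append(MARKUP[width])
            remaining -= width
    return toggles, remaining


# Longest match first: three apostrophes are recognised as bold.
print(decompose(3, (3, 2)))
print(decompose(4, (3, 2)))
# Shortest match first: bold can never be reached from a run of three.
print(decompose(3, (2, 3)))
```

Trying the widths as (3, 2) gives bold for three apostrophes and bold
plus one leftover apostrophe for four.  Trying them as (2, 3) turns three
apostrophes into an italics toggle plus one leftover apostrophe, so bold
is never found.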
'''''5text = italics bold five text; which toggled first?  Who knows?  I
presume bold toggles first, then italics toggles.  Let's test:
'''bold '''''5text = italics five text (no bold); implying the five
apostrophes were interpreted bold as highest in the hierarchy, so, of
the five apostrophes, the first three were considered a bold-toggle, and
final two were considered an italics-toggle.
''italics '''''5text = bold five text (no italics); implying bold again
wins, and the subsequent italics-toggle turns off italics as expected.
The pattern is, so far, predictable.  But let's revisit four
apostrophes:
'''bold ''''4text = bold apostrophe normal four text, this makes no
sense.  In the four apostrophe grouping, the first three should have
toggled bold, and the final should have been interpreted as text,
displaying '4text normal, but when actually displayed, the ' was bold
and the 4text was normal.  Huh?  THERE'S THE BUG!
''italics ''''4text = normal two apostrophe four text, again, in the
four apostrophe group, the first three should have been a bold toggle
and the subsequent apostrophe should have been text.  Apparently the
parser holds the existing state of wiki markup toggle in its head and
raises that in the hierarchy.  Who does this programming, anyway?  Let's
test italics on and off first, not just on first:
''real italics'' ''''4text = normal apostrophe, bold four text.  This
SHOULD be the same as above, but isn't.  Apparently we need to add one
more item to our expected parser function hierarchy.  THIS IS WHAT THE
PARSER SEEMS TO ASK:
1 - is there a bold or italics toggle ON outstanding? (This surprised
me, I thought toggles ON and OFF were hierarchically equivalent, but
apparently a toggle ON creates a pressing need to look for a toggle OFF
before interpreting anything else!)
Then:
2 - have we reached the superior matching wiki markup text? (In other
words, ''' is superior to '' in the look-up-table.)
3 - have we reached a text word barrier or paragraph barrier?
(Supposedly, paragraph markers reset all toggles to OFF, but apparently
some wiki markup survives paragraph markers, or is it only HTML-style
markup using <markup></markup>-style coding that ignores paragraph
markers?)
Continuing the test:
''''''6text = italics bold apostrophe six text
'''''''7text = italics bold 2 apostrophe seven text
''''''''8text = italics bold 3 apostrophe eight text
... and so on.
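
The tests above can be replayed against the simple toggle model.  This
is a minimal sketch in Python, not the real parser: render and the
surplus_first flag are hypothetical names, and the rule for runs longer
than five (one bold toggle, one italics toggle, the rest literal
apostrophes) is inferred from the results listed above.

```python
import re


def render(line, surplus_first=False):
    """Split a line into (text, bold, italics) segments.

    Naive model: a run of 2 apostrophes toggles italics, 3 toggles bold,
    5 or more toggles both, and any apostrophes left over are plain text.
    surplus_first decides whether the leftover apostrophes are emitted
    before or after the toggles are applied (an assumption to experiment
    with, not documented parser behaviour).
    """
    bold = italics = False
    segments = []
    for piece in re.split(r"('+)", line):
        if not piece:
            continue
        if piece[0] != "'":
            # Ordinary text: keep it with the current toggle state.
            segments.append((piece, bold, italics))
            continue
        n = len(piece)
        flip_bold = n >= 3
        flip_italics = n == 2 or n >= 5
        surplus = "'" * (n - 3 * flip_bold - 2 * flip_italics)
        if surplus and surplus_first:
            segments.append((surplus, bold, italics))
        bold ^= flip_bold
        italics ^= flip_italics
        if surplus and not surplus_first:
            segments.append((surplus, bold, italics))
    return segments


for test in ("'''bold '''''5text", "''italics '''''5text",
             "'''bold ''''4text", "''real italics'' ''''4text"):
    print(test, "->", render(test))
```

With the default setting, the input '''bold ''''4text comes out as a
normal apostrophe followed by normal text, which is the expectation
stated earlier and not what the sandbox displayed.  Passing
surplus_first=True makes that apostrophe bold, as observed, and also
gives a normal apostrophe before bold text for ''real italics''
''''4text.  It still does not account for ''italics ''''4text, so the
model is incomplete.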

Re: [Wikitech-l] Determining the behaviour of apostrophes