On 11/15/07, Thomas Dalton <thomas.dalton(a)gmail.com> wrote:
> But that wouldn't be a tree. There is no way of storing toggles in a
> tree, at least not conceptually. You would end up with something like
> this:
>
>           wikitext
>     __________|__________
>     |         |         |
>     B       text        B

I think I would propose storing them like that in the first pass, then
replacing them with "open B" or "close B" in the second pass, then
rendering those out the obvious way.
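
As a rough illustration of those two passes (a minimal sketch in Python,
not the actual parser; the names Node, first_pass, second_pass and render
are made up for this example, and it only looks at ''' and '' within a
single line):

```python
import re
from dataclasses import dataclass


@dataclass
class Node:
    kind: str   # "text", "toggle", "open" or "close"
    value: str  # the text itself, or "B" / "I" for markup nodes


def first_pass(line: str) -> list:
    """Split one line into flat sibling nodes, leaving toggles unresolved."""
    nodes = []
    # The capture group makes re.split keep the apostrophe runs it splits on.
    for piece in re.split(r"('''|'')", line):
        if piece == "'''":
            nodes.append(Node("toggle", "B"))
        elif piece == "''":
            nodes.append(Node("toggle", "I"))
        elif piece:
            nodes.append(Node("text", piece))
    return nodes


def second_pass(nodes: list) -> list:
    """Walk the siblings once, turning each toggle into an open or a close."""
    currently_open = set()
    resolved = []
    for node in nodes:
        if node.kind != "toggle":
            resolved.append(node)
        elif node.value in currently_open:
            currently_open.remove(node.value)
            resolved.append(Node("close", node.value))
        else:
            currently_open.add(node.value)
            resolved.append(Node("open", node.value))
    return resolved


TAGS = {"B": "b", "I": "i"}


def render(nodes: list) -> str:
    """Render resolved nodes as HTML."""
    parts = []
    for node in nodes:
        if node.kind == "text":
            parts.append(node.value)
        elif node.kind == "open":
            parts.append(f"<{TAGS[node.value]}>")
        else:
            parts.append(f"</{TAGS[node.value]}>")
    return "".join(parts)
```

With this, render(second_pass(first_pass("a '''bold''' word"))) gives
"a <b>bold</b> word". The sketch deliberately ignores misnested or
unclosed toggles, which is where the real complexity lives.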

> Where "B" means a bold toggle, and "text" is arbitrary text. Things at
> the same level of a tree shouldn't depend on each other, that's how
> trees work (and is why you can use CSS to move HTML div tags anywhere

Right, and *after* that second pass, that's how the tree will work.

> you like on the screen, regardless of the order they appear in the
> source). Your method would probably work, but it's just as much a mess
> as my idea. And they can't be stored as bold toggle and italic

I suspect not, but I wouldn't stake my life on it. Parsing the
bold/italics at the moment is 200 lines of PHP (see doAllQuotes() and
doQuotes()). I think a pre-parse "tidy" is going to have to include all
that logic; I don't see how you can avoid it.

To my mind, walking a tree is much cleaner and simpler than parsing
text, and probably faster, too. But I could just be biased :)

> toggles, they'll have to be stored as "x apostrophes" in order for
> more complicated combinations to work. Your final walk of the tree is

Depends how many "complicated combinations" we support.

> going to end up just as complicated as my first pass through the
> wikitext (it's easy to exclude the few places where bold and italics
> aren't parsed - it's just pre and nowiki as far as I know, the code

Yeah, but that does mean you're parsing pre and nowiki (and math, hiero,
and possibly others) twice.
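
To make the "twice" concrete, here is a minimal Python sketch of what a
pre-parse tidy would need before it could touch any apostrophes. The
names split_protected and tidy are hypothetical, and only <nowiki> and
<pre> are handled; math, hiero and any others would need the same
treatment:

```python
import re

# Non-greedy match of a whole protected region, including its tags.
PROTECTED = re.compile(r"<(nowiki|pre)>.*?</\1>", re.DOTALL)


def split_protected(text):
    """Yield (is_protected, chunk) pairs covering the whole text in order."""
    pos = 0
    for match in PROTECTED.finditer(text):
        if match.start() > pos:
            yield False, text[pos:match.start()]
        yield True, match.group(0)
        pos = match.end()
    if pos < len(text):
        yield False, text[pos:]


def tidy(text, tidy_chunk):
    """Apply tidy_chunk only to the chunks outside protected regions."""
    return "".join(
        chunk if protected else tidy_chunk(chunk)
        for protected, chunk in split_protected(text)
    )
```

For example, tidy("a <nowiki>''b''</nowiki> c", str.upper) returns
"A <nowiki>''b''</nowiki> C". The main parse then has to recognise the
same tags all over again, which is the duplication I mean.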

> In summary, the syntax is a complete mess, so both our solutions are
> complete messes. I'm really not sure which is better, but I don't
> think there's much in it. My idea does allow for saving the tidied
> version if people want (I'd prefer it to be an option, rather than
> happening automatically as someone else suggested), which would be a

What would you tidy it to? At the moment, there is no unambiguous
syntax for mixed apostrophes and bold/italics, other than <nowiki>.

> nice feature, but far from a vital one. It also allows for tidying
> more than just bold and italics if we find anything else that needs
> similar treatment (lists, perhaps). Does your idea have any similar
> added benefits?

Nope, other than it doesn't require processing the text again, and is
more akin to the model of context-free grammar we're theoretically
aspiring to. It seems cleaner to me to clearly define the exception to
the EBNF this way, but that could be a bias not based on much real
evidence.

The only reason the choice of solution matters at this point is for the
grammar. I'm currently working on the basis that ''' is just a bold
toggle which is processed and finished with. If you already know that
all the bolds and italics have been normalised, then it would be
possible to create a '''...''' openbold/closebold block.

In general I don't really like the idea of treating blocks as genuine
blocks in wikitext because of the error-handling problem. In a
programming language, if a block is malformed, you pretty much just
abort the compile. We have to carry on as sensibly as possible, so it's
better not to find yourself 500 characters into three nested blocks
before suddenly discovering a non-permissible new line.
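
A minimal Python sketch of the "carry on sensibly" approach, treating
''' and '' as toggles and closing whatever is still open when the line
ends. The name render_line is hypothetical, and closing at end of line is
an assumption made for this example rather than a statement of what the
current PHP does in every case:

```python
import re

TAGS = {"'''": "b", "''": "i"}


def render_line(line: str) -> str:
    """Render bold/italic toggles in one line, never failing on bad input."""
    stack = []  # tags currently open, innermost last
    out = []
    for piece in re.split(r"('''|'')", line):
        tag = TAGS.get(piece)
        if tag is None:
            out.append(piece)  # ordinary text (possibly empty)
        elif tag in stack:
            # Close inner tags first so the output stays properly nested,
            # then reopen them after closing the requested tag.
            inner = []
            while stack[-1] != tag:
                inner.append(stack.pop())
                out.append(f"</{inner[-1]}>")
            stack.pop()
            out.append(f"</{tag}>")
            for reopened in reversed(inner):
                stack.append(reopened)
                out.append(f"<{reopened}>")
        else:
            stack.append(tag)
            out.append(f"<{tag}>")
    # End of line: close anything left open instead of aborting.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)
```

Here render_line("'''bold with no close") gives
"<b>bold with no close</b>", and the misnested "''a'''b''c'''" comes out
as "<i>a<b>b</b></i><b>c</b>". The damage from a malformed toggle never
spreads past the line it is on.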

Steve