On 11/15/07, Thomas Dalton thomas.dalton@gmail.com wrote:
> But that wouldn't be a tree. There is no way of storing toggles in a tree, at least not conceptually. You would end up with something like this:
>             wikitext
>     ____________|____________
>     |           |           |
>     B          text         B
> Where "B" means a bold toggle, and "text" is arbitrary text.
I think I would propose storing them like that in the first pass, then replacing them with "open B" or "close B" in the second pass, then rendering those out the obvious way.
> Things at the same level of a tree shouldn't depend on each other; that's how trees work (and is why you can use CSS to move HTML div tags anywhere you like on the screen, regardless of the order they appear in the source).
Right, and *after* that second pass, that's how the tree will work.
> Your method would probably work, but it's just as much a mess as my idea.
I suspect not, but I wouldn't stake my life on it. Parsing bold and italics currently takes 200 lines of PHP (see doAllQuotes() and doQuotes()), and I think a pre-parse "tidy" is going to have to include all that logic; I don't see how you can avoid it.
To my mind, walking a tree is much cleaner and simpler than parsing text, and probably faster, too. But I could just be biased :)
> And they can't be stored as bold toggles and italic toggles; they'll have to be stored as "x apostrophes" in order for more complicated combinations to work.
Depends how many "complicated combinations" we support.
> Your final walk of the tree is going to end up just as complicated as my first pass through the wikitext (it's easy to exclude the few places where bold and italics aren't parsed - it's just pre and nowiki as far as I know, the code [...]
Yeah, but that does mean you're parsing pre and nowiki (and math, hiero, and possibly others) twice.
> In summary, the syntax is a complete mess, so both our solutions are complete messes. I'm really not sure which is better, but I don't think there's much in it. My idea does allow for saving the tidied version if people want (I'd prefer it to be an option, rather than happening automatically as someone else suggested), which would be a nice feature, but far from a vital one.
What would you tidy it to? At the moment, there is no unambiguous syntax for mixed apostrophes and bold/italics, other than <nowiki>.
> It also allows for tidying more than just bold and italics if we find anything else that needs similar treatment (lists, perhaps). Does your idea have any similar added benefits?
Nope, other than that it doesn't require processing the text again, and it is closer to the context-free grammar model we're theoretically aspiring to. It seems cleaner to me to define the exception to the EBNF clearly this way, but that could be a bias not based on much real evidence.
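Steve's two-pass idea (store each ''' or '' run as a flat toggle sibling in the first pass, rewrite the toggles as "open B"/"close B" in the second pass, then render) can be sketched roughly as below. This is a hypothetical Python illustration, not MediaWiki code: the names first_pass, second_pass, Toggle, Open and Close are made up, only runs of exactly 2, 3 or 5 apostrophes are recognised, and none of the special cases that doQuotes() handles are attempted. It works on one line at a time and closes anything left open at the end of the line; the current doAllQuotes() is understood to call doQuotes() line by line in a similar way.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    value: str


@dataclass
class Toggle:
    styles: str  # "b", "i", or "bi" for a run of five apostrophes


@dataclass
class Open:
    style: str


@dataclass
class Close:
    style: str


Node = Union[Text, Toggle, Open, Close]

# Longest run first, so ''''' is not read as ''' followed by ''.
QUOTE_RUN = re.compile(r"('{5}|'{3}|'{2})")
RUN_STYLES = {"'''''": "bi", "'''": "b", "''": "i"}


def first_pass(line: str) -> List[Node]:
    """Pass 1: store quote runs as flat Toggle siblings of the text."""
    nodes: List[Node] = []
    for piece in QUOTE_RUN.split(line):
        if piece in RUN_STYLES:
            nodes.append(Toggle(RUN_STYLES[piece]))
        elif piece:
            nodes.append(Text(piece))
    return nodes


def second_pass(nodes: List[Node]) -> List[Node]:
    """Pass 2: rewrite each Toggle as explicit Open/Close nodes."""
    out: List[Node] = []
    stack: List[str] = []  # currently open styles, innermost last

    def flip(style: str) -> None:
        if style not in stack:
            stack.append(style)
            out.append(Open(style))
            return
        # Close this style and everything nested inside it, then reopen
        # the inner styles, so the output is always properly nested.
        idx = stack.index(style)
        inner = stack[idx + 1:]
        for s in reversed(stack[idx:]):
            out.append(Close(s))
        del stack[idx:]
        for s in inner:
            stack.append(s)
            out.append(Open(s))

    for node in nodes:
        if isinstance(node, Toggle):
            # Close already-open styles innermost first, then open the rest.
            closing = [s for s in reversed(stack) if s in node.styles]
            opening = [s for s in node.styles if s not in stack]
            for style in closing + opening:
                flip(style)
        else:
            out.append(node)
    for style in reversed(stack):  # close whatever the line left open
        out.append(Close(style))
    return out


def render(nodes: List[Node]) -> str:
    """Render the rewritten nodes the obvious way."""
    html = []
    for node in nodes:
        if isinstance(node, Text):
            html.append(node.value)
        elif isinstance(node, Open):
            html.append(f"<{node.style}>")
        elif isinstance(node, Close):
            html.append(f"</{node.style}>")
    return "".join(html)


def to_html(line: str) -> str:
    return render(second_pass(first_pass(line)))
```

For example, to_html("'''a ''b''' c''") gives "<b>a <i>b</i></b><i> c</i>": the overlapping toggles come out as well-nested tags because the second pass closes and reopens the inner style, so the first pass never has to look ahead.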
At this point, the choice between the two solutions only matters for the grammar. I'm currently working on the basis that ''' is just a bold toggle, which is processed and finished with. If you already know that all the bolds and italics have been normalised, then it would be possible to create a '''...''' openbold/closebold block.
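For contrast, if the input were guaranteed to be normalised (assumed here to mean every ''' and '' is balanced and properly nested, with no other apostrophe runs), the openbold/closebold idea could be handled as genuine blocks by a small recursive-descent parser. This is a hypothetical sketch, not the grammar being worked on; parse, parse_seq, Block and MalformedBlock are made-up names. Note that it can only give up (here, by raising) when a block is still open at a newline or at the end of the input, which is the error-handling problem raised in the next paragraph.

```python
import re
from typing import List, Tuple

# Hypothetical normalised input: every ''' and '' is balanced and properly
# nested, and no other apostrophe runs (such as ''''') are present.
TOKEN = re.compile(r"('''|'')")
STYLE = {"'''": "b", "''": "i"}


class MalformedBlock(Exception):
    """Raised when a block is still open at a newline or end of input."""


class Block:
    def __init__(self, style: str, children: list):
        self.style = style        # "b" or "i"
        self.children = children  # str and Block items, in order


def parse_seq(tokens: List[str], pos: int, closer: str) -> Tuple[list, int]:
    """Parse tokens until `closer` (or end of input when closer is "")."""
    children: list = []
    while pos < len(tokens):
        tok = tokens[pos]
        if closer and tok == closer:
            return children, pos + 1
        if tok in STYLE:
            # Opening delimiter: recurse into a genuine nested block.
            inner, pos = parse_seq(tokens, pos + 1, tok)
            children.append(Block(STYLE[tok], inner))
            continue
        if closer and "\n" in tok:
            # A block may not span lines; this sketch has no recovery rule,
            # so the partial tree built so far is discarded.
            raise MalformedBlock(f"newline inside {STYLE[closer]} block")
        children.append(tok)
        pos += 1
    if closer:
        raise MalformedBlock(f"unclosed {STYLE[closer]} block")
    return children, pos


def parse(src: str) -> list:
    tokens = [t for t in TOKEN.split(src) if t]
    tree, _ = parse_seq(tokens, 0, "")
    return tree


def render(tree: list) -> str:
    out = []
    for item in tree:
        if isinstance(item, Block):
            out.append(f"<{item.style}>{render(item.children)}</{item.style}>")
        else:
            out.append(item)
    return "".join(out)
```

Making this robust means choosing a recovery rule (close the block at the newline, or reparse the opener as literal apostrophes), and the parser may already be several nested blocks deep when it discovers that it needs one.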
In general I don't really like the idea of treating blocks as genuine blocks in wikitext because of the error-handling problem. In a programming language, if a block is malformed, you pretty much just abort the compile. We have to carry on as sensibly as possible, so it's better not to find yourself 500 characters into three nested blocks before suddenly discovering a non-permissible new line.
Steve