On Wed, Feb 2, 2011 at 3:08 PM, Platonides <platonides(a)gmail.com> wrote:
I approach it as a tool which could work for the bigger parser, though.
Currently, it looks like just another wiki syntax, similar to the
MediaWiki one.
I think it is a tool that shows promise in that regard as well. With
regard to "just another syntax": we can probably support all, or at
least most, of the important edge cases using this methodology. It
will make the grammar much uglier, but it can probably work. The
question is to what lengths you go to support poorly formed markup. That
answer will probably be different based on the accumulated content at
various sites. Our parser isn't too tolerant right now of poorly
formed markup. On our site that's ok. If people want to help us make
it more tolerant we'd be interested in seeing how that turns out. I
suspect it could at least double the size of the grammar based on what
Ward tells me that Dirk Riehle's group found with WikiCreole. But a
community effort could probably make it doable.
It doesn't seem to be legal HTML*, so I wouldn't justify it just as a
"design decision". The same could be argued for nested <p> tags.
It's not 100% legal right now and the most egregious spot is the
paragraph tags. It can be modified but doing it this way got it off
the ground faster. Hence it was a design decision. But we probably
will modify it to behave better in that regard. If someone wants to
contribute the changes to do it, that will make it happen much faster
as it's low on the list right now. Fork it on GitHub and go for it!
Make the changes and submit a pull request and we'll review it. Note
that MediaWiki doesn't generate 100% valid markup (but it's cleaner
than ours right now!).
* opening the <hX> seems to implicitly close the previous <p>, leading
to an unmatched </p>.
I hadn't noticed this; I'll check it out. Thanks!
Templates [...snip...]
I supposed that it was something like that, but it was odd that it did
such a conversion instead of leaving them as literals in that case.
I used just the parser binary. I have been looking at the Ruby code and,
despite the unfamiliar language, I understand a bit more of how it works.
The replacement with the hashed tag is done so that we can use a
simple context-unaware string replacement on the output. If we left
them in the original form we would have to know the difference between
a template call inside noinclude tags and one that isn't, and we would
have to know it at render time, when we have no state on the document.
Given that the help info for many templates shows exact calls to the
template placed within noinclude tags, this would be a common bug.
It's not the only possible solution but it's a simple one.
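
To make the idea concrete, here is a minimal sketch in Python. It is
not the kiwi code: the regular expressions, the "TPL-" token format,
the use of SHA-1, and the function names parse, render and expand are
all assumptions made up for illustration. It only shows why a hashed
token lets the render step be a plain string replacement while a call
inside noinclude tags stays literal.

```python
import hashlib
import re

# Hypothetical, simplified patterns; real wiki markup has many more cases.
TEMPLATE_CALL = re.compile(r"\{\{([^{}|]+)(?:\|[^{}]*)?\}\}")
NOINCLUDE = re.compile(r"<noinclude>(.*?)</noinclude>", re.DOTALL)


def placeholder(call_text):
    # Assumed token format: a fixed prefix plus a hash of the call text.
    return "TPL-" + hashlib.sha1(call_text.encode("utf-8")).hexdigest()


def _mark_calls(text, templates):
    # Swap each template call for its token and record the template name.
    def swap(match):
        token = placeholder(match.group(0))
        templates[token] = match.group(1).strip()
        return token

    return TEMPLATE_CALL.sub(swap, text)


def parse(wikitext):
    """Return (output, templates); calls inside noinclude stay literal."""
    templates = {}
    pieces = []
    pos = 0
    for match in NOINCLUDE.finditer(wikitext):
        pieces.append(_mark_calls(wikitext[pos:match.start()], templates))
        pieces.append(match.group(1))  # left untouched: shown as written
        pos = match.end()
    pieces.append(_mark_calls(wikitext[pos:], templates))
    return "".join(pieces), templates


def render(output, templates, expand):
    # Context-unaware: no document state is needed, only the token table.
    for token, name in templates.items():
        output = output.replace(token, expand(name))
    return output
```

With this sketch, parse("A {{Foo}} B <noinclude>use {{Foo}}</noinclude>")
records one token for the first call and leaves the second call as
literal text, so render never needs to know where the noinclude
section was.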
Like templates, images require some different solutions if the parser
is to be decoupled.[...snip...]
A parser shouldn't really need to handle images. At most it would
provide a callback so that the app could do something with the image URLs.
We don't do callbacks on purpose so that we can separate the parser
completely from the calling code. Our design would put the
information in a place where a calling application can get to it (e.g.
the list of Templates). But consider that MediaWiki actually does
handle images and adds markup for height and width, etc. It makes
database calls to determine "bad images", etc. This is something a
separate parser can't do in the same way. A mechanism needs to be put
in place to allow the calling application to do this work if it so
chooses. It's fairly straightforward to do it.
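
As a rough illustration of that kind of mechanism, the Python sketch
below has the parser return its output together with a list of the
images it saw, and leaves all policy to the caller. Everything in it
is hypothetical: ParseResult, apply_image_policy, the data-name
attribute, and the simplified image pattern are invented for the
example and are not how kiwi or MediaWiki actually do it.

```python
import re
from dataclasses import dataclass, field
from html import escape

# Hypothetical, simplified image link pattern.
IMAGE_LINK = re.compile(r"\[\[(?:Image|File):([^\]|]+)(?:\|[^\]]*)?\]\]")


@dataclass
class ParseResult:
    html: str
    # Maps each image name to the neutral tag the parser emitted for it.
    images: dict = field(default_factory=dict)


def parse(wikitext):
    """Parse without callbacks: emit neutral tags and record the images."""
    images = {}

    def to_tag(match):
        name = match.group(1).strip()
        tag = '<img data-name="%s">' % escape(name, quote=True)
        images[name] = tag
        return tag

    return ParseResult(html=IMAGE_LINK.sub(to_tag, wikitext), images=images)


def apply_image_policy(result, bad_images, sizes):
    """Caller-side step: drop bad images, add dimensions where known."""
    out = result.html
    for name, tag in result.images.items():
        if name in bad_images:
            out = out.replace(tag, "")
        elif name in sizes:
            width, height = sizes[name]
            out = out.replace(
                tag,
                '<img src="%s" width="%d" height="%d">'
                % (escape(name, quote=True), width, height),
            )
    return out
```

The calling application would fill bad_images and sizes from its own
database, so the parser itself never needs a database connection or a
callback into the application.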
More work is needed in this area, though if you check out
http://kiwi.drasticcode.com you can see that most image support is
working (no resizing). You can also experiment with the parser there
as needed.
The URL mapping used there makes some titles impossible to use, such as
an entry for [[Edit]] - http://en.wikipedia.org/wiki/Edit
You are right about that. I'm sure Sam would be happy to accept
contributions to change that. The site does support double-click to
edit, though, so making links to Edit is kind of unnecessary.
Just code lurking for now :)
No worries, the feedback is appreciated.
Cheers,
Karl