On Wed, Feb 2, 2011 at 3:08 PM, Platonides platonides@gmail.com wrote:
I approach it as a tool which could work for the bigger parser, though. Currently, it looks like just another wiki syntax, similar to MediaWiki's.
I think it is a tool that shows promise in that regard as well. With regard to "just another syntax": we can probably support all, or at least most, of the important edge cases using this methodology. It will make the grammar much uglier, but it can probably work. The question is how far you go to support poorly formed markup. The answer will probably differ from site to site, depending on the content each has accumulated. Right now our parser isn't very tolerant of poorly formed markup. On our site that's OK. If people want to help us make it more tolerant, we'd be interested in seeing how that turns out. I suspect it could at least double the size of the grammar, based on what Ward tells me Dirk Riehle's group found with WikiCreole. But a community effort could probably make it doable.
It doesn't seem to be legal html*, so I wouldn't justify it just as a "design decision". Same could be argued for nested <p> tags.
It's not 100% legal right now and the most egregious spot is the paragraph tags. It can be modified but doing it this way got it off the ground faster. Hence it was a design decision. But we probably will modify it to behave better in that regard. If someone wants to contribute the changes to do it, that will make it happen much faster as it's low on the list right now. Fork it on GitHub and go for it! Make the changes and submit a pull request and we'll review it. Note that MediaWiki doesn't generate 100% valid markup (but it's cleaner than ours right now!).
- opening the <hX> seems to implicitly close the previous <p>, leading to an unmatched </p>.
I hadn't noticed this, I'll check that out. Thanks!
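For anyone who wants to look for this in generated output, here is a minimal sketch in Python (not part of kiwi; the class and function names are made up for illustration). It assumes only the rule described above, that a heading or a new <p> start tag implicitly closes an open <p>, and reports any </p> that no longer has a paragraph to close. It ignores every other implied-end-tag rule in HTML, so treat it as a quick check rather than a validator.

```python
from html.parser import HTMLParser

# Start tags assumed here to implicitly close an open <p>.
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class StrayParagraphCloseFinder(HTMLParser):
    """Hypothetical helper: records </p> tags that have no open <p> to close."""

    def __init__(self):
        super().__init__()
        self.paragraph_open = False
        self.stray_closes = []  # (line, column) positions reported by the parser

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS or tag == "p":
            # A heading or a new paragraph ends any paragraph still open.
            self.paragraph_open = False
        if tag == "p":
            self.paragraph_open = True

    def handle_endtag(self, tag):
        if tag != "p":
            return
        if self.paragraph_open:
            self.paragraph_open = False
        else:
            self.stray_closes.append(self.getpos())


def find_stray_paragraph_closes(html):
    finder = StrayParagraphCloseFinder()
    finder.feed(html)
    finder.close()
    return finder.stray_closes
```

Feeding it something like "<p>Intro<h2>Title</h2></p>" should flag the final </p>, while "<p>Intro</p><h2>Title</h2>" should come back clean.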
Templates [...snip...] I supposed that it was something like that, but it was odd that it did such a conversion instead of leaving them as literals in that case. I used just the parser binary. I have been looking at the Ruby code and, despite the foreign language, am understanding a bit more of how it works.
The replacement with the hashed tag is done so that we can use a simple context-unaware string replacement on the output. If we left them in the original form we would have to know the difference between a template call inside noinclude tags and one that isn't--at render time when we have no state on the document. Given that the help info for many templates show exact calls to the template placed within noinclude tags, this would be a common bug. It's not the only possible solution but it's a simple one.
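To make the idea concrete, here is a minimal sketch in Python. It is not the kiwi code: the placeholder format, the SHA-1 hash, the regular expressions, and the names parse, render, and expand are all assumptions for illustration. The parse step has the document state, so it swaps template calls for hashed tags everywhere except inside noinclude tags. The render step has no state at all and only does plain string replacement.

```python
import hashlib
import re

# Simplified patterns, assumed for illustration: no nested templates.
TEMPLATE_CALL = re.compile(r"\{\{([^{}]+)\}\}")
NOINCLUDE = re.compile(r"(<noinclude>.*?</noinclude>)", re.DOTALL)


def tag_for(call):
    """Build a hypothetical placeholder tag from a hash of the call text."""
    digest = hashlib.sha1(call.encode("utf-8")).hexdigest()
    return f"<!--template:{digest}-->"


def parse(wikitext):
    """Return (output, templates); templates maps placeholder tag -> call text."""
    templates = {}

    def replace(match):
        call = match.group(1).strip()
        tag = tag_for(call)
        templates[tag] = call
        return tag

    pieces = []
    # Splitting on a capturing group keeps the noinclude sections in the list.
    for part in NOINCLUDE.split(wikitext):
        if part.startswith("<noinclude>"):
            pieces.append(part)  # left literal: calls in here are documentation
        else:
            pieces.append(TEMPLATE_CALL.sub(replace, part))
    return "".join(pieces), templates


def render(output, templates, expand):
    """Context-unaware pass: replace each placeholder with its expansion."""
    for tag, call in templates.items():
        output = output.replace(tag, expand(call))
    return output
```

With this shape, a call such as {{Infobox}} in the body is expanded, while the same call shown inside noinclude tags in the template's help text stays untouched, and render never needs to know which is which.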
Like templates, images require some different solutions if the parser is to be decoupled.[...snip...]
A parser shouldn't really need to handle images. At most it would provide a callback so that the app could do something with the image urls.
We don't do callbacks on purpose so that we can separate the parser completely from the calling code. Our design would put the information in a place where a calling application can get to it (e.g. the list of Templates). But consider that MediaWiki actually does handle images and adds markup for height and width, etc. It makes database calls to determine "bad images", etc. This is something a separate parser can't do in the same way. A mechanism needs to be put in place to allow the calling application to do this work if it so chooses. It's fairly straightforward to do it.
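A rough sketch of that separation in Python follows. Again, this is not the kiwi API: ParseResult, parse, apply_image_policy, the image-link pattern, and the bad-image set are hypothetical names chosen for illustration. The parser never calls back into the application. It returns the output plus the list of images it saw, and the caller decides afterwards what to do with them, including anything that needs a database.

```python
import re
from dataclasses import dataclass, field

# Simplified image-link pattern, assumed for illustration only.
IMAGE_LINK = re.compile(r"\[\[(?:Image|File):([^\]|]+)(?:\|[^\]]*)?\]\]")


@dataclass
class ParseResult:
    html: str
    images: list = field(default_factory=list)  # image names, in document order


def parse(wikitext):
    """Parser side: no callbacks, just output plus collected image names."""
    images = []

    def replace(match):
        name = match.group(1).strip()
        images.append(name)
        return f'<img src="{name}">'

    return ParseResult(html=IMAGE_LINK.sub(replace, wikitext), images=images)


def apply_image_policy(result, bad_images, base_url):
    """Application side: drop unwanted images and point the rest at real URLs.

    bad_images stands in for whatever lookup the application does
    (for example a database query); here it is just a set of names.
    """
    html = result.html
    for name in result.images:
        tag = f'<img src="{name}">'
        if name in bad_images:
            html = html.replace(tag, "")
        else:
            html = html.replace(tag, f'<img src="{base_url}{name}">')
    return html
```

The same pattern would apply to the list of templates: the parser exposes what it found, and the calling application does the site-specific work if it chooses to.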
More work is needed in this area, though if you check out http://kiwi.drasticcode.com you can see that most image support is working (no resizing). You can also experiment with the parser there as needed.
The URL mapping used there makes some titles impossible to use, such as an entry for [[Edit]] - http://en.wikipedia.org/wiki/Edit
You are right about that. I'm sure Sam would be happy to accept contributions to change that. The site does support double-click to edit, though, so making links to Edit is kind of unnecessary.
Just code lurking for now :)
No worries, the feedback is appreciated.
Cheers, Karl