[Wikitext-l] New parser: Kiwi
Platonides
platonides at gmail.com
Wed Feb 2 23:08:30 UTC 2011
Karl Matthias wrote:
> I'm one of the authors of the Kiwi parser and will be presenting it at
> the Data Summit on Friday. The parser is pretty complete but certainly
> we could use some community support and we encourage feedback and
> participation! It is a highly functional tool already but it can use
> some polish. It does actually handle most wikitext, though not
> absolutely everything.
>
> From your post I can see that you are running into a couple of design
> decisions we made in writing this parser. We did not set out to match
> the exact HTML output of MediaWiki, only to output something that will
> look the same in the browser. This might not be the best approach, but
> right now this is the case. Our site doesn't have the same needs as
> Wikipedia so when in doubt we leaned toward what suited our needs and
> not necessarily ultimate tolerance of poor syntax (though it is somewhat
> flexible).
I felt bad for pointing out issues after just a first try. I understand
that you have much less content than Wikipedia, and can use just a
subset of the markup without worrying about corner cases.
I approached it as a tool which could work for the bigger parser,
though. Currently, it looks like just another wiki syntax that happens
to resemble MediaWiki's.
> Another design decision is that everything that you put in
> comes out wrapped in paragraph tags. Usually this wraps the whole
> document, so if your whole document was just a heading, then yes it is
> wrapped in paragraph tags. This is probably not the best way to handle
> this but it's what it currently does. Feel free to contribute a
> different solution.
It doesn't seem to be legal HTML*, so I wouldn't justify it merely as a
"design decision". The same could be argued for nested <p> tags.
* Opening the <hX> seems to implicitly close the previous <p>, leading
to an unmatched </p>.
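For instance, a document consisting only of a heading comes out as

    <p><h2>Heading</h2></p>

where the <h2> implicitly closes the <p>, and the trailing </p> is left
with nothing to close.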
> Templates, as you probably know, require full integration with an
> application to work in the way that MediaWiki handles them, because they
> require access to the data store, and possibly other configuration
> information. We built a parser that works independently of the data
> store (indeed, even on the command line in a somewhat degenerate form).
> In order to do that, we had to decouple template retrieval from the
> parse. If you take a look in the Ruby FFI examples, you will see a more
> elegant handling of templates (though it needs work). When a document is
> parsed, the parser library makes available a list of templates that were
> found, the arguments passed to the template, and the unique replacement
> tag in the document for inserting the template once rendered. Those
> underscored tags that come out are not a bug, they are those unique
> tags.
I supposed it was something like that, but it was odd that it performed
such a conversion instead of leaving them as literals in that case.
I used just the parser binary. I have since been looking at the Ruby
code and, despite the unfamiliar language, I am understanding a bit more
of how it works.
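If I read it correctly, the decoupled flow looks roughly like this from
Ruby (all names here are invented for illustration; they are not the
library's actual FFI interface):

    require 'kiwi'   # assumed binding name

    # wikitext is the page source; store is app-supplied storage.
    html, templates = Kiwi.parse(wikitext)
    # 'templates' is assumed to list, for each template found:
    #   name        - the template title
    #   args        - the arguments passed in the wikitext
    #   placeholder - the unique underscored tag left in the output
    templates.each do |t|
      body        = store.fetch(t.name)        # look up the template body
      expanded    = fill_args(body, t.args)    # hypothetical helper
      rendered, _ = Kiwi.parse(expanded)       # templates may nest
      html.gsub!(t.placeholder, rendered)      # splice into the document
    end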
> Like templates, images require some different solutions if the parser is
> to be decoupled. Our parser does not re-size images, store them, etc.
> It just works with image URLs. If your application requires images to
> be regularized, you would need to implement resizing them at upload, or
> lazily at load time, or whatever works in your scenario.
A parser shouldn't really need to handle images. At most it would
provide a callback so that the app could do something with the image URLs.
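Something along these lines is what I have in mind (hypothetical names
again, just to illustrate the callback idea):

    # The parser reports each image URL it finds; the app decides what
    # to do with it (resize at upload, lazily at load time, rewrite to
    # an image server, ...).
    Kiwi.on_image do |url, options|
      Thumbnailer.url_for(url, :width => options[:width])
    end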
> More work is
> needed in this area, though if you check out http://kiwi.drasticcode.com
> you can see that most image support is working (no resizing). You can
> also experiment with the parser there as needed.
The URL mapping used there makes some titles impossible to use, such as
creating an entry for [[Edit]] - http://en.wikipedia.org/wiki/Edit
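Presumably the title collides with the path used for the edit action.
MediaWiki avoids this by serving pages under a /wiki/ prefix and actions
through a separate script path:

    http://en.wikipedia.org/wiki/Edit                          (the article)
    http://en.wikipedia.org/w/index.php?title=Edit&action=edit (editing it)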
> Hope that at least helps explain what we've done. Again, feedback and
> particularly code contributions are appreciated!
>
> Cheers,
> Karl
Just code lurking for now :)