Karl Matthias wrote:
I'm one of the authors of the Kiwi parser and will be presenting it at the Data Summit on Friday. The parser is pretty complete, but we could certainly use some community support, and we encourage feedback and participation! It is a highly functional tool already, but it could use some polish. It does handle most wikitext, though not absolutely everything.
From your post I can see that you are experiencing a couple of design
decisions we made in writing this parser. We did not set out to match
the exact HTML output of MediaWiki, only to output something that will
look the same in the browser. This might not be the best approach, but
right now this is the case. Our site doesn't have the same needs as
Wikipedia so when in doubt we leaned toward what suited our needs and
not necessarily ultimate tolerance of poor syntax (though it is somewhat
flexible).
I felt bad about pointing out issues right after my first try. I understand that you have much less content than Wikipedia and can get by with just a subset of the markup, without worrying about corner cases.
I approach it as a tool that could stand in for the bigger parser, though. Currently, it reads as just another wiki syntax that happens to look similar to MediaWiki's.
Another design decision is that everything you put in comes out wrapped in paragraph tags. Usually this wraps the whole document, so if your whole document was just a heading, then yes, it is wrapped in paragraph tags. This is probably not the best way to handle this, but it's what it currently does. Feel free to contribute a different solution.
It doesn't seem to be legal HTML*, so I wouldn't justify it purely as a "design decision". The same could be argued for nested <p> tags.
* Opening the <hX> seems to implicitly close the previous <p>, leading to an unmatched </p>.
Templates, as you probably know, require full integration with an application to work the way MediaWiki handles them, because they require access to the data store, and possibly other configuration information. We built a parser that works independently of the data store (indeed, even on the command line in a somewhat degenerate form). In order to do that, we had to decouple template retrieval from the parse. If you take a look at the Ruby FFI examples, you will see a more elegant handling of templates (though it needs work). When a document is parsed, the parser library makes available a list of templates that were found, the arguments passed to each template, and the unique replacement tag in the document for inserting the template once rendered. Those underscored tags that come out are not a bug; they are those unique tags.
I suspected it was something like that, but it seemed odd that the parser did such a conversion instead of leaving them as literals in that case.
I used just the parser binary. I have been looking at the Ruby code, and despite the unfamiliar language, I'm understanding a bit more of how it works.
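For what it's worth, the decoupled flow Karl describes could be sketched roughly like this in application code (all names, the placeholder format, and the data shapes here are illustrative assumptions, not Kiwi's actual API):

```ruby
# Hypothetical sketch of the decoupled template flow: the parser returns
# the document with unique placeholder tags plus a list of templates found.
# The placeholder "_TPL_1_" and the hash layout are made up for illustration.
html = "<p>Intro text _TPL_1_ more text</p>"
templates = [
  { name: "infobox", args: ["color=blue"], tag: "_TPL_1_" }
]

# The application resolves each template against its own data store and
# substitutes the rendered result back into the document.
def render_template(name, args)
  # Stand-in renderer: a real app would fetch the template body here.
  "<div class=\"tpl-#{name}\">#{args.join(', ')}</div>"
end

templates.each do |tpl|
  html = html.sub(tpl[:tag], render_template(tpl[:name], tpl[:args]))
end

puts html
```

This keeps the parser free of any data-store dependency: template lookup happens entirely on the application side, after the parse.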
Like templates, images require some different solutions if the parser is to be decoupled. Our parser does not re-size images, store them, etc. It just works with image URLs. If your application requires images to be regularized, you would need to implement resizing at upload, or lazily at load time, or whatever works in your scenario.
A parser shouldn't really need to handle images. At most it should provide a callback so that the app can do something with the image URLs.
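A callback-based approach like the one suggested above might look something like this (purely a sketch of the idea; `rewrite_image_urls` and the regex are hypothetical, not part of Kiwi):

```ruby
# Minimal sketch of the callback idea: the app hands each image URL to a
# block of its choosing instead of the parser processing images itself.
def rewrite_image_urls(html, &block)
  # Replace each src attribute value with whatever the application returns.
  html.gsub(/src="([^"]+)"/) { %(src="#{block.call(Regexp.last_match(1))}") }
end

doc = '<img src="http://example.com/photo.jpg">'

# The application decides what to do with each URL, e.g. route it
# through its own (hypothetical) thumbnailing endpoint.
resized = rewrite_image_urls(doc) { |url| "/thumbs?src=#{url}&w=200" }

puts resized
```

The parser stays URL-only, as Karl describes, and resizing or storage policy lives entirely in the application.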
More work is needed in this area, though if you check out http://kiwi.drasticcode.com you can see that most image support is working (no resizing). You can also experiment with the parser there as needed.
The URL mapping used there makes some titles impossible to use, such as making an entry for [[Edit]] - http://en.wikipedia.org/wiki/Edit
Hope that at least helps explain what we've done. Again, feedback and particularly code contributions are appreciated!
Cheers,
Karl
Just code lurking for now :)