Re: [Wikitext-l] New parser: Kiwi

3 Feb 2011

FYI: *we* are seeing your entire message, on-list
-- j

----- Original Message -----
 From: "Karl Matthias" <karl(a)matthias.org>
 To: wikitext-l(a)lists.wikimedia.org
 Sent: Tuesday, February 1, 2011 7:48:30 PM
 Subject: Re: [Wikitext-l] New parser: Kiwi
 Apologies... even the second attempt was truncated it seems. Here's
 one final try

 Karl
 -----------
 Alan Post wrote:
  Interesting. Is the PEG grammar available for this parser?

 -Alan  
 It's at https://github.com/AboutUs/kiwi/blob/master/src/syntax.leg

 Get peg/leg from http://piumarta.com/software/peg/

 I just tried it and already found a bug on the first Hello World (it
 surrounds headers inside paragraphs).
 It strangely converts templates into underscored words. They may be
 expecting some other parser piece to restore it. I'm pretty sure there
 are corner cases in the preprocessor (e.g. just looking at the peg file
 they don't handle mixed case noincludes), but I don't think that
 should need to be handled by the parser itself.

 The grammar looks elegant. I doubt it can really handle full wikitext.

 But it would be so nice if it did...

 I'm one of the authors of the Kiwi parser and will be presenting it at
 the Data Summit on Friday. The parser is pretty complete but
 certainly we could use some community support and we encourage
 feedback and participation! It is a highly functional tool already
 but it can use some polish. It does actually handle most wikitext,
 though not absolutely everything.

 From your post I can see that you are experiencing a couple of design
 decisions we made in writing this parser. We did not set out to match
 the exact HTML output of MediaWiki, only to output something that will
 look the same in the browser. This might not be the best approach,
 but right now this is the case. Our site doesn't have the same needs
 as Wikipedia so when in doubt we leaned toward what suited our needs
 and not necessarily ultimate tolerance of poor syntax (though it is
 somewhat flexible). Another design decision is that everything that
 you put in comes out wrapped in paragraph tags. Usually this wraps
 the whole document, so if your whole document was just a heading, then
 yes it is wrapped in paragraph tags. This is probably not the best
 way to handle this but it's what it currently does. Feel free to
 contribute a different solution.

 Templates, as you probably know, require full integration with an
 application to work in the way that MediaWiki handles them, because
 they require access to the data store, and possibly other
 configuration information. We built a parser that works independently
 of the data store (indeed, even on the command line in a somewhat
 degenerate form). In order to do that, we had to decouple template
 retrieval from the parse. If you take a look in the Ruby FFI
 examples, you will see a more elegant handling of templates (though it
 needs work). When a document is parsed, the parser library makes
 available a list of templates that were found, the arguments passed to
 the template, and the unique replacement tag in the document for
 inserting the template once rendered. Those underscored tags that come
 out are not a bug, they are those unique tags. There is a switch to
 disable templates and in that case it just swallows them instead. So
 the template handling workflow (simplistically) is:

 1. Parse original document and generate list of templates,
 arguments, replacement tags
 2. Fetch first template, if there is no recursion needed, insert
 into original document
 3. Fetch next template, etc

 We currently recurse 6 templates deep in the bindings we built for
 AboutUs.org (sysop-only at the moment). Template arguments don't work
 right now, but it's fairly trivial to do it. We just haven't done it
 yet.
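
The workflow described above can be sketched roughly as follows. This is only an illustration in Python, not Kiwi's actual API: `parse`, `expand`, `TemplateRef`, the `__tmpl_N__` tag format and the `fetch` callback are all hypothetical names, and the regex stands in for the real grammar. It assumes templates look like `{{name|arg|...}}`, collects the arguments without substituting them, and swallows anything nested deeper than six levels.

```python
import re
from typing import Callable, List, NamedTuple, Tuple

MAX_DEPTH = 6  # recursion limit mentioned in the message


class TemplateRef(NamedTuple):
    name: str        # template name, e.g. "greet"
    args: List[str]  # raw arguments (collected, not substituted here)
    tag: str         # unique replacement tag left in the document


# Hypothetical stand-in for the real grammar: matches {{name|arg|...}}
_TEMPLATE = re.compile(r"\{\{([^{}|]+)((?:\|[^{}|]*)*)\}\}")


def parse(text: str) -> Tuple[str, List[TemplateRef]]:
    """Replace each template call with a unique tag and list what was found."""
    refs: List[TemplateRef] = []

    def swap(match: "re.Match[str]") -> str:
        tag = f"__tmpl_{len(refs)}__"
        args = match.group(2).split("|")[1:] if match.group(2) else []
        refs.append(TemplateRef(match.group(1).strip(), args, tag))
        return tag

    return _TEMPLATE.sub(swap, text), refs


def expand(text: str, fetch: Callable[[str], str], depth: int = 0) -> str:
    """Parse, fetch each template, recurse into it, then fill in its tag."""
    out, refs = parse(text)
    for ref in refs:
        if depth >= MAX_DEPTH:
            rendered = ""  # too deep: swallow the template
        else:
            rendered = expand(fetch(ref.name), fetch, depth + 1)
        out = out.replace(ref.tag, rendered)
    return out


if __name__ == "__main__":
    # The data store lives outside the parser; here it is just a dict.
    store = {"greet": "Hello, {{who}}!", "who": "world"}
    print(expand("== Hi ==\n{{greet}}", lambda name: store.get(name, "")))
```

The point of the design is that `parse` never touches the data store: the caller supplies `fetch`, so the same parser can sit behind a wiki application or run standalone on the command line.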

 Like templates, images require some different solutions if the parser
 is to be decoupled. Our parser does not re-size images, store them,
 etc. It just works with image URLs. If your application requires
 images to be regularized, you would need to implement resizing them at
 upload, or lazily at load time, or whatever works in your scenario.
 More work is needed in this area, though if you check out
 http://kiwi.drasticcode.com you can see that most image support is
 working (no resizing). You can also experiment with the parser there
 as needed.

 Hope that at least helps explain what we've done. Again, feedback and
 particularly code contributions are appreciated!

 Cheers,
 Karl

 _______________________________________________
 Wikitext-l mailing list
 Wikitext-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitext-l 
