[Wikitext-l] New parser: Kiwi

Andreas Jonsson andreas.jonsson at kreablo.se
Wed Feb 2 07:19:24 UTC 2011


2011-02-02 01:48, Karl Matthias wrote:
> Apologies... even the second attempt was truncated, it seems.  Here's
> one final try.

You have been hit by the same problem I ran into a few days ago on this
list: you have a line that starts with "From your" in the text.

/Andreas


> Karl
> -----------
>     Alan Post wrote:
>     > Interesting.  Is the PEG grammar available for this parser?
>     >
>     > -Alan
> 
>     It's at https://github.com/AboutUs/kiwi/blob/master/src/syntax.leg
> 
>     Get peg/leg from http://piumarta.com/software/peg/
> 
> 
>     I just tried it and already found a bug on the first Hello World (it
>     wraps headers inside paragraphs).
>     It strangely converts templates into underscored words. They may be
>     expecting some other parser piece to restore them. I'm pretty sure there
> 
>     are corner cases in the preprocessor (e.g., just looking at the peg file,
>     they don't handle mixed-case noincludes), but I don't think those should
>     need to be handled by the parser itself.
> 
>     The grammar looks elegant. I doubt it can really handle full wikitext.
> 
>     But it would be so nice if it did...
> 
> 
> I'm one of the authors of the Kiwi parser and will be presenting it at
> the Data Summit on Friday.  The parser is pretty complete, but we could
> certainly use some community support, and we encourage feedback and
> participation!  It is a highly functional tool already, but it can use
> some polish.  It does handle most wikitext, though not absolutely
> everything.
> 
> From your post I can see that you are experiencing a couple of design
> decisions we made in writing this parser.  We did not set out to match
> the exact HTML output of MediaWiki, only to output something that will
> look the same in the browser.  This might not be the best approach,
> but it is the one we took.  Our site doesn't have the same needs as
> Wikipedia, so when in doubt we leaned toward what suited our needs
> rather than ultimate tolerance of poor syntax (though the parser is
> somewhat flexible).  Another design decision is that everything you
> put in comes out wrapped in paragraph tags.  Usually this wraps the
> whole document, so if your whole document was just a heading, then
> yes, it is wrapped in paragraph tags.  This is probably not the best
> way to handle it, but it's what the parser currently does.  Feel free
> to contribute a different solution.
> 
> Templates, as you probably know, require full integration with an
> application to work the way MediaWiki handles them, because they
> require access to the data store, and possibly other configuration
> information.  We built a parser that works independently of the data
> store (indeed, even on the command line, in a somewhat degenerate
> form).  In order to do that, we had to decouple template retrieval
> from the parse.  If you take a look at the Ruby FFI examples, you
> will see a more elegant handling of templates (though it needs work).
> When a document is parsed, the parser library makes available a list
> of the templates that were found, the arguments passed to each
> template, and the unique replacement tag in the document for
> inserting the template once rendered.  Those underscored tags that
> come out are not a bug; they are those unique tags.  There is a
> switch to disable templates, in which case it just swallows them
> instead.  So the template-handling workflow (simplistically) is:
> 
>    1. Parse the original document and generate the list of templates,
> arguments, and replacement tags
>    2. Fetch the first template; if no further recursion is needed,
> insert it into the original document
>    3. Fetch the next template, and so on
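
The decoupled workflow described above can be sketched in a few lines of Python. Everything here is hypothetical, the function names, the in-memory template store, and the toy `{{Name}}` parser included; the real Kiwi library exposes this through its C API and the Ruby FFI bindings, not through these functions. The sketch only illustrates the shape of the loop: parse once, get back placeholder tags plus a template list, then fetch and substitute recursively up to a depth limit.

```python
import re

# Toy in-memory data store standing in for the application's database.
TEMPLATE_STORE = {"Greet": "Hello, {{Name}}!", "Name": "world"}
MAX_DEPTH = 6  # the AboutUs.org bindings recurse 6 templates deep

def parse(text):
    """Toy stand-in for the parser: replace each {{Foo}} with a unique
    placeholder tag and report (name, tag) pairs, mimicking the list the
    real parser makes available after a parse."""
    templates = []
    def repl(match):
        tag = "__TPL_%d__" % len(templates)
        templates.append((match.group(1), tag))
        return tag
    return re.sub(r"\{\{(\w+)\}\}", repl, text), templates

def fetch_template_source(name):
    # Placeholder for a data-store lookup; the parser itself never
    # touches the database, which is the point of the decoupling.
    return TEMPLATE_STORE.get(name, "")

def render(text, depth=0):
    html, templates = parse(text)
    if depth >= MAX_DEPTH:
        return html  # stop recursing; leave any remaining tags in place
    for name, tag in templates:
        expanded = render(fetch_template_source(name), depth + 1)
        html = html.replace(tag, expanded)
    return html

print(render("Say: {{Greet}}"))  # -> Say: Hello, world!
```

The depth limit mirrors the 6-level recursion mentioned below; a real integration would also pass the collected template arguments through to the fetch step.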
> 
> We currently recurse 6 templates deep in the bindings we built for
> AboutUs.org (sysop-only at the moment).  Template arguments don't work
> right now, but they are fairly trivial to add; we just haven't done it
> yet.
> 
> Like templates, images require some different solutions if the parser
> is to be decoupled.  Our parser does not resize images, store them,
> etc.  It just works with image URLs.  If your application requires
> images to be regularized, you would need to resize them at upload
> time, or lazily at load time, or whatever works in your scenario.
> More work is needed in this area, though if you check out
> http://kiwi.drasticcode.com you can see that most image support is
> working (no resizing).  You can also experiment with the parser there
> as needed.
> 
> Hope that at least helps explain what we've done.  Again, feedback and
> particularly code contributions are appreciated!
> 
> Cheers,
> Karl
> 
> _______________________________________________
> Wikitext-l mailing list
> Wikitext-l at lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitext-l
> 



