FYI: *we* are seeing your entire message, on-list
-- j
----- Original Message -----
From: "Karl Matthias"
<karl(a)matthias.org>
To: wikitext-l(a)lists.wikimedia.org
Sent: Tuesday, February 1, 2011 7:48:30 PM
Subject: Re: [Wikitext-l] New parser: Kiwi
Apologies... even the second attempt was truncated, it seems. Here's
one final try.
Karl
-----------
Alan Post wrote:
Interesting. Is the PEG grammar available for
this parser?
-Alan
It's at
https://github.com/AboutUs/kiwi/blob/master/src/syntax.leg
Get peg/leg from
http://piumarta.com/software/peg/
I just tried it and already found a bug on the first Hello World (it
surrounds headers inside paragraphs).
It strangely converts templates into underscored words. They may be
expecting some other parser piece to restore it. I'm pretty sure there
are corner cases in the preprocessor (eg. just looking at the peg file
they don't handle mixed case noincludes), but I don't think that should
need to be handled by the parser itself.
The grammar looks elegant. I doubt it can really handle full wikitext.
But it would be so nice if it did...
I'm one of the authors of the Kiwi parser and will be presenting it at
the Data Summit on Friday. The parser is pretty complete but
certainly we could use some community support and we encourage
feedback and participation! It is a highly functional tool already
but it can use some polish. It does actually handle most wikitext,
though not absolutely everything.
From your post I can see that you are experiencing a couple of design
decisions we made in writing this parser. We did not set out to match
the exact HTML output of MediaWiki, only to output something that will
look the same in the browser. This might not be the best approach,
but right now this is the case. Our site doesn't have the same needs
as Wikipedia so when in doubt we leaned toward what suited our needs
and not necessarily ultimate tolerance of poor syntax (though it is
somewhat flexible). Another design decision is that everything that
you put in comes out wrapped in paragraph tags. Usually this wraps
the whole document, so if your whole document is just a heading, then
yes, it is wrapped in paragraph tags. This is probably not the best
way to handle it, but it's what the parser currently does. Feel free
to contribute a different solution.
Templates, as you probably know, require full integration with an
application to work in the way that MediaWiki handles them, because
they require access to the data store, and possibly other
configuration information. We built a parser that works independently
of the data store (indeed, even on the command line in a somewhat
degenerate form). In order to do that, we had to decouple template
retrieval from the parse. If you take a look in the Ruby FFI
examples, you will see a more elegant handling of templates (though it
needs work). When a document is parsed, the parser library makes
available a list of templates that were found, the arguments passed to
the template, and the unique replacement tag in the document for
inserting the template once rendered. Those underscored tags that come
out are not a bug, they are those unique tags. There is a switch to
disable templates, and in that case the parser just swallows them
instead. So the template handling workflow (simplistically) is:
1. Parse the original document and generate a list of templates,
arguments, and replacement tags
2. Fetch the first template; if no recursion is needed, insert it
into the original document
3. Fetch the next template, and so on
We currently recurse 6 templates deep in the bindings we built for
AboutUs.org (sysop-only at the moment). Template arguments don't work
right now, but adding support should be fairly trivial. We just
haven't done it yet.
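To make the workflow above concrete, here is a minimal sketch in Python.
It is not Kiwi's API or its Ruby FFI bindings: toy_parse, TemplateRef,
the fetch callback, the {{name}} regex and the tag format are all
hypothetical stand-ins made up for illustration. Swallowing templates
past the depth limit is also an assumption about how a caller might
choose to stop.

```python
import re
from typing import Callable, List, NamedTuple, Tuple

MAX_DEPTH = 6  # depth limit quoted above for the AboutUs.org bindings


class TemplateRef(NamedTuple):
    name: str  # template name as written in the wikitext
    tag: str   # unique replacement tag left in the parsed output


# Hypothetical, simplified syntax: {{name}} only, no arguments.
_TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)\}\}")


def toy_parse(text: str) -> Tuple[str, List[TemplateRef]]:
    """Stand-in for the real parser (assumption): replace each {{name}}
    with a unique underscored tag and record what was found."""
    found: List[TemplateRef] = []

    def swap(match):
        tag = "__t_%d__" % len(found)
        found.append(TemplateRef(match.group(1).strip(), tag))
        return tag

    return _TEMPLATE_RE.sub(swap, text), found


def expand(text: str, fetch: Callable[[str], str], depth: int = 0) -> str:
    """Step 1: parse and collect templates. Steps 2-3: fetch each one,
    expand it recursively, and splice it in at its replacement tag."""
    parsed, refs = toy_parse(text)
    for ref in refs:
        if depth >= MAX_DEPTH:
            rendered = ""  # assumption: drop templates past the limit
        else:
            rendered = expand(fetch(ref.name), fetch, depth + 1)
        parsed = parsed.replace(ref.tag, rendered)
    return parsed


if __name__ == "__main__":
    # The data store is the application's job; a dict stands in for it.
    store = {"greeting": "Hello, {{name}}!", "name": "world"}
    print(expand("Say: {{greeting}}", lambda n: store.get(n, "")))
```

The point of the split is that the parse step never touches the data
store: only the fetch callback does, so the same parser can run from
the command line or inside an application.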
Like templates, images require some different solutions if the parser
is to be decoupled. Our parser does not re-size images, store them,
etc. It just works with image URLs. If your application requires
images to be regularized, you would need to resize them at upload,
or lazily at load time, or whatever works in your scenario.
More work is needed in this area, though if you check out
http://kiwi.drasticcode.com you can see that most image support is
working (no resizing). You can also experiment with the parser there
as needed.
Hope that at least helps explain what we've done. Again, feedback and
particularly code contributions are appreciated!
Cheers,
Karl