You're right that the routes on kiwi.drasticcode.com leave something
to be desired. I was mainly focused on getting a demo of the parser
working, and didn't put much thought into the URLs or try to follow
any wiki best practices. I did try to avoid some of the MediaWiki
conventions, like putting colons in routes or indicating the action
with a GET param; I've found these can be tricky to duplicate in
other frameworks like Rails, which doesn't easily support
I would like to find the time to address this (although Karl's right
that I would welcome contributions). I'm thinking a routing scheme
like this might work:

POST /wiki/another/page   # update or create

Are there any obvious problems with this approach I might want to
consider?
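To make the idea concrete, here is a minimal sketch (illustrative
only, not Kiwi's actual routing code) of how a scheme like that could
keep page titles and actions from colliding: pages live under /wiki/,
the action comes from the HTTP verb plus a reserved /edit suffix, so a
page named "Edit" stays reachable. The tradeoff is that /edit becomes
a reserved path segment at the end of a title.

```ruby
# Hypothetical routing sketch: pages under /wiki/, action expressed by
# HTTP verb plus an optional reserved "/edit" suffix.
WIKI_PATH = %r{\A/wiki/(.+?)(/edit)?\z}

def route(method, path)
  m = WIKI_PATH.match(path) or return [:not_found]
  page, editing = m[1], !m[2].nil?
  case
  when method == "GET" && editing then [:edit_form, page]
  when method == "GET"            then [:show, page]
  when method == "POST"           then [:update_or_create, page]
  else [:not_found]
  end
end

route("GET",  "/wiki/Edit")              # => [:show, "Edit"]
route("POST", "/wiki/another/page")      # => [:update_or_create, "another/page"]
route("GET",  "/wiki/another/page/edit") # => [:edit_form, "another/page"]
```

Note the page named literally "Edit" routes to :show, which is the
failure case Platonides described with the current URL mapping.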
On Thu, Feb 3, 2011 at 12:41 AM, Platonides <platonides at gmail.com> wrote:
> Karl Matthias wrote:
>>> The URL mapping used there makes some titles impossible to use, such as
>>> making an entry for [[Edit]] - http://en.wikipedia.org/wiki/Edit
>> You are right about that. I'm sure Sam would be happy to accept
>> contributions to change that. The site does support double-click to
>> edit, though, so making links to Edit is kind of unnecessary.
> It's not just edit, but all actions, such as upload. The real solution
> is to have the wiki items inside a "folder" and the actions outside. You
> could prefix actions, like MediaWiki does (e.g. Action:Edit, while
> forbidding pages starting with Action:), but you would still have the
> classic problems for root folder items such as favicon.ico.
> Alan Post wrote:
> > Interesting. Is the PEG grammar available for this parser?
> > -Alan
> It's at https://github.com/AboutUs/kiwi/blob/master/src/syntax.leg
> Get peg/leg from http://piumarta.com/software/peg/
> I just tried it and already found a bug on the first Hello World (it
> surrounds headers inside paragraphs).
> It strangely converts templates into underscored words. They may be
> expecting some other parser piece to restore it. I'm pretty sure there
> are corner cases in the preprocessor (eg. just looking at the peg file
> they don't handle mixed case noincludes), but I don't think that should
> need to be handled by the parser itself.
> The grammar looks elegant. I doubt it can really handle full wikitext.
> But it would be so nice if it did...

I'm one of the authors of the Kiwi parser and will be presenting it at the
Data Summit on Friday. The parser is pretty complete, but we could
certainly use some community support, and we encourage feedback and
participation! It is already a highly functional tool, though it could use
some polish. It does handle most wikitext, though not absolutely
everything.
From your post I can see that you are experiencing a couple of design
decisions we made in writing this parser. We did not set out to match the
exact HTML output of MediaWiki, only to output something that will look the
same in the browser. This might not be the best approach, but right now
this is the case. Our site doesn't have the same needs as Wikipedia so when
in doubt we leaned toward what suited our needs and not necessarily ultimate
tolerance of poor syntax (though it is somewhat flexible). Another design
decision is that everything that you put in comes out wrapped in paragraph
tags. Usually this wraps the whole document, so if your whole document was
just a heading, then yes it is wrapped in paragraph tags. This is probably
not the best way to handle this but it's what it currently does. Feel free
to contribute a different solution.
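In the spirit of that invitation, one possible "different solution" is
a post-processing pass over the parser's HTML output that strips the
paragraph wrapper when it directly encloses a block-level element such
as a heading. This is a naive regex-based sketch, not part of Kiwi:

```ruby
# Hypothetical post-processing pass: unwrap headings that the parser
# has enclosed in <p>...</p>. Regex-based and deliberately simple; a
# real fix would live in the grammar or use an HTML parser.
BLOCK_IN_P = %r{<p>\s*(<h[1-6]>.*?</h[1-6]>)\s*</p>}m

def unwrap_headings(html)
  html.gsub(BLOCK_IN_P, '\1')
end

unwrap_headings("<p><h1>Hello World</h1></p>")  # => "<h1>Hello World</h1>"
unwrap_headings("<p>plain text</p>")            # => "<p>plain text</p>"
```

This handles the Hello World case Platonides reported while leaving
ordinary paragraphs alone.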
Templates, as you probably know, require full integration with an
application to work in the way that MediaWiki handles them, because they
require access to the data store, and possibly other configuration
information. We built a parser that works independently of the data store
(indeed, even on the command line in a somewhat degenerate form). In order
to do that, we had to decouple template retrieval from the parse. If you
take a look in the Ruby FFI examples, you will see a more elegant handling
of templates (though it needs work). When a document is parsed, the parser
library makes available a list of templates that were found, the arguments
passed to the template, and the unique replacement tag in the document for
inserting the template once rendered. Those underscored tags that come out
are not a bug, they are those unique tags. There is a switch to disable
templates and in that case it just swallows them instead. So the template
handling workflow (simplistically) is:

1. Parse the original document and collect the list of templates, their
   arguments, and their unique replacement tags.
2. Fetch the first template; if no recursion is needed, insert it into the
   document at its replacement tag.
3. Fetch the next template, and so on.
We currently recurse 6 templates deep in the bindings we built for
AboutUs.org (sysop-only at the moment). Template arguments don't work
right now, but adding them should be fairly trivial; we just haven't done
it yet.
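The workflow above can be sketched in a few lines of Ruby. Everything
here is illustrative rather than Kiwi's real API: fake_parse is a
stand-in for the parser (it just turns {{Name}} occurrences into
unique placeholder tags and reports them, the way Kiwi hands back a
template list plus replacement tags), and the store is a plain hash in
place of a real data store.

```ruby
MAX_DEPTH = 6  # we currently recurse 6 templates deep on AboutUs.org

# Stand-in for the parser: replace {{Name}} with a unique tag and
# report the template list (name + replacement tag).
def fake_parse(text)
  templates = []
  html = text.gsub(/\{\{(\w+)\}\}/) do
    tag = "__template_#{templates.size}__"
    templates << { name: $1, tag: tag }
    tag
  end
  [html, templates]
end

# The application-side loop: fetch each template from the store,
# expand it recursively up to MAX_DEPTH, and splice it in at its tag.
def expand(text, store, depth = 0)
  return text if depth >= MAX_DEPTH
  html, templates = fake_parse(text)
  templates.each do |t|
    body = store.fetch(t[:name], "")
    html = html.sub(t[:tag], expand(body, store, depth + 1))
  end
  html
end

store = { "Greeting" => "Hello, {{Planet}}!", "Planet" => "world" }
expand("Say: {{Greeting}}", store)  # => "Say: Hello, world!"
```

The depth cap is also what keeps a self-referencing template from
looping forever: {{Loop}} defined as {{Loop}} simply stops expanding
after six levels.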
Like templates, images require some different solutions if the parser is to
be decoupled. Our parser does not resize images, store them, and so on; it
just works with image URLs. If your application requires images to be
regularized, you would need to implement resizing at upload time, or lazily
at load time, or whatever works in your scenario. More work is needed in
this area, though if you check out http://kiwi.drasticcode.com you can see
that most image support is working (no resizing). You can also experiment
with the parser there as needed.
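As one illustration of the "lazily at load time" option: since the
parser only emits image URLs, the application can post-process the
output and point each <img> at a thumbnailing endpoint that resizes on
first request. The /thumb route below is made up for the sketch; it is
not something Kiwi provides.

```ruby
require "cgi"

# Hypothetical lazy-resize pass: rewrite each <img> src to a
# (made-up) /thumb endpoint that would resize on first request.
def rewrite_images(html, width: 300)
  html.gsub(/<img src="([^"]+)"/) do
    %(<img src="/thumb?w=#{width}&src=#{CGI.escape($1)}")
  end
end

rewrite_images(%(<img src="http://example.com/pic.jpg" alt="pic">))
# => <img src="/thumb?w=300&src=http%3A%2F%2Fexample.com%2Fpic.jpg" alt="pic">
```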
Hope that at least helps explain what we've done. Again, feedback and
particularly code contributions are appreciated!

Hi everyone. I just joined the list; this is my first post, but I've been
following the developments with interest.
I just learned about another promising MW parser, from the AboutUs guys, and
thought I'd share it here on the list:
Kiwi: A Fast, Formal WikiText Parser -