I'm starting to take some more time away from helping with general MediaWiki
1.18 development and get back to the JavaScript back-end parser
experiments[1] that'll power the initial testing of the visual editor's work
on existing pages.
[1] http://www.mediawiki.org/wiki/Extension:ParserPlayground
I've made some progress[2] on in-tree template expansion, modeled on parts
of PPFrame::expand[3], Parser::argSubstitution[4] and
Parser::braceSubstitution[5] from the current MediaWiki parser.
[2] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/98393
[3] http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/parser/Prep…
[4] http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/parser/Pars…
[5] http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/parser/Pars…
There are two key differences between this in-progress code and how the
stock parser does it:
1) The stock parser expands a preprocessor node tree as it flattens it into
another string (e.g. for later parsing steps on the compound markup) -- the
experimental parser produces another node tree instead.
Later-stage parsing will deal with re-assembling start and end tokens when
producing HTML or other output, but within the tree structure instead of a
flat stream of string tokens -- that'll let output know how to mark where
pieces of output came from in the input so they can be hooked up to editing
activation.
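Roughly, the idea looks like this -- a minimal sketch with made-up node
shapes and names, not ParserPlayground's actual data structure: expansion
maps one tree to another, and every expanded node carries a 'src' record
pointing back at its range in the original wikitext, which is what later
output can use to hook pieces back up to editing activation.

```javascript
// Hypothetical template store: a template body is itself a node list.
var templates = {
  greet: [ { type: 'text', text: 'Hello, ' }, { type: 'arg', name: '1' } ]
};

// Expand a node tree into another node tree. No flattening to a string:
// each result node keeps a 'src' record (source range) from its input node.
function expandNode(node, args) {
  if (node.type === 'text') {
    return { type: 'text', text: node.text, src: node.src };
  }
  if (node.type === 'arg') {
    // Substitute a template argument by expanding its value nodes.
    var val = args[node.name] || [];
    return {
      type: 'expanded-arg',
      children: val.map(function (c) { return expandNode(c, args); }),
      src: node.src
    };
  }
  if (node.type === 'template') {
    // Expand the template body against this invocation's arguments.
    var body = templates[node.target] || [];
    return {
      type: 'expanded-template',
      children: body.map(function (c) { return expandNode(c, node.args); }),
      src: node.src  // where the {{...}} invocation sat in the source
    };
  }
  // Generic container node: recurse over children.
  return {
    type: node.type,
    children: (node.children || []).map(function (c) {
      return expandNode(c, args);
    }),
    src: node.src
  };
}
```

A later HTML-producing pass can then walk this result tree and emit
start/end markers per node, using each node's 'src' range.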
2) Operations like fetching a template from the network are inherently
asynchronous in a JavaScript environment, so the iteration loop uses a
closure function to 'suspend' itself, calling back into the function when
the async operation completes.
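The suspend-and-resume shape is roughly this (hypothetical names, not the
actual ParserPlayground code): when the loop hits a node that needs a
network fetch, it returns immediately, handing the async operation a
closure that re-enters the loop once the data arrives.

```javascript
// Stand-in for an async template fetch, e.g. an API round-trip.
function fetchTemplate(title, callback) {
  setTimeout(function () {
    callback('content of ' + title);
  }, 0);
}

function expandAll(queue, done) {
  var results = [];
  function iterate() {
    while (queue.length) {
      var node = queue.shift();
      if (node.type === 'template') {
        // Suspend: give the async op a closure that resumes the loop.
        fetchTemplate(node.target, function (text) {
          results.push(text);
          iterate(); // pick up where we left off
        });
        return; // everything behind this node now waits
      }
      results.push(node.text);
    }
    done(results);
  }
  iterate();
}
```

Note the `return` after scheduling the fetch: that's the naive part -- the
whole iterator stops until that one round-trip completes.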
Currently this is fairly naive and suspends the entire iterator, meaning
that all work gets serialized and every network round-trip adds wait
time. A smarter next step could be to keep iterating over other nodes,
then come back to the one that was waiting once its data is ready -- this
could be a win when working with many templates under high network latency.
(Another win could be to pre-fetch things you know you're going to need,
in bulk!)
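As a sketch of that alternative (again with hypothetical names): kick off
every pending template fetch at once and only assemble the output when the
last one lands. With many templates and high latency, the total wait then
approaches the slowest single round-trip instead of the sum of all of them.

```javascript
// Stand-in for an async template fetch.
function fetchTemplate(title, callback) {
  setTimeout(function () { callback('content of ' + title); }, 0);
}

function expandParallel(nodes, done) {
  var results = new Array(nodes.length);
  var pending = 0;
  nodes.forEach(function (node, i) {
    if (node.type === 'template') {
      pending++;
      fetchTemplate(node.target, function (text) {
        results[i] = text;      // slot each reply into its own position
        if (--pending === 0) {
          done(results);        // last fetch in: assemble the output
        }
      });
    } else {
      results[i] = node.text;
    }
  });
  if (pending === 0) {
    done(results); // nothing needed the network at all
  }
}
```

Fetches can complete in any order; indexing by position (rather than
pushing) keeps the output in document order regardless.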
I'm actually a bit intrigued by the idea of more aggressive asynchronous
work on the PHP-side code now; while latency is usually low within the data
center, things still add up. Working with multiple data centers for failover
and load balancing may make it more likely for Wikimedia's own sites to
experience slow data fetches, and things like InstantCommons[6] can already
require waiting for potentially slow remote data fetches on third-party
sites using shared resources over the internet.
[6] http://www.mediawiki.org/wiki/InstantCommons
Alas, we can't just take JavaScript-modeled async code and port it straight
to PHP, but PHP does let you do socket I/O and select() and whatnot, and
could certainly run all sorts of network requests in the background during
other CPU work.
This is still very much in-progress code (and in-progress data structure),
but do feel free to take a peek over and give feedback. :)
-- brion vibber (brion @ wikimedia.org / brion @ pobox.com)