Parser postprocessor - Wikitech-l

31 Jan 2011


      An interesting idea just popped into my head, as a combination of my 
explorations through the dom preprocessor and my attempt at deferring 
editsection replacement till after parsing is done so that skins can 
modify the markup used in an editsection link in a skin-specific way 
without breaking things, and so that we can stop fragmenting the parser 
cache by user language just for edit section links.
A postprocessor. It would be quite interesting if instead of html we 
started outputting something like this in our parser output:
<root><html>&lt;p&gt;foo&lt;/p&gt;&lt;h2&gt;</html><editsection 
page="Foo" 
section="1">bar</editsection><html>bar&lt;/h2&gt;&lt;p&gt;baz&lt;/p&gt;&lt;h2&gt;</html><choose><option><html>&lt;p&gt;foo&lt;/p&gt;</html></option><option><html>&lt;p&gt;bar&lt;/p&gt;</html></option><option><html>&lt;p&gt;baz&lt;/p&gt;</html></option></choose></root>
((don't get scarred off by all the entities, this is nothing new, try 
looking at a preprocess-xml cache entry))
Course this is a Postprocessor_DOM oriented look, like Preprocessor_Hash 
we'd have a Postprocessor_Hash and it would store a different format 
like we already do with Preprocessor_Hash (serialized?).
The idea being the creation of new markers that aren't 100% parsed but 
are outputted in a easy to deserialize format and finish parsing with 
minimal work and extensions can output and have a postprocessor hook 
expand later on. In essence the idea here is two fold.
Firstly things like the <editsection page="Foo" 
section="1">bar</editsection> I tried to introduce now is no longer a 
hack. And we can try to start deferring minimal processing cost things 
which fragment the parser cache if they aren't needed. Ideally in the 
future if something like {{int:asdf}} isn't used in a [[]] or in a 
parser function and is just a base level bit of display isolated from 
the rest of the WikiText we might be able to output it in a way that we 
don't have to fragment the cache by user lang but can still output the 
message in the user's lang by deferring it.
And as a big extra bonus, think of the RandomSelection extension. Right 
now extensions like RandomSelection end up disabling the entire parser 
cache for a page, just so they can output a random one of a series of 
options. With a postprocessor they could instead craft partially parsed 
output where all the normal wikitext is still parsed, but all the 
options given in the source text are outputted and the postprocessor 
handles the actual random selection on each page view, only outputting 
one of the three html nodes.
Likewise we might be able to implement "Welcome {{USERNAME}}!" without 
fragmenting the cache by user or having to disable it.
The key being that we get things as variable as complete randomness, at 
the level of re-executing that randomness on each page view, yet have 
barely any more processing to do than we did before. (like the rest of 
the ui that isn't part of the page content)
-- 
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]