On Mon, Jan 31, 2011 at 4:55 PM, Trevor Parscal <tparscal@wikimedia.org> wrote:
> Adding yet another discrete parsing step is the reverse of what a lot
> of people hoping to clean up wikitext are heading towards.
What system do you propose that would retain the performance benefits
of this suggestion, and be deployable in the near future? A simple
postprocessor would be very useful -- it could greatly reduce parser
cache fragmentation, if not eliminate it entirely. E.g., as Daniel
notes, you could leave a marker in the parser output where section
links should go, and have the postprocessor fill it in based on the
user's language, so we don't fragment the cache by language. More
importantly, a postprocessor would allow us to add new features that
are currently unacceptable due to cache fragmentation.
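To make the marker idea concrete, here's a rough sketch in Python. The marker syntax, message table, and function names are all invented for illustration -- this is not MediaWiki's actual API, just the shape of the technique: one expensive parse cached for everyone, one cheap per-request substitution pass.

```python
import re

# Hypothetical marker the parser would emit in place of a localized
# section link; the exact syntax is an assumption for this sketch.
SECTION_MARKER = re.compile(r'<!--SECTIONEDIT:(\d+)-->')

# Stand-in for an i18n message lookup (assumed, not MediaWiki's real one).
MESSAGES = {
    'en': 'edit',
    'de': 'Bearbeiten',
}

def parse(wikitext):
    """The expensive pass, cached once for all users and languages.
    Real parsing is elided; we just emit a language-neutral marker
    where the section link belongs."""
    return '<h2>Heading<!--SECTIONEDIT:1--></h2>'

def postprocess(cached_html, lang):
    """The cheap per-request pass: fill markers in with localized text,
    so the parser cache need not vary by user language."""
    def make_link(m):
        section = m.group(1)
        label = MESSAGES.get(lang, MESSAGES['en'])
        return ('<span class="editsection">'
                f'[<a href="?action=edit&section={section}">{label}</a>]'
                '</span>')
    return SECTION_MARKER.sub(make_link, cached_html)

html = parse('== Heading ==')       # cache this once
print(postprocess(html, 'de'))      # localize per request
```

The point is that `postprocess` is a single regex substitution over already-rendered HTML, so running it on every request costs almost nothing compared to re-parsing the page per language.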
> What some of us have been kicking around would be migrating away from
> pre-processing the text at all. Instead, the text should be parsed in
> a single step into an intermediate structure that is neither wikitext
> nor HTML. Templates would be required to return whole structures when
> expanded (open what you close, close what you open) and would only be
> present in sanitary places (not in the middle of wiki or HTML syntax,
> for instance).
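For readers following along, a toy version of that "intermediate structure" idea might look like the Python below. The node kinds and function names are made up for illustration; the point is only that template expansion returns a complete, balanced subtree, so it can never leave a stray opening or closing token in the surrounding syntax.

```python
class Node:
    """A node in the intermediate structure: not wikitext, not HTML."""
    def __init__(self, kind, children=None, text=''):
        self.kind = kind            # e.g. 'document', 'bold', 'text'
        self.children = children or []
        self.text = text

def expand_template(name):
    """Templates must return a whole subtree ("open what you close,
    close what you open"), never a fragment of markup."""
    # e.g. a hypothetical {{warning}} expands to a complete bold span
    return Node('bold', [Node('text', text='Warning!')])

def to_html(node):
    """Serialization is a separate, trivially well-formed step."""
    if node.kind == 'text':
        return node.text
    inner = ''.join(to_html(c) for c in node.children)
    if node.kind == 'bold':
        return f'<b>{inner}</b>'
    return inner

doc = Node('document', [
    Node('text', text='Before '),
    expand_template('warning'),   # always a balanced subtree
    Node('text', text=' after'),
])
print(to_html(doc))
```

Because the tree can only be built from whole nodes, unbalanced output is unrepresentable by construction, which is exactly the property the proposal is after.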
This is possibly a good long-term goal, but I don't see how it
conflicts with a postprocessing step at all. As long as parsing large
pages requires significant CPU time, we'll want to cache the parsed
output as much as possible, and a postprocessor will always help to
reduce cache fragmentation. If we ever do move to a storage format
that's so fast to process that we don't care about cache misses, of
course, we could scrap the postprocessor and incorporate its effects
into the main pass, no harm done.