On Mon, Jan 31, 2011 at 4:55 PM, Trevor Parscal tparscal@wikimedia.org wrote:
Adding yet another discrete parsing step is the reverse of the direction a lot of people hoping to clean up wikitext are heading in.
What system do you propose that would retain the performance benefits of this suggestion, and be deployable in the near future? A simple postprocessor would be very useful -- you could cut down greatly on parser cache fragmentation, if not eliminate it entirely. E.g., as Daniel notes, you could leave a marker in the parser output where section links should go and have the postprocessor fill it in depending on user language, so we don't fragment the cache by language. More importantly, a postprocessor would allow us to add new features that are currently unacceptable due to cache fragmentation.
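To make that concrete, here's a rough sketch of the kind of thing I mean, in Python for brevity -- the marker format, function names, and message table are all made up for illustration, not anything MediaWiki actually has:

import re

# Hypothetical marker the parser would leave in its cached output wherever
# a language-dependent section link belongs, so the cached HTML itself is
# language-neutral and the parser cache holds one copy per page.
MARKER_RE = re.compile(r'<!--SECTIONLINK:(\d+)-->')

# Toy message table standing in for the real localisation system.
MESSAGES = {
    'en': {'editsection': 'edit'},
    'de': {'editsection': 'bearbeiten'},
}

def postprocess(cached_html, user_lang):
    """Fill language-dependent pieces into cached parser output at request time."""
    def fill(match):
        section = match.group(1)
        label = MESSAGES[user_lang]['editsection']
        return ('<span class="editsection">[<a href="?action=edit&section='
                + section + '">' + label + '</a>]</span>')
    return MARKER_RE.sub(fill, cached_html)

# The same cached text serves every user language:
cached = '<h2>History<!--SECTIONLINK:1--></h2>'
print(postprocess(cached, 'en'))
print(postprocess(cached, 'de'))

The point being that the expensive wikitext parse happens once and gets cached, while only the cheap per-request substitution varies by user.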
What some of us have been kicking around is migrating away from pre-processing the text at all. Instead, the text should be parsed in a single step into an intermediate structure that is neither wikitext nor HTML. Templates would be required to return whole structures when expanded (open what you close, close what you open) and would only be present in sanitary places (not in the middle of wiki or HTML syntax, for instance).
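Just to make sure I follow, here's roughly how I picture the structure you're describing -- a tree where templates are whole nodes, so an expansion always swaps a complete node for a complete subtree. All the names here are invented for illustration:

from dataclasses import dataclass, field
from typing import List

# Invented intermediate representation: a tree that is neither wikitext nor
# HTML. Templates appear only as whole nodes, so expanding one can't open
# something it doesn't close, and can't land in the middle of other syntax.

@dataclass
class Node:
    kind: str                              # 'document', 'heading', 'text', 'template', ...
    value: str = ''                        # text content, or the template's name
    children: List['Node'] = field(default_factory=list)

def expand_templates(node, templates):
    """Replace each template node with the complete subtree its template returns."""
    if node.kind == 'template':
        return templates[node.value]()     # must return a whole, balanced Node
    node.children = [expand_templates(child, templates) for child in node.children]
    return node

templates = {
    'Infobox': lambda: Node('section', children=[Node('text', 'infobox contents')]),
}

doc = Node('document', children=[
    Node('heading', children=[Node('text', 'Example article')]),
    Node('template', 'Infobox'),
])
doc = expand_templates(doc, templates)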
This is possibly a good long-term goal, but I don't see how it conflicts with a postprocessing step at all. As long as parsing large pages requires significant CPU time, we'll want to cache the parsed output as much as possible, and a postprocessor will always help to reduce cache fragmentation. If we ever do move to a storage format that's so fast to process that we don't care about cache misses, of course, we could scrap the postprocessor and incorporate its effects into the main pass, no harm done.