Our primary goal with Parsoid today is to ensure maximum compatibility with the current default parser -- without that, it would be impossible to switch over to Parsoid for all page rendering use cases.
But at its core, Parsoid's design has always pursued a
processing model where content (fragment) generators (whether
templates, extensions, parser functions, or, in the future, wiki
functions or other page components) are decoupled from the page
where they are embedded. This lets us process them independently
and incorporate the generated fragments efficiently. Parsoid
already uses this model for extensions. But that model hasn't
held up for templates as they are implemented today, because of
how they are used and what they generate: snippets of text that
can be full or partial attributes, a mix of attributes and
content, or parts of tables, with table use cases being the most
egregious of those.
So, given these practical realities, the simplest way for us to
handle templates today is to fully expand them as textual strings
and do the additional processing within Parsoid.
Nevertheless, Parsoid can still clearly demarcate page content
that comes from templates (and other content generators), even
where the template content combines with page-level content in
complex ways (some of them caused by table markup errors that
trigger content fostering, a source of unnecessary complexity and
headaches for us).
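To make the table problem concrete, here is a toy sketch (not Parsoid code) of why such templates cannot be processed on their own. It assumes hypothetical templates `Table start`, `Row` and `Table end`, treats only line-initial `{|`, `|}`, `|` and `!` as table markup, and uses a naive regex substitution as a stand-in for textual template expansion. Each table template is unbalanced in isolation, but the textually expanded page forms a well-formed table.

```python
import re

# Hypothetical template bodies that mimic templates used to assemble
# a table piece by piece across several transclusions.
TEMPLATES = {
    "Table start": '{| class="wikitable"',
    "Row": "|-\n| cell",
    "Table end": "|}",
    "Greeting": "Hello, ''world''",
}


def is_standalone_fragment(text: str) -> bool:
    """Return True if `text` could be processed independently of the page,
    looking only at line-initial table markup (a deliberate simplification)."""
    depth = 0  # tables opened (and not yet closed) inside this fragment
    for line in text.splitlines():
        s = line.lstrip()
        if s.startswith("{|"):
            depth += 1
        elif s.startswith("|}"):
            depth -= 1
            if depth < 0:
                return False  # closes a table that was opened elsewhere
        elif depth == 0 and s.startswith(("|", "!")):
            return False  # row/cell markup that needs an enclosing table
    return depth == 0  # must not leave a table open for the page to close


def expand_textually(page: str) -> str:
    """Naively replace each {{Name}} with its raw template text *before* any
    parsing, loosely like a textual preprocessor pass (no args, no nesting).
    Unknown template names raise KeyError in this toy version."""
    return re.sub(r"\{\{([^{}|]+)\}\}",
                  lambda m: TEMPLATES[m.group(1).strip()], page)


page = "{{Table start}}\n{{Row}}\n{{Table end}}"
for name, body in TEMPLATES.items():
    print(f"{name!r}: standalone={is_standalone_fragment(body)}")
print("expanded page standalone:", is_standalone_fragment(expand_textually(page)))
```

Running it reports `standalone=False` for the three table templates and `True` for `Greeting` and for the expanded page, which is the gist of why full textual expansion is the simplest option for such templates today.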
Our goal is to start moving towards the original decoupled processing model for templates as well, but only after we are able to switch over to Parsoid more fully, and that is looking closer than ever at this point. That will be a gradual evolution, though: we have considered various proposals here in the past, but typing is probably the overarching concept that ties all of those ideas together.
Hope that answers your primary question. Some additional
tangential details below while I am at it.
<tangent>All that said, I wouldn't invest too much time
analyzing the contents of that page and the notions of single-pass
vs. multi-pass or PEG vs. non-PEG, etc. Those are somewhat
immaterial implementation details. I am not sure I would describe
Parsoid as a single-pass model today. It is single-pass only
insofar as it processes the textual string in one pass. Otherwise,
the generated tokens are processed multiple times as they are
transformed, and the DOM that is built up is processed multiple
times as well ... so, if anything, Parsoid has a lot more (20+)
passes. Separately, given that we cannot really process the
wikitext stream into a fully processed semantic tree (because of
the nature of wikitext), we could have used other ways of
generating tokens, along with corresponding token transformers,
to get the same end result. Since it is mostly water under the
bridge now, we haven't really investigated what this might have
looked like had we used traditional LALR techniques (bearing in
mind that the output of such a grammar would just be a different
set of tokens, not a conventional AST). I am mostly mentioning
this tangent to emphasize that our goal here is not to arrive at a
formal (implementation) grammar in the traditional programming
language sense, but rather to transition to a different (decoupled
/ typed) processing model while preserving compatibility in the
interim and giving us feasible migration paths to that
model.</tangent>
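A toy sketch of the "one pass over the text, many passes over tokens and the DOM" distinction. Everything in it is hypothetical and not Parsoid's pipeline: "tokens" are whitespace-separated words, the "DOM" is a flat list, and the passes are arbitrary functions; it only counts how many times the data is walked.

```python
from typing import Callable, List, Tuple

# Toy stand-ins: real tokens and DOM nodes are far richer than strings.
Tokens = List[str]
Dom = List[str]


def tokenize(text: str) -> Tokens:
    """The only pass over the raw input string (here: a whitespace split)."""
    return text.split()


def build_dom(tokens: Tokens) -> Dom:
    """Hypothetical tree builder; a flat list keeps the sketch short."""
    return list(tokens)


def run_pipeline(text: str,
                 token_passes: List[Callable[[Tokens], Tokens]],
                 dom_passes: List[Callable[[Dom], Dom]]) -> Tuple[Dom, int]:
    """Tokenize once, then run each token pass and each DOM pass in order.
    Returns the final DOM and a pass count (tokenizing plus every
    token/DOM pass; tree building itself is not counted here)."""
    tokens = tokenize(text)
    passes = 1  # the single pass over the text
    for transform in token_passes:  # each transform rewrites the token stream
        tokens = transform(tokens)
        passes += 1
    dom = build_dom(tokens)
    for dom_pass in dom_passes:  # each pass walks the whole tree again
        dom = dom_pass(dom)
        passes += 1
    return dom, passes


dom, passes = run_pipeline(
    "a b c",
    token_passes=[lambda toks: [t.upper() for t in toks]],
    dom_passes=[lambda d: d[::-1], lambda d: [n + "!" for n in d]],
)
print(dom, passes)
```

Here the text is read once but the data is walked four times in total; only the first of those passes is "single-pass" in the sense discussed above.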
Subbu.
Hello,
I've recently been researching Parsoid's design as MW migrates to Parsoid. I found that, due to its single-pass tokenizing design, templates are not handled textually as they are in the legacy parser.
This is good, as the HTML now has information about which template it was transcluded from. However, https://www.mediawiki.org/wiki/Parsoid/limitations says "We have since decided to use the PHP preprocessor for template expansions, which side-steps these issues by reverting to the traditional textual preprocessor pass". Is this still true now?
Best regards,
Diskdance
_______________________________________________
Wikitext-l mailing list -- wikitext-l@lists.wikimedia.org
To unsubscribe send an email to wikitext-l-leave@lists.wikimedia.org