[Wikitext-l] Re: Parsoid template transclusion behavior

18 Feb 2024


Our primary goal with Parsoid today is to ensure maximum compatibility 
with the current default parser -- without that, it would be impossible 
to switch over to Parsoid for all page rendering use cases.
But, at the core, Parsoid's design has always pursued a processing model 
where content (fragment) generators (whether templates, extensions, 
parser functions, or in the future wiki functions or other page 
components) are decoupled from the page where they are embedded. This 
lets us process them independently and incorporate those generated 
fragments efficiently. Parsoid uses this model for extensions already. 
But, that model hasn't held up for templates as they are implemented 
today because of how they are used and what they generate (snippets of 
text that can be full or partial attributes, mix of attributes and 
content, parts of tables) -- table use cases being the most egregious of 
those.
So given these practical realities, the simplest course of action for us 
to handle templates today is to have them be fully expanded as textual 
strings and do additional processing within Parsoid. But, Parsoid still 
is able to clearly demarcate page content that comes from templates (and 
other content generators) even where the template content combines with 
page level content in some complex ways (some caused by table content 
markup errors causing content fostering -- a source of unnecessary 
complexity and headaches for us).
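To make the "expand as text, but still demarcate" idea concrete, here is a minimal sketch. It is not Parsoid's code: the template store, the `{{name}}` pattern (no arguments, no nesting) and the function name `expand_with_provenance` are all assumptions made for illustration. It expands templates as plain strings while recording which ranges of the expanded text came from which template.

```python
import re

# Hypothetical template store; names and bodies are illustrative only.
TEMPLATES = {"greet": "Hello, ''world''", "sig": "-- Example"}

def expand_with_provenance(wikitext, templates):
    """Expand {{name}} textually and record which output ranges came from templates."""
    out = []
    spans = []   # (start, end, template_name) offsets into the expanded string
    pos = 0      # read position in the source wikitext
    length = 0   # length of the expanded text emitted so far
    for m in re.finditer(r"\{\{(\w+)\}\}", wikitext):
        literal = wikitext[pos:m.start()]
        out.append(literal)
        length += len(literal)
        name = m.group(1)
        body = templates.get(name, m.group(0))  # unknown templates stay as-is
        out.append(body)
        if name in templates:
            spans.append((length, length + len(body), name))
        length += len(body)
        pos = m.end()
    out.append(wikitext[pos:])
    return "".join(out), spans

expanded, spans = expand_with_provenance("A {{greet}} B {{sig}}", TEMPLATES)
for start, end, name in spans:
    print(name, repr(expanded[start:end]))
```

Later processing stages can then work on the single expanded string and still consult `spans` to tell template output apart from page-level content.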
Our goal is to start moving towards the original decoupled processing 
model for templates as well, but only after we are able to switch over 
to Parsoid more fully and that is looking closer than ever at this 
point. But, that is going to be a gradual evolution -- there are various 
proposals we have considered in the past here, but typing is probably 
the overarching concept that ties all those ideas together.
Hope that answers your primary question. Some additional tangential 
details below while I am at it.
<tangent>All that said, I wouldn't invest too much time analyzing the 
contents of that page and the notions of single-pass or multi-pass or 
PEG vs not-PEG, etc. Those are somewhat immaterial implementation 
details. I am not sure I would describe Parsoid as a single-pass model 
today. It is single-pass only insofar as it processes the textual 
string in one pass. But, otherwise, the generated tokens are processed 
multiple times as they are transformed. The DOM that is built up is 
processed multiple times ... so, if anything, Parsoid has a lot more 
(20+) passes. Separately, given that we cannot really process the 
wikitext stream to a fully processed semantic tree (because of the 
nature of wikitext), we could have used other ways of generating tokens 
along with corresponding token transformers to get the same end result. 
Since it is mostly water under the bridge now, we haven't really 
investigated the route of how this might have looked if we had used 
traditional LALR techniques (as long as we realize the output of that 
grammar would just be a different set of tokens, not a conventional 
AST). I am mostly mentioning this tangent to emphasize that our goal 
here is not to arrive at a formal (implementation) grammar in the 
traditional programming language sense, but rather to transition to a 
different (decoupled / typed) processing model while preserving 
compatibility in the interim and while giving us feasible migration 
paths to that model.</tangent>
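As a toy sketch of what "one pass over the text, many passes over the tokens" means: the tokenizer below reads the string once, and each transformer then walks the whole token list again. The token shapes and the function names (`tokenize`, `pair_markers`, `run_pipeline`) are hypothetical and far simpler than anything in Parsoid.

```python
import re

def tokenize(text):
    """Single pass over the text: split into quote-marker and plain-text tokens."""
    return [t for t in re.split(r"('''|'')", text) if t]

def pair_markers(tokens, marker, tag):
    """One pass over the tokens: turn alternating markers into open/close tags."""
    out, is_open = [], False
    for t in tokens:
        if t == marker:
            out.append(f"</{tag}>" if is_open else f"<{tag}>")
            is_open = not is_open
        else:
            out.append(t)
    return out

def run_pipeline(text):
    tokens = tokenize(text)                         # the text is read once
    for marker, tag in (("'''", "b"), ("''", "i")):
        tokens = pair_markers(tokens, marker, tag)  # tokens are walked per pass
    return "".join(tokens)

print(run_pipeline("a ''b'' '''c'''"))
```

The sketch does not repair unbalanced markers or escape HTML; it only shows the pass structure.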
Subbu.
On 2/16/24 23:10, psnbaotg wrote:
...
Hello,
I have recently been researching Parsoid's design, as MW is migrating to 
Parsoid. I found out that due to its single-pass tokenizing design, 
templates are not handled textually as the legacy parser does.
This is good, as the HTML now has information about which template 
it was transcluded from. However, 
https://www.mediawiki.org/wiki/Parsoid/limitations says "We have since 
decided to use the PHP preprocessor for template expansions, which 
side-steps these issues by reverting to the traditional textual 
preprocessor pass". Is this still true now?
Best regards,
Diskdance

Wikitext-l mailing list -- wikitext-l@lists.wikimedia.org
To unsubscribe send an email to wikitext-l-leave@lists.wikimedia.org
