The RFC proposal for "hygienic templates" got improved a bunch -- and renamed to "balanced templates" -- during the parsing team off-site. It seems like it's worth resending this to the list. Comments on the updated draft welcome!
----
As described in my Wikimania 2015 talk https://wikimania2015.wikimedia.org/wiki/Submissions/Templates_are_dead!_Long_live_templates! (starting at slide 27 https://wikimania2015.wikimedia.org/w/index.php?title=File:Templates_are_dead!_Long_live_templates!.pdf&page=27), there are a number of reasons to mark certain templates as "balanced". Foremost among them: to allow high-performance incremental update of page contents after templates are modified, and to allow safe editing of template uses using HTML-based tools such as Visual Editor or jsapi https://doc.wikimedia.org/Parsoid/master/#!/guide/jsapi.
This means (roughly) that the output of the template is a complete DocumentFragment https://developer.mozilla.org/en-US/docs/Web/API/DocumentFragment: every open tag is closed and there are no nodes which the HTML adoption agency algorithm http://dev.w3.org/html5/spec-LC/tree-construction.html#adoptionAgency will reorder. (More precise details below.)
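As a rough illustration of the first half of that definition (every open tag is closed), here is a minimal Python sketch that uses the standard library's `html.parser` to check whether a fragment's tags are properly nested. `is_well_nested` is a hypothetical helper, not part of MediaWiki or Parsoid. It is deliberately simplified: it does not model the adoption agency algorithm or HTML's implied end tags. For example, it rejects `<p>a<p>b</p>`, which a real HTML parser would accept by implicitly closing the first `<p>`.

```python
from html.parser import HTMLParser

# Void elements never take an end tag, so the check skips them.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Tracks open elements and notes any end tag that breaks nesting."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        # An end tag must match the innermost open element.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.ok = False


def is_well_nested(fragment: str) -> bool:
    """True if every non-void start tag is closed in properly nested order."""
    checker = NestingChecker()
    checker.feed(fragment)
    checker.close()
    return checker.ok and not checker.stack
```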
Template balance is enforced: tags are closed or removed as necessary to ensure that the output satisfies the balance constraints, regardless of the values of the template arguments or how child templates are expanded. You can imagine this as running tidy (or something like it https://phabricator.wikimedia.org/T89331) on the template output before it is inserted into the document; but see below for the actual implementation.
The primary benefit of balanced templates is allowing efficient update of articles by doing substring substitution for template bodies, without having to expand all templates to wikitext and reparse from scratch. It also guarantees that the template (and surrounding content) will be editable in Visual Editor; mistakes in template arguments won't "leak out" and prevent editing of surrounding content.
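The substring-substitution idea can be sketched in a few lines. The sketch assumes, hypothetically, that the renderer records each template use as a separate segment of the cached page output, along with the arguments it was expanded with. `Segment`, `update_template` and `render` are illustrative names, not existing MediaWiki or Parsoid APIs. Because balanced output cannot interact with the markup around it, re-expanding one template only requires splicing new strings into the cached output rather than reparsing the page.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    html: str
    template: Optional[str] = None   # None means ordinary article content
    args: Tuple[str, ...] = ()       # arguments this template use was expanded with


def update_template(segments: List[Segment], template: str,
                    expand: Callable[[Tuple[str, ...]], str]) -> List[Segment]:
    """Re-expand only the uses of `template`, splicing the new HTML in place.

    This is only safe for balanced templates: their output cannot change how
    the neighbouring segments are parsed.
    """
    return [Segment(expand(s.args), s.template, s.args) if s.template == template else s
            for s in segments]


def render(segments: List[Segment]) -> str:
    return "".join(s.html for s in segments)
```

The original segment list is left untouched, so a cached rendering can be kept until the new one is ready.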
***Wikitext Syntax***
After some bikeshedding, we decided that balance should be an "opt-in" property of templates, indicated by adding a `{{#balance:TYPE}}` marker to the content. This syntax leverages the existing "parser function" syntax and allows different types of balance to be named in place of `TYPE`.
We propose three forms of balance, of which only the first is likely to be implemented initially. Other balancing modes would provide safety in different HTML-parsing contexts. We've named two below; more might be added in the future if there is need.
1. `{{#balance:block}}` would close any open `<p>`/`<a>`/`<h*>`/`<table>` tags in the article preceding the template insertion site. In the template content all tags left open at the end will be closed, but there is no other restriction. This is similar to how block-level tags work in HTML 5. This is useful for navboxes and other "block" content.
2. `{{#balance:inline}}` would only allow inline (i.e. phrasing) content and generate an error if a `<p>`/`<a>`/`<h*>`/`<table>`/`<tr>`/`<td>`/`<th>`/`<li>` tag is seen in the content. But because of this, it //*can*// be used inside a block-level context without closing active `<p>`/`<a>`/`<h*>`/`<table>` in the article (as `{{#balance:block}}` would). This is useful for simple plain-text templates, e.g. age calculation.
3. `{{#balance:table}}` would close `<p>`/`<a>`/`<h*>` but would allow insertion inside `<table>` and allow `<td>`/`<th>` tags in the content. (There might be some other content restrictions to prevent fostering.)
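The three modes differ in two ways: which open elements of the article are closed before the insertion site, and which tags are disallowed in the template content. The sketch below simply transcribes the lists above into Python tables. It assumes that `<h*>` means `<h1>` through `<h6>`, assumes tag names have already been extracted from the template output, and does not model the possible anti-fostering restrictions for `table`. The names `CLOSE_BEFORE`, `DISALLOWED` and `balance_errors` are hypothetical.

```python
# <h*> is assumed to mean <h1> through <h6>.
HEADINGS = {f"h{i}" for i in range(1, 7)}

# Open article elements each mode closes before the template insertion site.
CLOSE_BEFORE = {
    "block": {"p", "a", "table"} | HEADINGS,
    "inline": set(),                  # inline content can sit inside an open <p>, <a>, etc.
    "table": {"p", "a"} | HEADINGS,   # an open <table> is left alone
}

# Tags that are balance errors when they appear in the template content.
DISALLOWED = {
    "block": set(),   # open tags are closed at the end; nothing else is forbidden
    "inline": {"p", "a", "table", "tr", "td", "th", "li"} | HEADINGS,
    "table": set(),   # possible anti-fostering restrictions are not modeled here
}


def balance_errors(mode: str, content_tags: list) -> list:
    """Return the disallowed tag names found in a template's content, in order."""
    return [tag for tag in content_tags if tag in DISALLOWED[mode]]
```

For example, an inline template whose content contains `b`, `p`, `i` and `h2` tags would report `p` and `h2` as errors.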
We expect `{{#balance:block}}` to be most useful for the large-ish templates whose efficient replacement would make the most impact on performance, and so we propose `{{#balance:}}` as a possible shorthand for `{{#balance:block}}`. (The current wikitext grammar does not allow `{{#balance}}`, since the trailing colon is required in parser function names, but if desired we could probably accommodate that abbreviation as well without too much pain.)
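Detecting the marker and applying the proposed `{{#balance:}}` shorthand could look roughly like the following. This is a regex-based sketch with a hypothetical helper, `balance_type`; it is not how MediaWiki's preprocessor actually expands parser functions, and it assumes a single, lowercase marker. Consistent with the current grammar, a bare `{{#balance}}` without the colon is not recognized.

```python
import re
from typing import Optional

# {{#balance:TYPE}} with an optional (possibly empty) lowercase TYPE.
BALANCE_RE = re.compile(r"\{\{#balance:\s*([a-z]*)\s*\}\}")
KNOWN_TYPES = {"block", "inline", "table"}


def balance_type(wikitext: str) -> Optional[str]:
    """Return the declared balance type, or None for an unmarked template."""
    match = BALANCE_RE.search(wikitext)
    if match is None:
        return None  # unmarked templates stay "unbalanced"
    kind = match.group(1) or "block"  # proposed shorthand: {{#balance:}} means block
    if kind not in KNOWN_TYPES:
        raise ValueError(f"unknown balance type: {kind!r}")
    return kind
```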
Violations of content restrictions (e.g., a `<p>` tag in a `{{#balance:inline}}` template) would be errors, but how these errors would be conveyed is an orthogonal issue. Some options for error reporting include ugly bold text visible to readers (like `{{cite}}`), wikilint-like reports, or inclusion in `[[Category:Balance Errors]]`. Note that errors might not appear immediately: they may only occur when some other included template is edited to newly produce disallowed content, or only when certain values are passed as template arguments.
***Implementation***
Implementation is slightly different in the PHP parser and in Parsoid. Incremental parsing/update would not be done in the PHP parser, but it still needs to enforce equivalent content-model constraints for consistency.
PHP parser implementation strategy:
- When a template with `{{#balance}}` is expanded, add a marker to the start of its output.
- In the Sanitizer, leave that marker alone; then, just before handing the output to tidy/depurate https://phabricator.wikimedia.org/T89331, replace the marker with `</p></table>...etc...`. That pass will close the tags (and discard any irrelevant `</...>` tags). Some care is needed to ensure that we discard unnecessary close tags rather than HTML-entity-escaping them.
- PHP might not be able to implement `{{#balance:inline}}` or `{{#balance:table}}` quite yet: we might need a special depurate mode, or to do it in a DOM-based sanitizer, or something like that. We can concentrate on `{{#balance:block}}` initially.
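The marker replacement for the PHP parser is sketched below for `{{#balance:block}}`, in Python for illustration. It collapses the two steps described above (emit `</p></table>...` and then discard the irrelevant close tags) into one pass that emits end tags only for elements actually open before the marker. `MARKER`, `open_elements` and `resolve_marker` are hypothetical. The regex tag scanner ignores comments, `>` inside attribute values, and HTML's implied end tags, so it only illustrates the idea.

```python
import re

# Hypothetical placeholder; MediaWiki's real strip markers look different.
MARKER = "\x7fBALANCE\x7f"

# Elements that {{#balance:block}} closes, in the order the end tags are tried.
BLOCK_CLOSERS = ["p", "a"] + [f"h{i}" for i in range(1, 7)] + ["table"]

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

# Naive tag scanner: group 1 is "/" for end tags, group 2 is the tag name.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def open_elements(html: str, end: int) -> list:
    """Return the stack of elements still open at offset `end` (innermost last)."""
    stack = []
    for m in TAG_RE.finditer(html, 0, end):
        is_end, name = m.group(1) == "/", m.group(2).lower()
        if name in VOID:
            continue
        if not is_end:
            stack.append(name)
        elif name in stack:
            # Pop back to (and including) the most recent matching element.
            while stack.pop() != name:
                pass
    return stack


def resolve_marker(html: str) -> str:
    """Replace MARKER with end tags for whichever BLOCK_CLOSERS are open there.

    Closers for elements that are not open produce no output, which plays the
    role of discarding the irrelevant </...> tags.
    """
    pos = html.find(MARKER)
    if pos < 0:
        return html
    stack = open_elements(html, pos)
    closers = []
    for target in BLOCK_CLOSERS:
        if target in stack:
            # Close everything above the target, then the target itself.
            while True:
                top = stack.pop()
                closers.append(f"</{top}>")
                if top == target:
                    break
    return html[:pos] + "".join(closers) + html[pos + len(MARKER):]
```

Note that an enclosing `<div>` is left open: block balance only closes the listed elements.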
In Parsoid:
- We just need to emit synthetic `</p></table></...>` tokens; the tree builder will take care of closing a tag if necessary, or else discarding the token.
- When PHP switches over to a DOM-based sanitizer, it might be able to use this same strategy.
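On the Parsoid side, the same effect comes from the tree builder. The toy builder below is hypothetical: token tuples stand in for Parsoid's real token classes. On an end tag it pops back to a matching open element and otherwise drops the token, so prefixing a template's tokens with synthetic end tags makes the template output a sibling of the preceding paragraph rather than its child. One caveat: in the actual HTML5 tree-construction algorithm, an end tag `</p>` with no `<p>` in button scope is not discarded but produces an empty `<p></p>`, so a real implementation would need to account for that case.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Synthetic end tags emitted ahead of a {{#balance:block}} template.
BLOCK_CLOSERS = ["p", "a"] + [f"h{i}" for i in range(1, 7)] + ["table"]

Token = Tuple[str, str]  # ("start" | "end" | "text", tag name or text)


@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


def build_tree(tokens: List[Token]) -> Element:
    """Toy tree builder: an end tag pops back to the matching open element,
    and an end tag with no matching open element is dropped."""
    root = Element("#fragment")
    stack = [root]
    for kind, value in tokens:
        if kind == "text":
            stack[-1].children.append(value)
        elif kind == "start":
            element = Element(value)
            stack[-1].children.append(element)
            stack.append(element)
        elif kind == "end":
            if any(e.name == value for e in stack[1:]):
                while stack.pop().name != value:
                    pass
            # Otherwise there is nothing to close, so the token is discarded.
    return root


def with_block_balance(template_tokens: List[Token]) -> List[Token]:
    """Prefix a balanced template's tokens with synthetic end tags."""
    return [("end", name) for name in BLOCK_CLOSERS] + list(template_tokens)
```

Without the synthetic tokens, the same template's `<div>` would end up nested inside the article's open `<p>`.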
***Deployment***
Unmarked templates are "unbalanced" and will render exactly as before; they will just be slower (require more CPU time) than balanced templates.
It is expected that we will profile the "costliest"/"most frequently used/changed" templates on Wikimedia projects and add balance markers first to the templates where the potential performance gain is greatest. Tim Starling noted that adding a balance marker to `[[:en:Template:Infobox]]` https://en.wikipedia.org/wiki/Template:Infobox could affect over two million pages and have a large immediate effect on performance. We would want to carefully verify first, using visual diff or other tools, that balance would not affect the appearance of any of those pages.
Related: {T89331 https://phabricator.wikimedia.org/T89331}, {T114072 https://phabricator.wikimedia.org/T114072}.