The RFC proposal for "hygienic templates" got improved a bunch -- and
renamed to "balanced templates" -- during the parsing team off-site. It
seems like it's worth resending this to the list. Comments on the updated
draft welcome!
----
As described in my Wikimania 2015 talk
<https://wikimania2015.wikimedia.org/wiki/Submissions/Templates_are_dead!_Long_live_templates!>
(starting at slide 27
<https://wikimania2015.wikimedia.org/w/index.php?title=File:Templates_are_dead!_Long_live_templates!.pdf&page=27>),
there are a number of reasons to mark certain templates as "balanced".
Foremost among them: to allow high-performance incremental update of page
contents after templates are modified, and to allow safe editing of
template uses using HTML-based tools such as Visual Editor or jsapi
<https://doc.wikimedia.org/Parsoid/master/#!/guide/jsapi>.
This means (roughly) that the output of the template is a complete
DocumentFragment
<https://developer.mozilla.org/en-US/docs/Web/API/DocumentFragment>: every
open tag is closed and there are no nodes which the HTML adoption agency
algorithm
<http://dev.w3.org/html5/spec-LC/tree-construction.html#adoptionAgency> will
reorder. (More precise details below.)
Template balance is enforced: tags are closed or removed as necessary to
ensure that the output satisfies the necessary constraints, regardless of
the values of the template arguments or how child templates are expanded.
You can imagine this as running tidy (or something like it
<https://phabricator.wikimedia.org/T89331>) on the template output before
it is inserted into the document; but see below for the actual
implementation.
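To make the "tidy-like" mental model concrete, here is a minimal sketch in Python. It is not the proposed implementation and it does not implement the HTML5 adoption agency algorithm; it only shows the two basic repairs described above (closing tags left open, dropping close tags that match nothing). The names `Balancer` and `balance_fragment` are made up for this illustration.

```python
from html.parser import HTMLParser

# Void elements never take a close tag, so they are never pushed on the stack.
VOID = {"br", "hr", "img", "input", "meta", "link", "wbr"}


class Balancer(HTMLParser):
    """Re-emit a fragment, repairing unbalanced tags (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []    # pieces of the repaired output
        self.stack = []  # names of currently open elements

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray close tag: discard it
        # Close everything opened since the matching start tag.
        while True:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def balance_fragment(html):
    """Return html with every open tag closed and stray close tags removed.

    Comments and declarations are dropped in this sketch.
    """
    balancer = Balancer()
    balancer.feed(html)
    balancer.close()
    while balancer.stack:
        balancer.out.append(f"</{balancer.stack.pop()}>")
    return "".join(balancer.out)
```

For example, `balance_fragment("text</div><p>open")` drops the stray `</div>` and appends the missing `</p>`.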
The primary benefit of balanced templates is allowing efficient update of
articles by doing substring substitution for template bodies, without
having to expand all templates to wikitext and reparse from scratch. It
also guarantees that the template (and surrounding content) will be
editable in Visual Editor; mistakes in template arguments won't "leak out"
and prevent editing of surrounding content.
***Wikitext Syntax***
After some bikeshedding, we decided that balance should be an "opt-in"
property of templates, indicated by adding a `{{#balance:TYPE}}` marker to
the content. This syntax leverages the existing "parser function" syntax,
and allows for different types of balance to be named where `TYPE` is.
We propose three forms of balance, of which only the first is likely to be
implemented initially. Other balancing modes would provide safety in
different HTML-parsing contexts. We've named two below; more might be
added in the future if there is need.
1. `{{#balance:block}}` would close any open `<p>`/`<a>`/`<h*>`/`<table>`
tags in the article preceding the template insertion site. In the template
content all tags left open at the end will be closed, but there is no other
restriction. This is similar to how block-level tags work in HTML 5. This
is useful for navboxes and other "block" content.
2. `{{#balance:inline}}` would only allow inline (i.e. phrasing) content
and generate an error if a
`<p>`/`<a>`/`<h*>`/`<table>`/`<tr>`/`<td>`/`<th>`/`<li>` tag is seen in the
content. But because of this, it *can* be used inside a block-level context
without closing active `<p>`/`<a>`/`<h*>`/`<table>` in the article (as
`{{#balance:block}}` would). This is useful for simple plain text
templates, e.g. age calculation.
3. `{{#balance:table}}` would close `<p>`/`<a>`/`<h*>` but would allow
insertion inside `<table>` and allow `<td>`/`<th>` tags in the content.
(There might be some other content restrictions to prevent fostering.)
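The three modes can be summarized as data. The sketch below (Python, purely illustrative) records which open tags each mode would close at the insertion site and which tags it would reject in the template content. The names `BALANCE_MODES` and `content_violations` are invented here, `<h*>` is assumed to mean `<h1>` through `<h6>`, and the regex scan is a stand-in for a real tokenizer. The extra `table` restrictions that the list above leaves open are not modeled.

```python
import re

HEADINGS = tuple(f"h{n}" for n in range(1, 7))  # assumed meaning of <h*>

BALANCE_MODES = {
    # Closes open tags before the template; no restriction on its content.
    "block": {"closes": ("p", "a", *HEADINGS, "table"), "forbids": ()},
    # Closes nothing, so it must not contain block-ish or table tags.
    "inline": {
        "closes": (),
        "forbids": ("p", "a", *HEADINGS, "table", "tr", "td", "th", "li"),
    },
    # Usable inside <table>; further restrictions are left unspecified.
    "table": {"closes": ("p", "a", *HEADINGS), "forbids": ()},
}

# Matches the name of an opening tag; "</p>" does not match.
OPEN_TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)")


def content_violations(mode, html):
    """List the forbidden tag names opened in html, in first-seen order."""
    forbidden = set(BALANCE_MODES[mode]["forbids"])
    seen = []
    for match in OPEN_TAG.finditer(html):
        tag = match.group(1).lower()
        if tag in forbidden and tag not in seen:
            seen.append(tag)
    return seen
```

An age-calculation template that emits `42 <b>years</b>` passes the `inline` check, while one that emits a `<p>` or `<td>` is reported.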
We expect `{{#balance:block}}` to be most useful for the large-ish
templates whose efficient replacement would make the most impact on
performance, and so we propose `{{#balance:}}` as a possible shorthand for
`{{#balance:block}}`. (The current wikitext grammar does not allow
`{{#balance}}`, since the trailing colon is required in parser function
names, but if desired we could probably accommodate that abbreviation as
well without too much pain.)
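As a rough illustration of the shorthand rule, the following Python sketch reads the balance type out of template source, treating an empty type as `block`. The function name and the regex are assumptions made for this example; real parser-function handling (whitespace, case, nesting) is more involved than a single pattern.

```python
import re

# Requires the trailing colon, as the current grammar does.
BALANCE_MARKER = re.compile(r"\{\{#balance:([a-z]*)\}\}")


def balance_type(wikitext):
    """Return the balance type named in wikitext, or None if unmarked."""
    match = BALANCE_MARKER.search(wikitext)
    if match is None:
        return None
    # An empty TYPE is the proposed shorthand for "block".
    return match.group(1) or "block"
```

With this pattern, `{{#balance}}` (no colon) is treated as unmarked, matching the current grammar.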
Violations of content restrictions (e.g., a `<p>` tag in a
`{{#balance:inline}}` template) would be errors, but how these errors would
be conveyed is an orthogonal issue. Some options for error reporting
include ugly bold text visible to readers (like `{{cite}}`), wikilint-like
reports, or inclusion in `[[Category:Balance Errors]]`. Note that errors
might not appear immediately: they may only occur when some other included
template is edited to newly produce disallowed content, or only when
certain values are passed as template arguments.
***Implementation***
Implementation is slightly different in the PHP parser and in Parsoid.
Incremental parsing/update would necessarily not be done in the PHP parser,
but it does need to enforce equivalent content model constraints for
consistency.
PHP parser implementation strategy:
- When a template with `{{#balance}}` is expanded, add a marker to the
start of its output.
- In the Sanitizer, leave that marker alone; then, just before
handing the output to tidy/depurate
<https://phabricator.wikimedia.org/T89331>, replace the marker with
`</p></table>...etc...`. That pass will close the open tags (and discard
any irrelevant `</...>` tags). Some care is needed to ensure that
unnecessary close tags are discarded rather than html-entity-escaped.
- PHP might not be able to implement `{{#balance:inline}}` or
`{{#balance:table}}` quite yet: that might need a special depurate mode,
or a DOM-based sanitizer, or something like that. We can concentrate on
`{{#balance:block}}` initially.
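The marker-then-replace idea for `{{#balance:block}}` can be sketched as follows. This is Python rather than PHP, and everything named here is hypothetical: the marker string, the function names, and the order of the close tags are assumptions, since the text only says `</p></table>...etc...`. Closing or discarding the emitted tags is left to the later tidy/depurate pass.

```python
# Hypothetical marker; the real parser would use its own internal format.
BALANCE_MARKER = "\x7fBALANCE:block\x7f"

HEADINGS = tuple(f"h{n}" for n in range(1, 7))  # assumed meaning of <h*>

# Close tags substituted for the marker. The order is an assumption.
BLOCK_CLOSE_TAGS = "".join(
    f"</{tag}>" for tag in ("p", "a", *HEADINGS, "table")
)


def mark_expansion(template_output):
    """Step 1: prefix a balanced template's expansion with the marker."""
    return BALANCE_MARKER + template_output


def replace_markers(html):
    """Step 2: just before tidy, swap each marker for the close tags."""
    return html.replace(BALANCE_MARKER, BLOCK_CLOSE_TAGS)
```

The output of `replace_markers` is deliberately over-closed; it relies on the next pass to keep only the close tags that match something open.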
In Parsoid:
- We just need to emit synthetic `</p></table></...>` tokens; the tree
builder will take care of closing a tag if necessary or else discarding the
token.
- When PHP switches over to a DOM-based sanitizer, it might be able to
use this same strategy.
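A toy model of the "close if open, otherwise discard" behavior, in Python. It is a simplification for illustration and not Parsoid code: a real HTML5 tree builder applies scope rules and special cases that this stack walk ignores, and the function name is made up.

```python
def apply_close_tokens(open_elements, close_tokens):
    """Apply synthetic end-tag tokens to a stack of open element names.

    Returns (remaining_stack, closed), where closed lists the elements
    that were closed, innermost first. Tokens that match no open element
    are discarded.
    """
    stack = list(open_elements)
    closed = []
    for tag in close_tokens:
        if tag not in stack:
            continue  # nothing to close: discard the token
        # Close the matching element and anything nested inside it.
        while True:
            top = stack.pop()
            closed.append(top)
            if top == tag:
                break
    return stack, closed
```

With `<div><p><a>` open at the insertion site, the tokens `</p></a></table>` close `<a>` and `<p>`, leave `<div>` open, and the `</a>` and `</table>` tokens are dropped.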
***Deployment***
Unmarked templates are "unbalanced" and will render exactly the same as
before; they will just be slower (requiring more CPU time) than balanced
templates.
It is expected that we will profile the "costliest"/"most frequently
used/changed" templates on Wikimedia projects and attempt to add balance
markers first to those templates where the greatest potential performance
gain may be achieved. Tim Starling noticed that adding a balance marker to
`[[:en:Template:Infobox]] <https://en.wikipedia.org/wiki/Template:Infobox>`
could affect over two million pages and have a large immediate effect on
performance. We would want to carefully verify first that balance would
not affect the appearance of any of those pages, using visual diff or other
tools.
Related: {T89331 <https://phabricator.wikimedia.org/T89331>}, {T114072
<https://phabricator.wikimedia.org/T114072>}.
--
(http://cscott.net)