Re: [Wikitech-l] Inclusion request for the Validator extension

31 Jan 2011

      On 11-01-30 02:46 PM, Jeroen De Dauw wrote:
...
Hey,
...
From the looks of the code, Validator, and various Validator based
extensions appear to be using parser->parse() inside of hooks where they are
supposed to be using ->recursiveTagParse with a proper frame.
I was not aware of the correct approach here apparently. I had a look at the
docs and hope I got it right now.
...
The Validator extension's api appears to provoke this bad practice because
I don't see the frame in the arguments render methods are using.
Now all info you need to handle parser functions and tag extensions
separately is available in the render method. I also added a small utility
function meant to abstract the difference, for people that like me do not
want to have to care about if the call is coming from a tag or function.
Changes:

https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki...

https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki...
Does this address the issue correctly?
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--
Partially, but no.
$parser->parse should never be used within something called by 
$parser->parse (ie: Any parser function, tag hook, tag function hook, 
parser hook, etc...) (it's still being used in one branch).
What you use varies depending on what type of output is default for the 
type of tag/function you have, and what you intend to output.
From memory:
A parser function by default is supposed to return WikiText with things 
like variables and other parserfunctions and whatnot expanded. You do 
this with $frame->expand( $text );.
You do have the option of returning raw HTML, iirc you do this using 
`return array( $html, 'isHTML' => true );`. If you're returning html 
$parser->recursiveTagParse( $text, $frame ); gets you the html. (there 
were some other possible output options, I can't remember the exact 
results... I rarely output raw html from parser functions, it's not 
really their purpose)
iirc tag hooks default to outputting html, so naturally you use 
$parser->recursiveTagParse( $text, $frame ); to get full html back.
Muddying the water a little more, we don't have just parser functions 
and tags... We have setFunctionHook (parser functions), setHook (old 
style tag hooks), and setFunctionTagHook (new style function tag hooks). 
We also have setTransparentTagHook, which I haven't used much myself, 
though I believe it's more for creating things like <div>.
setHook and setFunctionTagHook both set <tag> style hooks. Originally we 
just had setHook, it had one short argument list. Later on that argument 
list was changed to add $frame to it. the newer setFunctionTagHook uses 
a different argument order, which appears to be the same set of 
arguments, just the first two and last two seam to be flipped. The 
primary difference is that setHook <tag> hooks are parsed in an old 
style, iirc pre-1.12 style. setFunctionTagHook are parsed by the 
preprocessor that parses parser functions, the contents aren't 
preprocessed but it's still parsed by the preprocessor. I'm not exactly 
sure of the difference, but setFunctionTagHook definitely exists because 
of preprocessor improvements and because applying the new parsing 
behaviors to setHook would result in different behavior for tags or code 
bugs... perhaps attr="{{{1}}}"'s are expanded in setFunctionTagHook's 
because they are parsed by the preprocessor. I'm not quite sure either 
what setFunctionTagHook's output is, or what the setFunctionHook style 
$flags are for, or support for the parser function style return of an array.
Tim probably needs to point something out to me.
The flaw I see here is Validator's render is coded in a way that you're 
always expecting to return html. Parser functions from the very start 
were intended to output WikiText, not html. Straight to html kills 
possibilities like [[{{#fn:...}}]] and subst:, you end up creating 
isolated sections where html is rendered out of scope with the rest of 
the document, which has the potential to not play well with markup 
outside of the block (partial tables?), and now you're making multiple 
recursive calls to the portions of code handling links, tables, other 
bits of html and whatnot, instead of doing that all at once.
Now you have extensions that are writing WikiText and having them parsed 
to html, when in the case of a parser function you could be just 
expanding that wikitext and outputting it directly instead of going to 
html and ending up with more efficient and flexible extensions. And you 
can't simply say (if they called $this->parseWikitext( ... ); and this 
is a parser function I'll just output wikitext instead of html. Since 
it's possible that they actually wrote something like `$html .= 
'<object>...</object><nav>' . $this->parseWikitext( ... ) . '</nav>';` 
and now you have WikiText and html intermixed. I expect the extension 
isn't making good use of the new SFH_OBJ_ARGS functionality and how the 
new parser (well, new if we're back in the 1.12 days) is programmed so 
that extensions can (and ideally should be) coded so that expensive 
parser functions and other parts of wikitext are not expanded and run 
when they're not actually used.
Validator's parser hooks really need a method of differentiating between 
"I'm building raw html, no way around that, I might have things like 
<script><object>etc... which don't exist in WikiText because that would 
be insecure. But I have this chunk of WikiText I'd like to have expanded 
to mix in with the html." and "I'm building this with WikiText, please 
do whatever is necessary. Whether that is expanding wikitext and 
outputting it normally from a parser function, or running 
recursiveTagParse where raw html is expected."
Though what Validator really needs is for one of the few of us that 
managed to get a good understanding of the parser and still stay 
(somewhat) insane... whoops... I mean sane... *cough* to be convinced to 
help out with the extension, probably rewriting half of the parser hook 
related code and it's api. As it stands it looks like Validator is 
selling the many flexible options of the parser quite short. And for 
minimal abstraction. dpl style (other extensions use it to) inline 
arguments lists (ie: <foo>a=b\nc=d</foo> style) seams like a good 
candidate that Validator misses out and makes a little tricker to 
implement in the best way they can (that expansion stuff again).
I haven't even gone into all the variations that can be put on stuff you 
hook into the parser (and that's without using wgHooks and doing any 
messy parsing, I'm just talking about MW's own preprocessing and parsing 
of syntax); Parser functions. Substing. Three different levels of <tag> 
hooking. The SFH_OBJ_ARGS improvements, ie: $frame->expand on args. The 
built in handling of named and numeric arguments to parser functions use 
SFH_OBJ_ARGS (aye, with SFH_OBJ_ARGS {{#foo:bar|a=b|c=d}} the foo parser 
function gets $args in a format where the = is already processed and it 
can get key => "a" value => "b" style information [going back to 
expansion again], and to top it off it's done in a way such that 
{{#foo:bar|{{{1}}}=asdf}} will not break if {{{1}}} just happens to 
contain a = (which it potentially would if you used old style parsing or 
expanded and looked for the = yourself)). Frame expansion also can 
selectively expand in a way that omits templates, or variables, or 
recovers comments, recovers source, etc... You can also implode 
arguments going back to wikitext you can output (like how the parser 
outputs syntax when running into some transclusion errors or supports 
nowiki: transclusions and some cases of subst:). You already know parser 
functions can output wikitext and optionally html, you can also set 
found=false, which I believe can have an effect similar to transclusion 
errors (ie:outputting the original source). There are some others too. 
Depending on what you're doing there's also link armoring and two strip 
states, etc...
-- 
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

Re: [Wikitech-l] Inclusion request for the Validator extension