On 11-01-30 02:46 PM, Jeroen De Dauw wrote:
Hey,
From the looks of the code, Validator, and various Validator based
extensions appear to be using parser->parse() inside of hooks where they are supposed to be using ->recursiveTagParse with a proper frame.
I was not aware of the correct approach here apparently. I had a look at the docs and hope I got it right now.
The Validator extension's api appears to provoke this bad practice because
I don't see the frame in the arguments render methods are using.
Now all info you need to handle parser functions and tag extensions separately is available in the render method. I also added a small utility function meant to abstract the difference, for people that like me do not want to have to care about if the call is coming from a tag or function.
Changes:
https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki...
https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Special:Code/MediaWiki...
Does this address the issue correctly?
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. --
Partially, but no.
$parser->parse should never be used within something called by $parser->parse (ie: Any parser function, tag hook, tag function hook, parser hook, etc...) (it's still being used in one branch).
What you use varies depending on what type of output is default for the type of tag/function you have, and what you intend to output.
From memory: A parser function by default is supposed to return WikiText with things like variables and other parserfunctions and whatnot expanded. You do this with $frame->expand( $text );. You do have the option of returning raw HTML, iirc you do this using `return array( $html, 'isHTML' => true );`. If you're returning html $parser->recursiveTagParse( $text, $frame ); gets you the html. (there were some other possible output options, I can't remember the exact results... I rarely output raw html from parser functions, it's not really their purpose)
iirc tag hooks default to outputting html, so naturally you use $parser->recursiveTagParse( $text, $frame ); to get full html back.
Muddying the water a little more, we don't have just parser functions and tags... We have setFunctionHook (parser functions), setHook (old style tag hooks), and setFunctionTagHook (new style function tag hooks). We also have setTransparentTagHook, which I haven't used much myself, though I believe it's more for creating things like <div>.
setHook and setFunctionTagHook both set <tag> style hooks. Originally we just had setHook, it had one short argument list. Later on that argument list was changed to add $frame to it. the newer setFunctionTagHook uses a different argument order, which appears to be the same set of arguments, just the first two and last two seam to be flipped. The primary difference is that setHook <tag> hooks are parsed in an old style, iirc pre-1.12 style. setFunctionTagHook are parsed by the preprocessor that parses parser functions, the contents aren't preprocessed but it's still parsed by the preprocessor. I'm not exactly sure of the difference, but setFunctionTagHook definitely exists because of preprocessor improvements and because applying the new parsing behaviors to setHook would result in different behavior for tags or code bugs... perhaps attr="{{{1}}}"'s are expanded in setFunctionTagHook's because they are parsed by the preprocessor. I'm not quite sure either what setFunctionTagHook's output is, or what the setFunctionHook style $flags are for, or support for the parser function style return of an array.
Tim probably needs to point something out to me.
The flaw I see here is Validator's render is coded in a way that you're always expecting to return html. Parser functions from the very start were intended to output WikiText, not html. Straight to html kills possibilities like [[{{#fn:...}}]] and subst:, you end up creating isolated sections where html is rendered out of scope with the rest of the document, which has the potential to not play well with markup outside of the block (partial tables?), and now you're making multiple recursive calls to the portions of code handling links, tables, other bits of html and whatnot, instead of doing that all at once. Now you have extensions that are writing WikiText and having them parsed to html, when in the case of a parser function you could be just expanding that wikitext and outputting it directly instead of going to html and ending up with more efficient and flexible extensions. And you can't simply say (if they called $this->parseWikitext( ... ); and this is a parser function I'll just output wikitext instead of html. Since it's possible that they actually wrote something like `$html .= '<object>...</object><nav>' . $this->parseWikitext( ... ) . '</nav>';` and now you have WikiText and html intermixed. I expect the extension isn't making good use of the new SFH_OBJ_ARGS functionality and how the new parser (well, new if we're back in the 1.12 days) is programmed so that extensions can (and ideally should be) coded so that expensive parser functions and other parts of wikitext are not expanded and run when they're not actually used.
Validator's parser hooks really need a method of differentiating between "I'm building raw html, no way around that, I might have things like <script><object>etc... which don't exist in WikiText because that would be insecure. But I have this chunk of WikiText I'd like to have expanded to mix in with the html." and "I'm building this with WikiText, please do whatever is necessary. Whether that is expanding wikitext and outputting it normally from a parser function, or running recursiveTagParse where raw html is expected." Though what Validator really needs is for one of the few of us that managed to get a good understanding of the parser and still stay (somewhat) insane... whoops... I mean sane... *cough* to be convinced to help out with the extension, probably rewriting half of the parser hook related code and it's api. As it stands it looks like Validator is selling the many flexible options of the parser quite short. And for minimal abstraction. dpl style (other extensions use it to) inline arguments lists (ie: <foo>a=b\nc=d</foo> style) seams like a good candidate that Validator misses out and makes a little tricker to implement in the best way they can (that expansion stuff again).
I haven't even gone into all the variations that can be put on stuff you hook into the parser (and that's without using wgHooks and doing any messy parsing, I'm just talking about MW's own preprocessing and parsing of syntax); Parser functions. Substing. Three different levels of <tag> hooking. The SFH_OBJ_ARGS improvements, ie: $frame->expand on args. The built in handling of named and numeric arguments to parser functions use SFH_OBJ_ARGS (aye, with SFH_OBJ_ARGS {{#foo:bar|a=b|c=d}} the foo parser function gets $args in a format where the = is already processed and it can get key => "a" value => "b" style information [going back to expansion again], and to top it off it's done in a way such that {{#foo:bar|{{{1}}}=asdf}} will not break if {{{1}}} just happens to contain a = (which it potentially would if you used old style parsing or expanded and looked for the = yourself)). Frame expansion also can selectively expand in a way that omits templates, or variables, or recovers comments, recovers source, etc... You can also implode arguments going back to wikitext you can output (like how the parser outputs syntax when running into some transclusion errors or supports nowiki: transclusions and some cases of subst:). You already know parser functions can output wikitext and optionally html, you can also set found=false, which I believe can have an effect similar to transclusion errors (ie:outputting the original source). There are some others too. Depending on what you're doing there's also link armoring and two strip states, etc...