On 02/05/12 03:24, Victor Vasiliev wrote:
I am a bit leery though about the part where you suggest that name-value arguments ({{#invoke:module|func|param=value}}) should be parsed by the engine, not the script. Don't you have to expand those arguments in order to parse them, hence making any form of lazy-expanding impossible?
No, you don't have to expand the arguments in order to extract equals signs for name/value pairs. The equals signs are already identified by the preprocessor's parser, for the purposes of lazy expansion of template arguments. See PPFrame::newChild() and the implementation of the #switch parser function.
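A minimal sketch of how this could look from the script's side, in plain Lua. The `makeArgs` helper, the `fakeExpand` callback and the raw-argument table are hypothetical stand-ins, not the actual Scribunto or PPFrame interface. The point is only that names are known up front while each value is expanded on first read:

```lua
-- Hypothetical sketch: the engine has already split names from values;
-- values stay unexpanded until the script first reads them.
local function makeArgs(raw, expand)
    local cache = {}
    return setmetatable({}, {
        __index = function(_, name)
            local source = raw[name]
            if source == nil then
                return nil
            end
            if cache[name] == nil then
                cache[name] = expand(source)  -- expanded once, on demand
            end
            return cache[name]
        end
    })
end

-- Stand-in for the preprocessor; counts how often it is called.
local expansions = 0
local function fakeExpand(wikitext)
    expansions = expansions + 1
    return (wikitext:gsub("{{PAGENAME}}", "Example"))
end

local args = makeArgs(
    { title = "About {{PAGENAME}}", unused = "{{Huge template}}" },
    fakeExpand
)
local title = args.title  -- only this argument is expanded
```

In this sketch the `unused` argument is never passed to the expander, which is the property lazy expansion is meant to preserve.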
[...]
This is the part which I strongly oppose. Providing direct preprocessor access to Lua scripts is a bad idea. There are two key reasons for this:
- Preprocessor is slow.
We can limit the input size, or temporarily reduce the general parser limits like post-expand include size and node count. We can also hook into PPFrame::expand() to periodically check for a Lua timeout, if that is necessary.
The preprocessor is slow now; it won't become slower by allowing Lua to call it.
- You would have to work out many very subtle issues with timeouts and nested Lua scripts. This includes timeout subtleties caused by the preprocessor slowness (load a slow template, and given the small Lua time limit, it will cause PHP to show a fatal error due to emergency timeout; even if you fix it, the standalone version uses ulimit, and it may be more difficult to fix).
The scenario you give in brackets will not happen. If a Lua timeout occurs when the parser is executing, the Lua script will terminate when the parser returns control to it. The timeout is not missed.
It doesn't matter if there are several levels of parser/Lua recursion when a timeout occurs. LuaSandbox is able to unwind the stack efficiently.
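To illustrate the behaviour described above, here is a rough simulation in plain Lua. It uses a fake clock and a hypothetical `preprocess` stand-in, and it is not how LuaSandbox is implemented. It only shows the limit being enforced when the parser hands control back to Lua, and the error unwinding through two levels of parser/Lua recursion:

```lua
-- Hypothetical sketch with a simulated clock; not the LuaSandbox implementation.
local clock = 0   -- simulated seconds of CPU time used so far
local LIMIT = 5   -- assumed Lua time limit, in the same units

local function checkTimeout()
    if clock > LIMIT then
        error("lua timeout", 0)
    end
end

-- Stand-in for a call into the parser. The parser may run for a long
-- time; the limit is checked only once control comes back to Lua.
local function preprocess(cost, callback)
    clock = clock + cost
    local result = callback and callback() or "text"
    checkTimeout()
    return result
end

-- Two levels of nesting; the inner parser call exceeds the limit.
local function inner()
    return preprocess(10)
end
local function outer()
    return preprocess(1, inner)
end

local ok, err = pcall(outer)
```

Here `ok` is false and `err` is the timeout message: the error raised after the inner call propagates through every enclosing level without any of them needing special handling.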
The emergency timeout mechanism is functionally equivalent to PHP's request timeout, so the emergency timeout can probably just be infinite, and we can rely on the request timeout to terminate long-running parse requests, as we do now. We could have a Lua script time limit of a few seconds, and a request timeout of 3 minutes.
Now, let me go through your suggested use cases and propose some alternatives:
- As an alternative to a string literal, to include snippets of wikitext which are intended to be editable by people who don't know Lua. I think it would be in fact better if you provided an interface for getting unprocessed wikitext. Or a preprocessor DOM. Preprocessed text makes it difficult to combine human-readable and machine-readable versions.
Maybe you are thinking of some sort of virtual wikidata system involving extracting little snippets of text from infobox invocations or something. I am not. I would rather use the real wikidata for that.
I am talking about including large, wikitext-formatted chunks of content language.
- During migration, to call complex metatemplates which have not yet been ported to Lua, or to test migrated components independently instead of migrating all at once. That would eventually lead to them becoming permanent. Bugzilla quips, an authoritative reference on Wikimedia practices, says that "temporary solutions have a terrible habit of becoming permanent, around here". Hence I would suggest that we avoid the temptation in the first place.
I don't think it's morally wrong to provide a migration tool. Migration will be a huge task, and will continue for years. People who migrate metatemplates to Lua will need lots of tools.
- To provide access to miscellaneous parser functions and variables.
Now, this is a really bad idea. It is like making a scary hack an official way to do things. It actually defies the first design principle you state. preprocess( "{{FULLPAGENAME}}" ) is not only much uglier than using an appropriate API like mw.page.name(), it is also one of the slowest ways to do this. I have benchmarked it, and it is actually ~450 times slower than accessing the title object directly. Lua was (and is) meant to improve the readability of templates, not to clutter them with stuff like articlesNum = tonumber( preprocess( "{{NUMBEROFARTICLES:R}}" ) ). Solution: a proper API would do the job (actually I am currently working on it).
We can provide an API for such things at some point in the future. I am not very keen on just merging whatever interface you are privately working on, without any public review.
I am publishing my proposed interface before I write the code for it, so that I can respond to the comments on it without appearing to be too invested in any given solution. I wish that you would occasionally do the same. Rewriting code that you've spent many hours on can be emotionally difficult. Perhaps that's why you've made no more changes to ustring.c despite the problems with its interface.
- To allow Lua to construct tag invocations, such as <ref> and <gallery>.
We could make a #tag-like function to do this, just as we do with parser functions.
I feel much more comfortable with the original return {expand = true} idea, which causes the wikitext to be expanded in the new Scribunto call frame.
That would lead to double-expansion in cases where text derived from input arguments needs to be concatenated with wikitext to be expanded. Consider:
return { expand = true, text = formatHeader( frame.args.gallery_header ) .. '\n' .. '<gallery>' .. images .. '</gallery>' }
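The difference can be sketched in plain Lua with a toy expander. The `expand` function below is a hypothetical stand-in that only knows one template, `{{SECRET}}`, and the header value is an assumed example of argument text that still contains braces after its first expansion:

```lua
-- Hypothetical stand-in for the preprocessor: expands only {{SECRET}}.
local function expand(wikitext)
    return (wikitext:gsub("{{SECRET}}", "expanded!"))
end

-- Assumed example: the argument has already been expanded once by the
-- parser, and the result happens to contain literal braces.
local galleryHeader = "Photos {{SECRET}}"

-- Scheme A: the whole return value is expanded, so text derived from
-- the argument goes through the expander a second time.
local a = expand(galleryHeader .. "\n<gallery>{{SECRET}}</gallery>")

-- Scheme B: the script expands only the wikitext it wrote itself and
-- concatenates the argument text untouched.
local b = galleryHeader .. "\n" .. expand("<gallery>{{SECRET}}</gallery>")
```

In scheme A the braces inside the header are expanded again; in scheme B they are left alone, which is why an explicit preprocess call gives the script control over what is expanded.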
I am a bit puzzled about the "always use named arguments scheme" part, because it is not how the standard Lua library works.
It gives flexibility for future development. That was not a core principle driving the design of the standard Lua library.
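As a small illustration of that flexibility, consider a function that takes a single table of named arguments. The function name and its fields are hypothetical and not part of any proposed interface:

```lua
-- Hypothetical function; the name and fields are illustrative only.
local function formatLink(opts)
    local target = opts.target
    local label = opts.label or target
    -- Field added in a later version; existing callers are unaffected.
    local prefix = opts.interwiki and (opts.interwiki .. ":") or ""
    return "[[" .. prefix .. target .. "|" .. label .. "]]"
end

-- A caller written before `interwiki` existed keeps working unchanged.
local old = formatLink{ target = "Lua" }

-- A newer caller opts in to the extra field by name.
local new = formatLink{ target = "Lua", label = "the language", interwiki = "en" }
```

With positional arguments, adding an option in the middle of the list would have required every existing caller to be updated.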
-- Tim Starling