On 02/05/12 03:24, Victor Vasiliev wrote:
I am a bit leery though about the part where you suggest that
name-value arguments ({{#invoke:module|func|param=value}}) should be
parsed by the engine, not the script. Don't you have to expand those
arguments in order to parse them, hence making any form of
lazy-expanding impossible?
No, you don't have to expand the arguments in order to extract equals
signs for name/value pairs. The equals signs are already identified by
the preprocessor's parser, for the purposes of lazy expansion of
template arguments. See PPFrame::newChild() and the implementation of
the #switch parser function.
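To make the lazy-expansion point concrete, here is a toy model in plain
Lua. It is not MediaWiki code: splitArgs, makeLazyArgs and the expand
callback are names made up for illustration, and the split on the first
equals sign ignores nested braces, which the real preprocessor accounts
for at parse time. The only point is that names can be separated from
values without expanding anything.

```lua
-- Toy model: split raw arguments into name/value pairs without expanding them.
local function splitArgs(rawArgs)
    local unexpanded = {}
    local nextPositional = 1
    for _, raw in ipairs(rawArgs) do
        -- Split at the first equals sign; the value is kept as raw text.
        local name, value = raw:match("^([^=]*)=(.*)$")
        if name then
            unexpanded[name] = value
        else
            unexpanded[nextPositional] = raw
            nextPositional = nextPositional + 1
        end
    end
    return unexpanded
end

-- Wrap the unexpanded values so that each one is expanded on first access only.
local function makeLazyArgs(unexpanded, expand)
    local cache = {}
    return setmetatable({}, {
        __index = function(_, key)
            local raw = unexpanded[key]
            if raw == nil then
                return nil
            end
            if cache[key] == nil then
                cache[key] = expand(raw)
            end
            return cache[key]
        end,
    })
end

-- Hypothetical expander that counts how often it is called.
local expansions = 0
local function expand(text)
    expansions = expansions + 1
    return (text:gsub("{{slow}}", "SLOW RESULT"))
end

local args = makeLazyArgs(
    splitArgs({ "first", "param={{slow}}", "unused={{slow}}" }),
    expand
)

print(args.param)   -- expands only the "param" value
print(args[1])      -- expands only the first positional value
print(expansions)   -- "unused" has not been expanded at this point
```

A script that never reads args.unused never pays for expanding it.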
[...]
This is the part which I strongly oppose. Providing direct
preprocessor access to Lua scripts is a bad idea. There are two key
reasons for this:
1. Preprocessor is slow.
We can limit the input size, or temporarily reduce the general parser
limits like post-expand include size and node count. We can also hook
into PPFrame::expand() to periodically check for a Lua timeout, if
that is necessary.
The preprocessor is slow now; it won't become slower by allowing Lua
to call it.
2. You would have to work out many very subtle issues with timeouts
and nested Lua scripts. This includes timeout subtleties caused by the
preprocessor slowness (load a slow template, and given the small Lua
time limit, it will cause PHP to show a fatal error due to emergency
timeout; even if you fix it, the standalone version uses ulimit, and
it may be more difficult to fix).
The scenario you give in brackets will not happen. If a Lua timeout
occurs when the parser is executing, the Lua script will terminate
when the parser returns control to it. The timeout is not missed.
It doesn't matter if there are several levels of parser/Lua recursion
when a timeout occurs. LuaSandbox is able to unwind the stack efficiently.
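A rough sketch of the control flow, in plain Lua rather than the actual
LuaSandbox implementation: timedOut, checkTimeout, hostParse and
preprocess are hypothetical stand-ins. The simulated timer fires while
the "parser" is running, the check happens when control returns to the
script, and the resulting error unwinds both levels of parser/Lua
nesting in one go.

```lua
-- Flag set by the (simulated) timer while the parser is executing.
local timedOut = false

local function checkTimeout()
    if timedOut then
        -- Level 0 keeps the message free of position information.
        error("script time limit exceeded", 0)
    end
end

-- Stand-in for the parser: may call back into another Lua function.
local function hostParse(text, callback)
    if text == "slow" then
        timedOut = true   -- the timer fires during slow parser work
    end
    if callback then
        return callback(text)
    end
    return text
end

-- What a script would call: the timeout is checked when the parser returns.
local function preprocess(text, callback)
    local result = hostParse(text, callback)
    checkTimeout()
    return result
end

-- Inner script level: triggers the slow parse.
local function inner(text)
    return preprocess("slow")
end

-- Outer script level: calls the parser, which calls the inner script.
local function outer()
    local result = preprocess("outer", inner)
    return "finished: " .. result   -- not reached once the timeout fires
end

local ok, err = pcall(outer)
print(ok, err)
```

The timeout raised at the innermost return propagates through every
enclosing level until the outermost protected call catches it.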
The emergency timeout mechanism is functionally equivalent to PHP's
request timeout, so the emergency timeout can probably just be
infinite, and we can rely on the request timeout to terminate
long-running parse requests, as we do now. We could have a Lua script
time limit of a few seconds, and a request timeout of 3 minutes.
Now, let me go through your suggested use cases and propose some
alternatives:
1. As an alternative to a string literal, to include snippets of
wikitext which are intended to be editable by people who don't know
Lua.
I think it would in fact be better if you provided an interface for
getting unprocessed wikitext. Or a preprocessor DOM. Preprocessed text
makes it difficult to combine human-readable and machine-readable
versions.
Maybe you are thinking of some sort of virtual wikidata system
involving extracting little snippets of text from infobox invocations
or something. I am not. I would rather use the real wikidata for that.
I am talking about including large, wikitext-formatted chunks of
content language.
2. During migration, to call complex metatemplates which have not yet
been ported to Lua, or to test migrated components independently
instead of migrating all at once.
That would eventually lead to them becoming permanent. Bugzilla quips,
an authoritative reference on Wikimedia practices, say that
"temporary solutions have a terrible habit of becoming permanent,
around here". Hence I would suggest that we avoid the temptation in
the first place.
I don't think it's morally wrong to provide a migration tool.
Migration will be a huge task, and will continue for years. People who
migrate metatemplates to Lua will need lots of tools.
3. To provide access to miscellaneous parser functions and variables.
Now, this is a really bad idea. It is like making a scary hack an
official way to do things. It actually defies the first design
principle you state. preprocess( "{{FULLPAGENAME}}" ) is not only much
uglier than using an appropriate API like mw.page.name(), it is also
one of the slowest ways to do this. I have benchmarked it, and it is
actually ~450 times slower than accessing the title object directly.
Lua was (and is) meant to improve the readability of templates, not to
clutter them with stuff like articlesNum = tonumber( preprocess(
"{{NUMBEROFARTICLES:R}}" ) ).
Solution: a proper API would do the job (actually I am currently working on it).
We can provide an API for such things at some point in the future. I
am not very keen on just merging whatever interface you are privately
working on, without any public review.
I am publishing my proposed interface before I write the code for it,
so that I can respond to the comments on it without appearing to be
too invested in any given solution. I wish that you would occasionally
do the same. Rewriting code that you've spent many hours on can be
emotionally difficult. Perhaps that's why you've made no more changes
to ustring.c despite the problems with its interface.
4. To allow Lua to construct tag invocations, such as <ref> and
<gallery>.
We could make a #tag-like function to do this, just as we do with
parser functions.
I feel much more comfortable with the original return {expand =
true} idea, which causes the wikitext to be expanded in the new
Scribunto call frame.
That would lead to double-expansion in cases where text derived from
input arguments needs to be concatenated with wikitext to be expanded.
Consider:
    return {
        expand = true,
        text = formatHeader( frame.args.gallery_header ) .. '\n' ..
            '<gallery>' .. images .. '</gallery>'
    }
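The effect can be shown with a toy expander in plain Lua. This is only
an illustration: templates and expand are hypothetical stand-ins for
the real preprocessor, and the header value stands for argument text
that has already been expanded once, so any braces left in it are meant
literally.

```lua
-- Toy template store and expander: replaces {{name}} with the template body.
local templates = { stub = "[STUB NOTICE]" }

local function expand(text)
    return (text:gsub("{{(%w+)}}", function(name)
        return templates[name]   -- nil keeps the original match unchanged
    end))
end

-- Argument text that has already been expanded; the braces are literal here.
local header = "How to use {{stub}}"
local images = "File:A.jpg\nFile:B.jpg"

-- The "expand = true" approach: everything is concatenated, then expanded.
local result = {
    expand = true,
    text = header .. "\n<gallery>" .. images .. "</gallery>",
}
local doubleExpanded = expand(result.text)

-- Expanding only the part the script built leaves the argument text alone.
local expandedOnce = header .. "\n" ..
    expand("<gallery>" .. images .. "</gallery>")

print(doubleExpanded)
print(expandedOnce)
```

In the first result the literal {{stub}} in the header has been
expanded a second time; in the second it survives unchanged.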
I am a bit puzzled about the "always use named arguments scheme" part,
because it is not how the standard Lua library works.
It gives flexibility for future development. That was not a core
principle driving the design of the standard Lua library.
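As a small illustration of that flexibility, consider a hypothetical
helper that takes a single table of named arguments. makeTag and its
fields are invented for this example and are not part of the proposed
interface. An optional field can be added later without changing any
existing call.

```lua
-- Hypothetical helper taking named arguments in a single table.
local function makeTag(args)
    local attrs = ""
    if args.attrs then
        -- Sort attribute names so that the output order is deterministic.
        local names = {}
        for key in pairs(args.attrs) do
            names[#names + 1] = key
        end
        table.sort(names)
        for _, key in ipairs(names) do
            attrs = attrs .. " " .. key .. '="' .. args.attrs[key] .. '"'
        end
    end
    return "<" .. args.name .. attrs .. ">" ..
        (args.content or "") .. "</" .. args.name .. ">"
end

-- A call written before the optional "attrs" field existed still works.
local old = makeTag{ name = "ref", content = "Some source" }

-- A newer call uses the added field; no positional order had to change.
local new = makeTag{
    name = "ref",
    content = "Some source",
    attrs = { name = "src1" },
}

print(old)
print(new)
```

With positional arguments, adding a parameter anywhere but the end
would have broken existing callers.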
-- Tim Starling