I've written up a proposed interface between the MediaWiki parser and Lua:
https://www.mediawiki.org/wiki/Extension:Scribunto/Parser_interface_design
In summary: the Lua function is called with a single argument, which is an object representing the parser interface. The object is roughly equivalent to a PPFrame.
The object would have a property called "args", which is a table with its "index" metamethod overridden to provide lazy-initialised access to the parser function arguments with a brief syntax:
{{#invoke:module|func|name=value}}
function p.func( frame )
    return frame.args.name -- returns "value"
end
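The lazy-initialised access described above can be sketched in plain Lua with an __index metamethod. This is only an illustration of the mechanism, not the real implementation; fetchArgument here is an invented stand-in for whatever call into the PHP side expands one argument on demand:

```lua
-- Sketch only: fetchArgument stands in for a hypothetical call into
-- the PHP side that expands a single argument on demand.
local function makeArgs( fetchArgument )
    return setmetatable( {}, {
        __index = function( t, name )
            local value = fetchArgument( name ) -- expanded only on first access
            rawset( t, name, value )            -- cache for later lookups
            return value
        end
    } )
end

-- Example: only the "name" argument is ever fetched and expanded.
local args = makeArgs( function( name ) return 'value for ' .. name end )
print( args.name ) -- prints "value for name"
```

Arguments that the script never reads are never expanded, which is the point of the lazy scheme.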
There would be two methods for recursive preprocessing:
* preprocess() provides basic expansion of wikitext
* callTemplate() provides an API for template invocation, since I imagine that would otherwise be a common use case for preprocess(). Using preprocess() to expand a template with arbitrary arguments would be difficult.
Like a normal parser function, the Lua function returns text which is not modified any further by the preprocessor.
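As a rough illustration of how the two methods might be used from a module — the method names come from the proposal, but the exact callTemplate() signature and the template arguments below are invented for the sake of the example:

```lua
function p.demo( frame )
    -- Basic expansion of arbitrary wikitext:
    local sig = frame:preprocess( '~~~~' )

    -- Template invocation with explicit arguments, which would be
    -- awkward and unsafe to build as a preprocess() string:
    local box = frame:callTemplate( 'Infobox', {
        name = frame.args.name,
        caption = 'Generated by Lua'
    } )

    return sig .. '\n' .. box
end
```

The separate callTemplate() avoids having to escape pipes and equals signs when splicing arbitrary argument values into a {{...}} string.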
Please see the wiki page for a more detailed description, including rationale.
Any comments would be greatly appreciated.
-- Tim Starling
Thank you for bringing these issues to public discussion, Tim; they are well worth discussing.
On Tue, May 1, 2012 at 11:15 AM, Tim Starling tstarling@wikimedia.org wrote:
I've written up a proposed interface between the MediaWiki parser and Lua:
https://www.mediawiki.org/wiki/Extension:Scribunto/Parser_interface_design
In summary: the Lua function is called with a single argument, which is an object representing the parser interface. The object is roughly equivalent to a PPFrame.
The object would have a property called "args", which is a table with its "index" metamethod overridden to provide lazy-initialised access to the parser function arguments with a brief syntax:
{{#invoke:module|func|name=value}}
function p.func( frame )
    return frame.args.name -- returns "value"
end
I like this part. Also, I really enjoy the idea of making a separate parser frame for script instead of running it in the parent template's frame.
I am a bit leery, though, about the part where you suggest that name-value arguments ({{#invoke:module|func|param=value}}) should be parsed by the engine, not the script. Don't you have to expand those arguments in order to parse them, making any form of lazy expansion impossible?
There would be two methods for recursive preprocessing:
- preprocess() provides basic expansion of wikitext
- callTemplate() provides an API for template invocation, since I imagine that would otherwise be a common use case for preprocess(). Using preprocess() to expand a template with arbitrary arguments would be difficult.
Like a normal parser function, the Lua function returns text which is not modified any further by the preprocessor.
This is the part which I strongly oppose. Providing direct preprocessor access to Lua scripts is a bad idea, for two key reasons:
1. The preprocessor is slow.
2. You would have to work out many very subtle issues with timeouts and nested Lua scripts. This includes timeout subtleties caused by the preprocessor's slowness (load a slow template, and given the small Lua time limit, PHP will show a fatal error due to the emergency timeout; even if you fix that, the standalone version uses ulimit, which may be harder to fix).
Now, let me go through your suggested use cases and propose some alternatives:
1. As an alternative to a string literal, to include snippets of wikitext which are intended to be editable by people who don't know Lua. I think it would in fact be better if you provided an interface for getting unprocessed wikitext, or a preprocessor DOM. Preprocessed text makes it difficult to combine human-readable and machine-readable versions.
2. During migration, to call complex metatemplates which have not yet been ported to Lua, or to test migrated components independently instead of migrating all at once. That would eventually lead to them becoming permanent. Bugzilla quips, an authoritative reference on Wikimedia practices, says that "temporary solutions have a terrible habit of becoming permanent, around here". Hence I would suggest that we avoid the temptation in the first place.
3. To provide access to miscellaneous parser functions and variables. Now, this is a really bad idea: it makes a scary hack the official way to do things, and it defies the first design principle you state. preprocess( "{{FULLPAGENAME}}" ) is not only much uglier than an appropriate API like mw.page.name(), it is also one of the slowest ways to do this. I have benchmarked it, and it is ~450 times slower than accessing the title object directly. Lua was (and is) meant to improve the readability of templates, not to clutter them with stuff like articlesNum = tonumber( preprocess( "{{NUMBEROFARTICLES:R}}" ) ). Solution: a proper API would do the job (actually, I am currently working on one).
4. To allow Lua to construct tag invocations, such as <ref> and <gallery>. We could make a #tag-like function to do this, just as we do with parser functions.
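Such a #tag-like helper could look something like this from Lua. The method name buildTag and its signature are invented here purely for illustration; nothing with this name appears in the proposal:

```lua
-- Hypothetical helper mirroring {{#tag:...}}: it builds a tag
-- invocation without the script assembling raw angle brackets,
-- so the parser can handle the content safely.
local ref = frame:buildTag{
    name = 'ref',
    args = { name = 'smith2010' },
    content = 'Smith, J. (2010). Example.'
}
```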
I feel much more comfortable with the original return {expand = true} idea, which causes the wikitext to be expanded in the new Scribunto call frame.
Please see the wiki page for a more detailed description, including rationale.
Thank you for writing such a detailed description.
I am a bit puzzled about the "always use named arguments scheme" part, because it is not how the standard Lua library works.
I guess that's all my concerns for now.
Thanks, Victor.
On 02/05/12 03:24, Victor Vasiliev wrote:
I am a bit leery, though, about the part where you suggest that name-value arguments ({{#invoke:module|func|param=value}}) should be parsed by the engine, not the script. Don't you have to expand those arguments in order to parse them, making any form of lazy expansion impossible?
No, you don't have to expand the arguments in order to extract equals signs for name/value pairs. The equals signs are already identified by the preprocessor's parser, for the purposes of lazy expansion of template arguments. See PPFrame::newChild() and the implementation of the #switch parser function.
[...]
This is the part which I strongly oppose. Providing direct preprocessor access to Lua scripts is a bad idea. There are two key reasons for this:
- The preprocessor is slow.
We can limit the input size, or temporarily reduce the general parser limits like post-expand include size and node count. We can also hook into PPFrame::expand() to periodically check for a Lua timeout, if that is necessary.
The preprocessor is slow now, it won't become slower by allowing Lua to call it.
- You would have to work out many very subtle issues with timeouts and nested Lua scripts. This includes timeout subtleties caused by the preprocessor's slowness (load a slow template, and given the small Lua time limit, PHP will show a fatal error due to the emergency timeout; even if you fix that, the standalone version uses ulimit, which may be harder to fix).
The scenario you give in brackets will not happen. If a Lua timeout occurs when the parser is executing, the Lua script will terminate when the parser returns control to it. The timeout is not missed.
It doesn't matter if there are several levels of parser/Lua recursion when a timeout occurs. LuaSandbox is able to unwind the stack efficiently.
The emergency timeout mechanism is functionally equivalent to PHP's request timeout, so the emergency timeout can probably just be infinite, and we can rely on the request timeout to terminate long-running parse requests, as we do now. We could have a Lua script time limit of a few seconds, and a request timeout of 3 minutes.
Now, let me go through your suggested use cases and propose some alternatives:
- As an alternative to a string literal, to include snippets of wikitext which are intended to be editable by people who don't know Lua. I think it would in fact be better if you provided an interface for getting unprocessed wikitext, or a preprocessor DOM. Preprocessed text makes it difficult to combine human-readable and machine-readable versions.
Maybe you are thinking of some sort of virtual wikidata system involving extracting little snippets of text from infobox invocations or something. I am not. I would rather use the real wikidata for that.
I am talking about including large, wikitext-formatted chunks of content language.
- During migration, to call complex metatemplates which have not yet been ported to Lua, or to test migrated components independently instead of migrating all at once. That would eventually lead to them becoming permanent. Bugzilla quips, an authoritative reference on Wikimedia practices, says that "temporary solutions have a terrible habit of becoming permanent, around here". Hence I would suggest that we avoid the temptation in the first place.
I don't think it's morally wrong to provide a migration tool. Migration will be a huge task, and will continue for years. People who migrate metatemplates to Lua will need lots of tools.
- To provide access to miscellaneous parser functions and variables. Now, this is a really bad idea: it makes a scary hack the official way to do things, and it defies the first design principle you state. preprocess( "{{FULLPAGENAME}}" ) is not only much uglier than an appropriate API like mw.page.name(), it is also one of the slowest ways to do this. I have benchmarked it, and it is ~450 times slower than accessing the title object directly. Lua was (and is) meant to improve the readability of templates, not to clutter them with stuff like articlesNum = tonumber( preprocess( "{{NUMBEROFARTICLES:R}}" ) ). Solution: a proper API would do the job (actually, I am currently working on one).
We can provide an API for such things at some point in the future. I am not very keen on just merging whatever interface you are privately working on, without any public review.
I am publishing my proposed interface before I write the code for it, so that I can respond to the comments on it without appearing to be too invested in any given solution. I wish that you would occasionally do the same. Rewriting code that you've spent many hours on can be emotionally difficult. Perhaps that's why you've made no more changes to ustring.c despite the problems with its interface.
- To allow Lua to construct tag invocations, such as <ref> and <gallery>.
We could make a #tag-like function to do this, just as we do with parser functions.
I feel much more comfortable with the original return {expand = true} idea, which causes the wikitext to be expanded in the new Scribunto call frame.
That would lead to double-expansion in cases where text derived from input arguments needs to be concatenated with wikitext to be expanded. Consider:
return {
    expand = true,
    text = formatHeader( frame.args.gallery_header ) .. '\n'
        .. '<gallery>' .. images .. '</gallery>'
}
I am a bit puzzled about the "always use named arguments scheme" part, because it is not how the standard Lua library works.
It gives flexibility for future development. That was not a core principle driving the design of the standard Lua library.
-- Tim Starling
On Wed, May 2, 2012 at 4:21 AM, Tim Starling tstarling@wikimedia.org wrote:
We can limit the input size, or temporarily reduce the general parser limits like post-expand include size and node count. We can also hook into PPFrame::expand() to periodically check for a Lua timeout, if that is necessary.
The preprocessor is slow now, it won't become slower by allowing Lua to call it.
What I meant is that one of the goals of the Lua project is to improve the performance of the template system, and by invoking the preprocessor you slow it down again because of parser overhead.
- You would have to work out many very subtle issues with timeouts and nested Lua scripts. This includes timeout subtleties caused by the preprocessor's slowness (load a slow template, and given the small Lua time limit, PHP will show a fatal error due to the emergency timeout; even if you fix that, the standalone version uses ulimit, which may be harder to fix).
The scenario you give in brackets will not happen. If a Lua timeout occurs when the parser is executing, the Lua script will terminate when the parser returns control to it. The timeout is not missed.
But would the parser's working time still be included in the normal Lua time limit?
It doesn't matter if there are several levels of parser/Lua recursion when a timeout occurs. LuaSandbox is able to unwind the stack efficiently.
What I meant is that it should handle the time limit correctly and avoid things like doubling the allowed time because of nested scripts.
[...]
- As an alternative to a string literal, to include snippets of wikitext which are intended to be editable by people who don't know Lua. I think it would in fact be better if you provided an interface for getting unprocessed wikitext, or a preprocessor DOM. Preprocessed text makes it difficult to combine human-readable and machine-readable versions.
Maybe you are thinking of some sort of virtual wikidata system involving extracting little snippets of text from infobox invocations or something. I am not. I would rather use the real wikidata for that.
I am talking about the usual situation around here, where the same data (say, a list of TFAs) is displayed in a variety of ways across the wiki.
I am talking about including large, wikitext-formatted chunks of content language.
Well, then you can just dump its content into the output and tell the parser to expand it.
- During migration, to call complex metatemplates which have not yet been ported to Lua, or to test migrated components independently instead of migrating all at once. That would eventually lead to them becoming permanent. Bugzilla quips, an authoritative reference on Wikimedia practices, says that "temporary solutions have a terrible habit of becoming permanent, around here". Hence I would suggest that we avoid the temptation in the first place.
I don't think it's morally wrong to provide a migration tool. Migration will be a huge task, and will continue for years. People who migrate metatemplates to Lua will need lots of tools.
Agreed.
(though I am still skeptical about preprocess() and believe there might be pitfalls with this we are not currently seeing)
- To provide access to miscellaneous parser functions and variables. Now, this is a really bad idea: it makes a scary hack the official way to do things, and it defies the first design principle you state. preprocess( "{{FULLPAGENAME}}" ) is not only much uglier than an appropriate API like mw.page.name(), it is also one of the slowest ways to do this. I have benchmarked it, and it is ~450 times slower than accessing the title object directly. Lua was (and is) meant to improve the readability of templates, not to clutter them with stuff like articlesNum = tonumber( preprocess( "{{NUMBEROFARTICLES:R}}" ) ). Solution: a proper API would do the job (actually, I am currently working on one).
We can provide an API for such things at some point in the future. I am not very keen on just merging whatever interface you are privately working on, without any public review.
Neither am I.
I am publishing my proposed interface before I write the code for it, so that I can respond to the comments on it without appearing to be too invested in any given solution. I wish that you would occasionally do the same.
By "working" I meant prototyping the API with some demo functions and writing a proposed API description for public review.
Rewriting code that you've spent many hours on can be emotionally difficult. Perhaps that's why you've made no more changes to ustring.c despite the problems with its interface.
Work on ustring.c is on hold because of design issues with the pure-Lua implementation. I will probably include it in an API proposal and discuss it together with the other API issues.
- To allow Lua to construct tag invocations, such as <ref> and <gallery>.
We could make a #tag-like function to do this, just as we do with parser functions.
I feel much more comfortable with the original return {expand = true} idea, which causes the wikitext to be expanded in the new Scribunto call frame.
That would lead to double-expansion in cases where text derived from input arguments needs to be concatenated with wikitext to be expanded. Consider:
return {
    expand = true,
    text = formatHeader( frame.args.gallery_header ) .. '\n'
        .. '<gallery>' .. images .. '</gallery>'
}
formatHeader( "{{{gallery_header}}}" )?
I am a bit puzzled about the "always use named arguments scheme" part, because it is not how the standard Lua library works.
It gives flexibility for future development. That was not a core principle driving the design of the standard Lua library.
Agreed.
Thanks for detailed response, Victor.
On 02/05/12 11:28, Victor Vasiliev wrote:
The scenario you give in brackets will not happen. If a Lua timeout occurs when the parser is executing, the Lua script will terminate when the parser returns control to it. The timeout is not missed.
But would the parser's working time still be included in the normal Lua time limit?
For LuaSandbox, yes, the parser time is included. For LuaStandalone the parser time is not included in the limit, but it could be measured using getrusage() if that were deemed important.
It doesn't matter if there are several levels of parser/Lua recursion when a timeout occurs. LuaSandbox is able to unwind the stack efficiently.
What I meant is that it should handle the time limit correctly and avoid things like doubling the allowed time because of nested scripts.
Yes, that is done correctly also. Each LuaSandbox object has a single timer which is started and stopped at the base recursion level and ignored at higher levels of recursion.
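The scheme Tim describes — a single per-sandbox timer that is touched only at the base recursion level — can be sketched roughly as follows; startTimer and stopTimer are placeholders for the real per-sandbox CPU timer, not actual LuaSandbox functions:

```lua
-- Rough sketch of base-level-only timer accounting.
local depth = 0

local function startTimer() end -- placeholder for the real CPU timer
local function stopTimer() end  -- placeholder for the real CPU timer

local function enterLua()
    depth = depth + 1
    if depth == 1 then
        startTimer() -- only the outermost entry starts the clock
    end
end

local function leaveLua()
    if depth == 1 then
        stopTimer()  -- nested parser/Lua re-entries never touch the
    end              -- timer, so recursion cannot double the limit
    depth = depth - 1
end
```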
-- Tim Starling
On 05/01/2012 09:15 AM, Tim Starling wrote:
In summary: the Lua function is called with a single argument, which is an object representing the parser interface. The object is roughly equivalent to a PPFrame.
+1 for the abstract frame object.
The object would have a property called "args", which is a table with its "index" metamethod overridden to provide lazy-initialised access to the parser function arguments with a brief syntax:
{{#invoke:module|func|name=value}}
function p.func( frame )
    return frame.args.name -- returns "value"
end
There would be two methods for recursive preprocessing:
- preprocess() provides basic expansion of wikitext
An alternative to a wikitext-specific preprocess() method and plain-text argument values could be a conversion / expansion method on an opaque 'parser value' object:
frame.args.name.expandTo( 'text/x-mediawiki' ) --- returns "value"
This would make it possible to work with other formats apart from wikitext.
I recently added an API like this in Parsoid (the method is called 'as' there), and liked the way that worked out for parser functions. I am currently using the 'text/plain' type to retrieve a text expansion with comments etc stripped, and 'tokens/x-mediawiki' for expanded tokens (~list of tags and strings). Additional formats can be supported without a proliferation of methods. Each value object has a reference to its frame, and can be passed around and eventually lazily expanded elsewhere. Expansion results can be cached inside the value object and shared between multiple use sites (the value is associated with a single frame after all).
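A value object along these lines could be sketched in Lua roughly as follows. This is only an illustration of the caching-and-laziness idea; backendExpand is an invented placeholder for whatever actually performs the conversion:

```lua
-- Sketch of a lazily expanding, caching parser value object.
local ParserValue = {}
ParserValue.__index = ParserValue

function ParserValue.new( frame, source, backendExpand )
    return setmetatable( {
        frame = frame,               -- each value is tied to one frame
        source = source,
        backendExpand = backendExpand,
        cache = {}                   -- one cached result per format
    }, ParserValue )
end

function ParserValue:expandTo( format )
    if self.cache[format] == nil then
        -- Expansion happens only on first request for this format...
        self.cache[format] = self.backendExpand( self.frame, self.source, format )
    end
    -- ...and the result is shared between multiple use sites.
    return self.cache[format]
end
```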
The Parsoid .as method additionally takes a callback argument to support asynchronous expansions. This might be too complex for user-friendly Lua scripting, but could still be something worth considering in the longer term. It could be added as a separate 'expandToAsync' method.
The conversion of wikitext or other formats to an opaque value object could be achieved using an object constructor:
-- 'value text' is parsed lazily
ParserValue( 'text/x-mediawiki', 'value text', frame )
The frame might be the passed-in parent frame, or a custom one constructed with args assembled from other ParserValues.
Calls to existing templates could be supported with a convenient TemplateParserValue constructor, which does not specify how a template call is represented internally.
TemplateParserValue( 'tpl', args ).expandTo( 'text/plain' )
Finally, a ParserValue (or a list of those) could be used for the return type of functions to support output formats other than plain text.
Overall, I would love to keep the access to values as opaque as possible to enable back-end optimizations and lazy expansions with sharing. Opening a path towards content representations other than plain (wiki-)text such as tokens, an AST or a DOM tree should be very useful for future parser development.
Gabriel
On 02/05/12 04:51, Gabriel Wicke wrote:
frame.args.name.expandTo( 'text/x-mediawiki' ) --- returns "value"
This would make it possible to work with other formats apart from wikitext.
I can see how that would make sense when you're writing a parser, but given the target audience for the Lua API, I think I would prefer to provide an abbreviated interface.
How about frame.args.name as an abbreviation for frame:getArgument('name'):expandTo( 'text/x-mediawiki' ) ?
And how about frame.plainArgs.name as an abbreviation for frame:getArgument('name'):expandTo( 'text/plain' ) ?
I recently added an API like this in Parsoid (the method is called 'as' there), and liked the way that worked out for parser functions. I am currently using the 'text/plain' type to retrieve a text expansion with comments etc stripped, and 'tokens/x-mediawiki' for expanded tokens (~list of tags and strings).
I know you're not really asking for a review of Parsoid and its interfaces, but I worry about whether your use of text/plain to indicate wikitext with comments stripped is appropriate.
In MediaWiki, PPFrame::expand() takes any combination of 5 boolean flags, and modifies its behaviour based on which of the Parser's 4 output types is selected, so if you were to match it for flexibility, you would need 128 MIME types.
If I were to provide Lua with a richer interface to PPFrame::expand(), I would be inclined to support at least some of those flags via named options, rather than rolling them up into a single string parameter. So instead of expandTo( 'text/plain' ) we might have:
frame:getArgument( 'name' ):expand{
    expand_args = false,
    expand_templates = false,
    respect_noinclude = false,
    strip_comments = true
}
Or, if forwards-compatibility requires that we don't support so many orthogonal options, some of the options could be rolled in together. The preceding could perhaps be written as:
frame:getArgument('name'):expand{ plain = true }
That doesn't preclude the use of overrides:
frame:getArgument('name'):expand{ plain = true, strip_comments = false }
But it does seem like a can of worms. How about providing getArgument(), which will return an opaque ParserValue object with a single method called expand(). This method would theoretically take named parameters, but currently, none are defined. With no parameters, it provides some kind of reasonable template-expanding behaviour. Then frame.args would provide an abbreviated syntax for expand() with no parameters.
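The abbreviation could be wired up with a metatable, so that frame.args.name is literally sugar for the longer call. This is a sketch under the assumption of the getArgument() and expand() names proposed above:

```lua
-- Sketch: frame.args.name == frame:getArgument( 'name' ):expand()
local function attachArgs( frame )
    frame.args = setmetatable( {}, {
        __index = function( t, name )
            local value = frame:getArgument( name ):expand()
            rawset( t, name, value ) -- expand at most once per argument
            return value
        end
    } )
end
```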
If there is a compelling use case for "plain" expansion, then we would have to decide what options to PPFrame::expand() are needed to support that use case, and then we would need to decide how to map them to parameters to ParserValue.expand().
The conversion of wikitext or other formats to an opaque value object could be achieved using an object constructor:
-- 'value text' is parsed lazily
ParserValue( 'text/x-mediawiki', 'value text', frame )
The frame might be the passed-in parent frame, or a custom one constructed with args assembled from other ParserValues.
Yes, this is an interesting idea. But I think I would prefer the factory to be a frame method rather than a global function. Also, again, I am skeptical about the value of using a MIME type. How about an interface allowing either:
frame:newParserValue( 'value text' )
or named arguments:
frame:newParserValue{ text = 'value text', fruitiness = 'high', }
Calls to existing templates could be supported with a convenient TemplateParserValue constructor, which does not specify how a template call is represented internally.
TemplateParserValue( 'tpl', args ).expandTo( 'text/plain' )
Yes, this is attractive, and could be done in the same way as ParserValue objects above. But I think there is still a need for an abbreviated interface:
frame:newTemplateParserValue{title = 'tpl', args = args}:expand()
abbreviated to:
frame:expandTemplate{title = 'tpl', args = args}
It doesn't just make the text shorter, it also reduces the number of concepts that the user has to understand before they are able to use the interface.
I know that adding such concepts gives greater flexibility, but an increase in the number of concepts will steepen the learning curve, and the terminology required to explain them risks being daunting. For example, if someone has never programmed before, you can't expect them to understand terms like "opaque object".
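Side by side, the two spellings of the same call look like this (the template title and arguments are invented examples):

```lua
-- Full form: construct an opaque value object, then expand it.
local a = frame:newTemplateParserValue{
    title = 'Infobox person',
    args = { name = 'Ada Lovelace' }
}:expand()

-- Abbreviated form: one call, no intermediate object to explain.
local b = frame:expandTemplate{
    title = 'Infobox person',
    args = { name = 'Ada Lovelace' }
}
```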
Finally, a ParserValue (or a list of those) could be used for the return type of functions to support output formats other than plain text.
Overall, I would love to keep the access to values as opaque as possible to enable back-end optimizations and lazy expansions with sharing. Opening a path towards content representations other than plain (wiki-)text such as tokens, an AST or a DOM tree should be very useful for future parser development.
For me, the main motivation behind providing a parallel "advanced interface" along the lines you suggest would be to establish a direction for future interface development.
Interfaces evolve mostly by analogy, so providing a well thought-out "advanced interface" will influence future development even if nobody ever uses it.
-- Tim Starling
Is it possible to hook Lua function calls? If so, I'd make a template expansion a "call" to a function with that name. That was the interface I envisioned when thinking about how I'd do it if I were making the language from scratch to suit wikitext (I drafted some code, but it never reached even a barely mature level).
On Wed, May 2, 2012 at 2:28 PM, Platonides Platonides@gmail.com wrote:
Is it possible to hook Lua function calls? If so, I'd make a template expansion a "call" to a function with that name. That was the interface I envisioned when thinking about how I'd do it if I were making the language from scratch to suit wikitext (I drafted some code, but it never reached even a barely mature level).
Do you mean a situation where you have template X, and a call to function X is a transclusion of template X? I have thought about that as well. Not all titles are legitimate Lua function names, but there are more serious issues than that.
This is close to how the first implementation of InlineScripts was done in 2009. As it turned out, this approach has numerous disadvantages, the main ones being performance issues (calling functions through the parser instead of directly introduces overhead, which becomes a big problem when you have many function calls), the inability to return non-string data (like arrays), and the impossibility of exporting multiple functions from one template. That's why I strongly believe that all code should be in modules, and that modules should interact only through Lua itself.
—Victor
On 05/02/2012 04:01 AM, Tim Starling wrote:
How about frame.args.name as an abbreviation for frame:getArgument('name'):expandTo( 'text/x-mediawiki' ) ?
Yep, that looks good to me. Maybe the specialized wikitext argument variant could be called 'wikitextArgs' so that the general variant can be used for ParserValues instead of the getArgument method?
And how about frame.plainArgs.name as an abbreviation for frame:getArgument('name'):expandTo( 'text/plain' ) ?
Adding more xxxArgs methods does not seem to scale that well, and would introduce a lot of extra method names to remember. It would also encourage users to pick one representation when they need not care about it, especially if they just pass some content through.
I know you're not really asking for a review of Parsoid and its interfaces, but I worry as to whether your use of text/plain to indicate wikitext with comments stripped is appropriate.
I am not that happy with the text/plain bit either. text/x-mediawiki with a separate progress or 'processing stage' component might be better, as it would not conflate processing stage with format. I am using a numerical 'rank' value to track progress internally in Parsoid, but a string would likely be friendlier for an external API like this. There is an example of this further down in this post.
If I were to provide Lua with a richer interface to PPFrame::expand(), I would be inclined to support at least some of those flags via named options, rather than rolling them up into a single string parameter. So instead of expandTo( 'text/plain' ) we might have:
frame:getArgument( 'name' ):expand{
    expand_args = false,
    expand_templates = false,
    respect_noinclude = false,
    strip_comments = true
}
Or, if forwards-compatibility requires that we don't support so many orthogonal options, some of the options could be rolled in together. The preceding could perhaps be written as:
frame:getArgument('name'):expand{ plain = true }
That doesn't preclude the use of overrides:
frame:getArgument('name'):expand{ plain = true, strip_comments = false }
But it does seem like a can of worms.
Some grepping through core and extension code left me with the impression that there are relatively few common sets of flags used. As an example, NO_ARGS and NO_TEMPLATES always seem to be used as a pair in situations where just comment and noinclude (and company) handling is needed.
If there remain use cases for fully orthogonal flags, then those could still be supported with optional (named) argument of course, as you note.
How about providing getArgument(), which will return an opaque ParserValue object with a single method called expand(). This method would theoretically take named parameters, but currently, none are defined. With no parameters, it provides some kind of reasonable template-expanding behaviour. Then frame.args would provide an abbreviated syntax for expand() with no parameters.
Yes, this looks very good to me. A ParserValue with a heavily defaulted expand() method would be convenient for the currently common case without compromising the ability to work with other content types, or to specify the processing stage through a name or flags. So the 'plain' example could look somewhat like this:
arg:expand{ format = 'tokens/x-mediawiki', phase = '0.1_noComments' } -- named processing phase
The conversion of wikitext or other formats to an opaque value object could be achieved using an object constructor:
-- 'value text' is parsed lazily
ParserValue( 'text/x-mediawiki', 'value text', frame )
The frame might be the passed-in parent frame, or a custom one constructed with args assembled from other ParserValues.
Yes, this is an interesting idea. But I think I would prefer the factory to be a frame method rather than a global function.
I'd be happy with that too. Custom child frames could still be created using a frame:newChild method.
Also, again, I am skeptical about the value of using a MIME type. How about an interface allowing either:
frame:newParserValue( 'value text' )
or named arguments:
frame:newParserValue{ text = 'value text', fruitiness = 'high', }
+1, but I think some way to indicate the type and processing state of the passed-in value will be needed if non-wikitext values are to be supported. Further processing of a ParserValue depends on this knowledge. Optional (named) arguments could be employed for this too:
frame:newParserValue{
    type = 'tokens/x-mediawiki',
    value = {
        { type = 'tag', name = 'a', attribs = { href = 'http://foo' } },
        "Some link text",
        { type = 'endtag', name = 'a' }
    }
}
Defaulting to wikitext and fully-preprocessed text for the processing phase would be fine with me. Any kind of type identifiers (MIME or not) could of course be used. MIME has the advantage of being somewhat known already, but other type identifiers might have other advantages.
frame:newTemplateParserValue{title = 'tpl', args = args}:expand()
abbreviated to:
frame:expandTemplate{title = 'tpl', args = args}
+1
It doesn't just make the text shorter, it also reduces the number of concepts that the user has to understand before they are able to use the interface.
I know that adding such concepts gives greater flexibility, but an increase in the number of concepts will steepen the learning curve, and the terminology required to explain them risks being daunting. For example, if someone has never programmed before, you can't expect them to understand terms like "opaque object".
I pretty much agree. The abbreviated interface adds some complication by providing a second way to do the same thing. This still seems to be worth it if it helps people to get started.
Gabriel