Re: [Wikitech-l] Lua: parser interface

1 May 2012

On 02/05/12 04:51, Gabriel Wicke wrote:
> ...
> frame.args.name.expandTo( 'text/x-mediawiki' ) --- returns "value"
> This would make it possible to work with other formats apart from wikitext.

I can see how that would make sense when you're writing a parser, but
given the target audience for the Lua API, I think I would prefer to
provide an abbreviated interface.
How about frame.args.name as an abbreviation for
frame:getArgument('name'):expandTo( 'text/x-mediawiki' ) ?
And how about frame.plainArgs.name as an abbreviation for
frame:getArgument('name'):expandTo( 'text/plain' ) ?

> ...
> I recently added an API like this in Parsoid (the method is called 'as'
> there), and liked the way that worked out for parser functions. I am
> currently using the 'text/plain' type to retrieve a text expansion with
> comments etc stripped, and 'tokens/x-mediawiki' for expanded tokens
> (~list of tags and strings).

I know you're not really asking for a review of Parsoid and its
interfaces, but I worry as to whether your use of text/plain to
indicate wikitext with comments stripped is appropriate.
In MediaWiki, PPFrame::expand() takes any combination of 5 boolean
flags, and modifies its behaviour based on which of the Parser's 4
output types is selected, so if you were to match it for flexibility,
you would need 2^5 * 4 = 128 MIME types.
If I were to provide Lua with a richer interface to PPFrame::expand(),
I would be inclined to support at least some of those flags via named
options, rather than rolling them up into a single string parameter.
So instead of expandTo( 'text/plain' ) we might have:
frame:getArgument('name'):expand{
   expand_args = false,
   expand_templates = false,
   respect_noinclude = false,
   strip_comments = true }
Or, if forwards-compatibility requires that we don't support so many
orthogonal options, some of the options could be rolled in together.
The preceding could perhaps be written as:
frame:getArgument('name'):expand{ plain = true }
That doesn't preclude the use of overrides:
frame:getArgument('name'):expand{
   plain = true,
   strip_comments = false }
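
A minimal sketch of how a "plain" preset with per-flag overrides could be resolved. The default values, the preset table, and the helper name resolveExpandOptions are assumptions made for illustration; they are not taken from MediaWiki or from this thread.

```lua
-- Hypothetical defaults: what expand() would do with no options at all.
local DEFAULTS = {
  expand_args = true,
  expand_templates = true,
  respect_noinclude = true,
  strip_comments = false,
}

-- Hypothetical preset selected by plain = true.
local PLAIN_PRESET = {
  expand_args = false,
  expand_templates = false,
  respect_noinclude = false,
  strip_comments = true,
}

-- Resolve the effective flags for a call such as
-- expand{ plain = true, strip_comments = false }.
local function resolveExpandOptions(opts)
  opts = opts or {}
  local resolved = {}
  -- Start from the preset if requested, otherwise from the defaults.
  local base = opts.plain and PLAIN_PRESET or DEFAULTS
  for name, value in pairs(base) do
    resolved[name] = value
  end
  -- Explicit options override the base; unknown names are rejected.
  for name, value in pairs(opts) do
    if name ~= 'plain' then
      if DEFAULTS[name] == nil then
        error('unknown expand option: ' .. tostring(name), 2)
      end
      resolved[name] = value
    end
  end
  return resolved
end
```

Rejecting unknown option names is one possible way to keep the option set small and forwards-compatible; silently ignoring them would be the other obvious choice.
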
But it does seem like a can of worms. How about providing
getArgument(), which would return an opaque ParserValue object with a
single method called expand()? This method would in principle take
named parameters, but for now none are defined. With no parameters,
it provides some kind of reasonable template-expanding behaviour. Then
frame.args would provide an abbreviated syntax for expand() with no
parameters.
If there is a compelling use case for "plain" expansion, then we would
have to decide what options to PPFrame::expand() are needed to support
that use case, and then we would need to decide how to map them to
parameters to ParserValue.expand().
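
The getArgument()/expand()/frame.args arrangement described above could be sketched in plain Lua roughly as follows. This is a toy model, not the MediaWiki implementation: newFrame, newParserValue and fakeExpand are hypothetical names, and "expansion" here only trims whitespace.

```lua
-- Stand-in for the real preprocessor: "expansion" just trims whitespace.
local function fakeExpand(raw)
  return (raw:match('^%s*(.-)%s*$'))
end

-- Build an opaque value: the raw text is held in a closure, so callers
-- can only reach it through expand().
local function newParserValue(raw)
  local value = {}
  function value:expand(options)
    -- No options are defined yet; the parameter is reserved for later use.
    return fakeExpand(raw)
  end
  return value
end

local function newFrame(rawArgs)
  local frame = {}

  function frame:getArgument(name)
    local raw = rawArgs[name]
    if raw == nil then
      return nil
    end
    return newParserValue(raw)
  end

  -- frame.args.name is shorthand for frame:getArgument('name'):expand().
  -- The lookup is lazy: nothing is expanded until the key is read.
  frame.args = setmetatable({}, {
    __index = function(_, name)
      local value = frame:getArgument(name)
      return value and value:expand()
    end,
  })

  return frame
end
```

With a frame built as newFrame{ name = '  value  ' }, both frame.args.name and frame:getArgument('name'):expand() yield the same string, so a beginner only needs to learn the short form.
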

> ...
> The conversion of wikitext or other formats to an opaque value object
> could be achieved using an object constructor:
> --- 'value text' is parsed lazily
>   ParserValue( 'text/x-mediawiki', 'value text', frame )
> The frame might be the passed-in parent frame, or a custom one
> constructed with args assembled from other ParserValues.

Yes, this is an interesting idea. But I think I would prefer the
factory to be a frame method rather than a global function. Also,
again, I am skeptical about the value of using a MIME type. How about
an interface allowing either:
frame:newParserValue( 'value text' )
or named arguments:
frame:newParserValue{
   text = 'value text',
   fruitiness = 'high',
}
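
A factory method that accepts either calling style could look something like the sketch below. It relies on Lua's call sugar, where f'text' and f{...} pass a single string or table argument. The Frame table and the pass-through expand() are placeholders for illustration only; extra named options such as the 'fruitiness' example are simply ignored here.

```lua
local Frame = {}
Frame.__index = Frame

-- Accepts frame:newParserValue( 'value text' ) or
-- frame:newParserValue{ text = 'value text', ... }.
function Frame:newParserValue(arg)
  local options
  if type(arg) == 'string' then
    -- Positional form: normalise to the named form.
    options = { text = arg }
  elseif type(arg) == 'table' then
    options = arg
  else
    error('newParserValue expects a string or a table', 2)
  end
  if type(options.text) ~= 'string' then
    error("newParserValue requires a 'text' string", 2)
  end

  local text = options.text
  local value = {}
  function value:expand()
    -- Placeholder: a real implementation would hand the text to the
    -- preprocessor in the context of this frame.
    return text
  end
  return value
end
```
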

> ...
> Calls to existing templates could be supported with a convenient
> TemplateParserValue constructor, which does not specify how a template
> call is represented internally.
> TemplateParserValue( 'tpl', args ).expandTo( 'text/plain' )

Yes, this is attractive, and could be done in the same way as
ParserValue objects above. But I think there is still a need for an
abbreviated interface:
frame:newTemplateParserValue{title = 'tpl', args = args}:expand()
abbreviated to:
frame:expandTemplate{title = 'tpl', args = args}
It doesn't just make the text shorter, it also reduces the number of
concepts that the user has to understand before they are able to use
the interface.
I know that adding such concepts gives greater flexibility, but an
increase in the number of concepts will steepen the learning curve,
and the terminology required to explain them risks being daunting. For
example, if someone has never programmed before, you can't expect them
to understand terms like "opaque object".
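
The abbreviation discussed above, with frame:expandTemplate{...} standing for frame:newTemplateParserValue{...}:expand(), amounts to a one-line wrapper. The sketch below assumes a hypothetical in-memory TEMPLATES table and a toy {{{name}}} substitution in place of real wikitext expansion.

```lua
local Frame = {}
Frame.__index = Frame

-- Hypothetical template store standing in for wiki pages.
local TEMPLATES = { tpl = 'Hello, {{{1}}}!' }

function Frame:newTemplateParserValue(opts)
  local title, args = opts.title, opts.args or {}
  local value = {}
  function value:expand()
    local source = TEMPLATES[title]
    if source == nil then
      error('no such template: ' .. tostring(title))
    end
    -- Toy substitution of {{{name}}} placeholders; not real wikitext expansion.
    return (source:gsub('{{{(.-)}}}', function(key)
      local v = args[tonumber(key) or key]
      return v ~= nil and tostring(v) or ''
    end))
  end
  return value
end

-- The abbreviated form: one call, no intermediate object for the user to learn.
function Frame:expandTemplate(opts)
  return self:newTemplateParserValue(opts):expand()
end
```

Both forms return the same string; the short one just hides the intermediate value object.
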

> ...
> Finally, a ParserValue (or a list of those) could be used for the return
> type of functions to support output formats other than plain text.
> Overall, I would love to keep the access to values as opaque as possible
> to enable back-end optimizations and lazy expansions with sharing.
> Opening a path towards content representations other than plain
> (wiki-)text such as tokens, an AST or a DOM tree should be very useful
> for future parser development.

For me, the main motivation behind providing a parallel "advanced
interface" along the lines you suggest would be to establish a
direction for future interface development.
Interfaces evolve mostly by analogy, so providing a well thought-out
"advanced interface" will influence future development even if nobody
ever uses it.
-- Tim Starling
