At the moment, in the Lua support extension we have been developing, wikitext is output to the wiki via the return value of a function. For example in wikitext you would have:
{{#invoke:MyModule|myFunction}}
Then in [[Module:MyModule]]:
local p = {}
function p.myFunction()
  return 'Hello, world!'
end
return p
This is all nice and elegant and will work. There is an alternative convention commonly used in scripting languages (and programming in general for that matter), using a print function:
local p = {}
function p.myFunction()
  print('Hello, world!')
end
return p
I would have been happy to leave it as Victor Vasiliev made it, i.e. using return values, but I happened across a performance edge case in Lua which made me think about it. Specifically, this:
function foo(n)
  local s = ''
  for i = 1, n do
    s = s .. tostring(i)
  end
  return s
end
has O(n^2) running time. For 100,000 iterations it takes 5 seconds on my laptop. Apparently this is because strings are immutable, so the accumulator needs to be copied for each concatenation. It's very similar to the situation in Java, where a StringBuffer needs to be used in such an algorithm.
It's easy enough to work around, but the problem is obscure enough that I think probably most of our users will not realise they need to work around it until it becomes severe.
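One common workaround, sketched here under the assumption that only standard Lua is available, is to collect the pieces in a table and join them once with table.concat. The name foo_concat is made up for this illustration and is not part of the extension.

```lua
-- Workaround sketch: collect pieces in a table, join once at the end.
-- foo_concat is a hypothetical name for a rewritten foo().
local function foo_concat(n)
  local parts = {}
  for i = 1, n do
    -- appending to a table does not copy the text accumulated so far
    parts[#parts + 1] = tostring(i)
  end
  -- table.concat builds the final string in a single pass
  return table.concat(parts)
end

print(foo_concat(5))
```

The loop body stays as simple as the concatenating version, but the accumulated text is copied only once, when the table is joined.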
It would be possible to provide a print() function which does not suffer from this problem, i.e.
function foo(n)
  for i = 1, n do
    print(i)
  end
end
could run in O(n log(n)) time. Intuitively, I would expect that providing such a print function would encourage a programming style which would avoid at least some instances of repetitive concatenation.
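A rough sketch of how such a print() might be provided: the host hands the script a print-like function that appends to a table, then joins the table once when the script finishes. The names run_with_print and buffered_print are assumptions for illustration. The collector is passed as an argument rather than installed in the script's environment, and unlike the standard print it adds no tabs or newlines.

```lua
-- Hypothetical host-side sketch, not the extension's actual mechanism.
local function run_with_print(fn, ...)
  local out = {}
  -- print-like collector: stores each argument as a string, no separators
  local function buffered_print(...)
    for i = 1, select('#', ...) do
      out[#out + 1] = tostring((select(i, ...)))
    end
  end
  fn(buffered_print, ...)
  -- a single join at the end, so no repeated copying of the accumulator
  return table.concat(out)
end

-- The script receives the collector under the name "print".
local function foo(print, n)
  for i = 1, n do
    print(i)
  end
end

print(run_with_print(foo, 5))
```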
The performance issue is probably no big deal, since most templates are probably not going to be concatenating hundreds of thousands of strings, and 5 seconds is still quicker than the time it takes most of our featured articles to render at the moment. But like I say, it got me thinking about it.
Does anyone have any thoughts on return versus print generally? Are there other reasons we would choose one over the other?
-- Tim Starling
I have no knowledge of Lua, but I don't see what the problem with print is here. In most programming languages, print is supposed to write output to the output device, just as in this case, so I don't understand why we would want to use return (which is supposed to return some data or a pointer back to the caller) here. If we can pick whether print or return is the recommended way to output text, I would vote for print(), especially if it performs better than the implementation using return.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Tim, is there any code publicly available for the new extension you talk about? I would like to see it, if it exists (and isn't anything secret).
On 13/04/12 22:19, Petr Bena wrote:
Tim, is there any code publicly available for the new extension you talk about? I would like to see it, if it exists (and isn't anything secret).
Yes, it is the Scribunto extension in Git. You can get the latest version with:
git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Scribunto
cd Scribunto
git fetch origin refs/changes/52/4852/1
git checkout FETCH_HEAD
There is also a supporting extension written in C called LuaSandbox; it is in /trunk/php/luasandbox in Subversion.
-- Tim Starling
I don't see a problem with supporting both. Considering that you don't want the return value of *every* function to always be printed, just the return value of the function directly called by #invoke, you can just document #invoke as implicitly translating to "print( foo() )". Is there an equivalent parser tag which does *not* print output? That would make the parallel even clearer.
Having a print() function would be very useful for debugging; you could turn on 'debug mode' on sandbox pages with an input arg (I assume #invoke and friends can take arguments?) and something like {{#invoke:MyModule|MyFunction|debug={{#ifeq:{{SUBPAGENAME}}|Sandbox|true|false}}}}, and output debugging data with more flexibility if you had a second channel for printing.
Separately, it would be awesome to have some sort of 'intellisense' hinting for potential pitfalls like this. I've recently been doing a lot of work in MATLAB, and it has a really effective hinter that warns you, for example, when you change the size of a matrix inside a loop and encourages you to predefine it, which is a similar concept. I assume an editor with syntax highlighting etc is somewhere on the development roadmap, albeit probably fairly low down, so I guess add this as an even lower priority!
--HM
On 13.04.2012 16:12, Petr Bena wrote:
I have no knowledge of Lua, but I don't see what is problem with print here, the function print is supposed to print output to output device in most of programming languages, just as in this case, so I don't understand why we should want to use return (which is supposed to return some data / pointer back to function it was called from) in this case? I mean if we can pick if we should use print or return as recommended way to print text, I would vote for print(), especially if it has better performance that the implementation using return.
An output buffer has to be captured, while a return value may be more complex than just text and can be processed via the API or in some other way. Not all scripts should generate plain text. My extension will need nested arrays (or simple objects) to process, and some other extensions probably will too.
Dmitriy
On 13 April 2012 13:45, Tim Starling tstarling@wikimedia.org wrote:
At the moment, in the Lua support extension we have been developing, wikitext is output to the wiki via the return value of a function. For example in wikitext you would have:
{{#invoke:MyModule|myFunction}}
Then in [[Module:MyModule]]:
local p = {} function p.myFunction() return 'Hello, world!' end return p
..
Does anyone have any thoughts on return versus print generally? Are there other reasons we would choose one over the other?
-- Tim Starling
Functions that return a value are chainable. I suppose this is true in Lua too.
$int = function($txt){ return intval($txt, 10); };
$hats = function($numHats){ return " We have $numHats excellent hats! "; };
echo $hats( $int("4123,234") );
Perhaps this makes functions that return a string slightly better.
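For comparison, here is a rough Lua equivalent of the chaining example above. The helper names int and hats mirror the PHP closures. The leading-digits parsing is an assumption about the intended behaviour, since tonumber() alone would reject a string like "4123,234".

```lua
-- Parse the leading run of digits, falling back to 0 (illustrative helper).
local function int(txt)
  return tonumber(txt:match('^%s*(%d+)')) or 0
end

-- Build a string from the number; numbers are coerced by the .. operator.
local function hats(numHats)
  return ' We have ' .. numHats .. ' excellent hats! '
end

-- Return values chain directly: the result of int() feeds hats().
print(hats(int('4123,234')))
```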
Does anyone have any thoughts on return versus print generally? Are there other reasons we would choose one over the other?
From a language perspective, I would much prefer return values instead of side effects, even if those side effects could be converted into a return value with a special print implementation.
People tend to expect print to produce visible output in any case, an expectation that will often be violated if the output is collected and then processed further by other constructs the Lua call is wrapped in. Having both would also raise the question of what to do when both are provided: should the return value be appended to the collected printed output?
I am no Lua expert, but would guess that the usual collect-in-list-and-finally-join method can avoid the performance penalty in Lua too.
Gabriel
On 13/04/12 16:19, Gabriel Wicke wrote:
Does anyone have any thoughts on return versus print generally? Are there other reasons we would choose one over the other?
From a language perspective, I would much prefer return values instead of side effects, even if those side effects could be converted into a return value with a special print implementation.
I'd also prefer return values. Fits better with wikitext in general.
+1 to all the points for using return values.
If we have to implement an output buffer in Lua, we have probably failed. Output buffering is messy and prone to error. It's certainly not a good design from a usability standpoint, and it's generally awkward to deal with.
Template invocations should be equivalent to calling a pure function.
- Trevor
Trevor Parscal tparscal@wikimedia.org wrote:
+1 to all the points for using return values.
Zope has a nice solution here:
print "Asdsds"
prints actually to the internal magic variable "printed" which has to be returned later with
return printed
if it's going to end up as the function result.
Not sure if this is possible in Lua.
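Something similar looks feasible in plain Lua with closures. The sketch below is only an illustration: make_printer is a hypothetical helper, "printed" is a function rather than a magic variable, and no newline is appended after each call.

```lua
-- Returns a print-like collector plus a function yielding what was printed.
local function make_printer()
  local parts = {}
  local function collect(...)
    for i = 1, select('#', ...) do
      parts[#parts + 1] = tostring((select(i, ...)))
    end
  end
  local function printed()
    return table.concat(parts)
  end
  return collect, printed
end

local p = {}
function p.myFunction()
  -- the local "print" shadows the global one inside this function only
  local print, printed = make_printer()
  print('Asdsds')
  print('!')
  -- output still leaves through an explicit return value
  return printed()
end
```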
//Saper
On Sun, Apr 15, 2012 at 5:40 PM, Marcin Cieslak saper@saper.info wrote:
Trevor Parscal tparscal@wikimedia.org wrote:
+1 to all the points for using return values.
Zope has a nice solution here:
print "Asdsds"
prints actually to the internal magic variable "printed" which has to be returned later with
return printed
if it's going to end up as the function result.
Not sure if this is possible in Lua.
//Saper
This might be possible, but I would still prefer only supporting return. The advantage here is that it needs an explicit return. The disadvantage is that it needs magic, making it harder to understand and less clean in design, and you would still have to chase all print calls in nested functions to know what is happening, and where output is created.
Martijn
On Fri, Apr 13, 2012 at 7:19 AM, Gabriel Wicke wicke@wikidev.net wrote:
From a language perspective, I would much prefer return values instead of side effects, even if those side effects could be converted into a return value with a special print implementation.
I think I agree with Gabriel here (and looks like the quickly forming consensus). More reasons why this seems like the right choice:
1. We should be conservative in what we initially support, and only add more if we need it. Return values are the most general solution which we're almost certainly going to need no matter what, whereas output via print is an optimization.
2. We should make this environment one that is fun for good programmers to write clear code, so as to attract good programmers and encourage collaboration, and make everyone feel like learning how our system works has applicability in other parts of their lives. Side effects are a paving stone toward tangled single-programmer write-only code.
3. Premature optimization is the root of all evil.
I am no Lua expert, but would guess that the usual collect-in-list-and-finally-join method can avoid the performance penalty in Lua too.
If this type of technique works and becomes important, we can probably introduce patterns and possibly helper functions to make the easy default choice. I imagine there's going to be a lot of cut-and-paste going on, so if we can establish best practices early (keeping a close eye on how it's being used), we can introduce some good genetic stock into future Lua scripts.
Rob
On 13/04/12 19:18, Rob Lanphier wrote:
I imagine there's going to be a lot of cut-and-paste going on, so if we can establish best practices early (keeping a close eye on how it's being used), we can introduce some good genetic stock into future Lua scripts.
Imagine my shock when I first read it as "introduce genetic algorithms in wikipedia templates"
+1 for only supporting return values. The case where one needs thousands of string concats is pretty rare and can be worked around, and seems like the only reasonable argument in favour of using print.
One exit point per function is much easier to read than juggling possible scattered print statements, that may be hidden in functions deeper down the chain. The possibility of print getting abused is just far too great.
On 14/04/12 00:19, Gabriel Wicke wrote:
I am no Lua expert, but would guess that the usual collect-in-list-and-finally-join method can avoid the performance penalty in Lua too.
Yes, it works. With short strings, the memory overhead of the table elements can become excessive, so it makes sense to concatenate the strings in batches of say 1000. I imagine we would have an object-based interface, like:
local buf = mw.StringBuffer:new()
for i = 1, n do
  buf:add('blah')
end
return buf:toString()
It may also be possible to use metamethods (which are similar to C++ operator overloading) to make a StringBuffer object behave like a plain string in certain contexts.
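As a minimal sketch of what such an object could look like in plain Lua: the class below is an assumption, not the actual mw API. It joins pieces in batches (1000 by default, following the suggestion above) and uses the __tostring and __concat metamethods so the buffer can stand in for a string in those two contexts.

```lua
-- Hypothetical StringBuffer; names and layout are illustrative only.
local StringBuffer = {}
StringBuffer.__index = StringBuffer

function StringBuffer:new(batchSize)
  local obj = { batches = {}, pending = {}, batchSize = batchSize or 1000 }
  return setmetatable(obj, self)
end

function StringBuffer:add(s)
  local pending = self.pending
  pending[#pending + 1] = tostring(s)
  if #pending >= self.batchSize then
    -- fold a full batch into one string to limit per-element table overhead
    self.batches[#self.batches + 1] = table.concat(pending)
    self.pending = {}
  end
  return self
end

function StringBuffer:toString()
  return table.concat(self.batches) .. table.concat(self.pending)
end

-- tostring(buf) yields the joined text
StringBuffer.__tostring = StringBuffer.toString

-- buf .. 'x' and 'x' .. buf yield plain strings
StringBuffer.__concat = function(a, b)
  return tostring(a) .. tostring(b)
end

local buf = StringBuffer:new(3)
for i = 1, 7 do
  buf:add(i)
end
print(buf:toString())
```

Functions such as string.len would still need an explicit toString() call, since metamethods cover only some contexts.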
-- Tim Starling
On Fri, Apr 13, 2012 at 09:45:52PM +1000, Tim Starling wrote:
Does anyone have any thoughts on return versus print generally?
If you do go with the print solution, someone will eventually request something along the lines of PHP's output buffering functions[1] so they don't have to rewrite function A to use return instead of print when function B needs to somehow postprocess A's output. I'd guess that just ob_start() with no arguments, ob_get_clean(), and maybe ob_end_flush() would probably serve the majority.
And then they will probably use it for efficient string concatenation, if standard Lua string concatenation is really that inefficient.
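To make the scenario concrete, here is a hypothetical Lua analogue of ob_start(), ob_get_clean() and ob_end_flush(), built on a stack of tables. The ob table and its functions are assumptions for illustration. Nothing like this exists in the extension as described in this thread.

```lua
-- Stack of buffers; the bottom entry stands for the top-level output.
local ob = { stack = { {} } }

function ob.print(s)
  local top = ob.stack[#ob.stack]
  top[#top + 1] = tostring(s)
end

function ob.start()
  ob.stack[#ob.stack + 1] = {}
end

function ob.get_clean()
  assert(#ob.stack > 1, 'no active buffer')
  -- pop the innermost buffer and return its contents as one string
  return table.concat(table.remove(ob.stack))
end

function ob.end_flush()
  -- pop the innermost buffer and append its contents to the enclosing one
  ob.print(ob.get_clean())
end

-- Function A prints; function B postprocesses A's output without rewriting A.
local function A()
  ob.print('hello')
end

local function B()
  ob.start()
  A()
  return string.upper(ob.get_clean())
end

ob.print('[')
ob.print(B())
ob.print(']')
local result = table.concat(ob.stack[1])
print(result)
```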
Thanks for your comments everyone. I'll stick with return.
On 13/04/12 22:12, Petr Bena wrote:
I have no knowledge of Lua, but I don't see what is problem with print here, the function print is supposed to print output to output device in most of programming languages, just as in this case, so I don't understand why we should want to use return [...]
Platonides' response hints at the answer:
On 14/04/12 02:31, Platonides wrote:
I'd also prefer return values. Fits better with wikitext in general.
Most programming environments allow progressive output. You can call print, do some processing for a few seconds, ask for some user input on stdin, then do another print. When we embed a script in wikitext, its output is required to be fully buffered. So the advantages of allowing print are substantially less.
There are disadvantages: it is more difficult to identify the data flow when you use print(), as several people have said.
MediaWiki has $wgOut->addHTML() which is kind of like print(), despite being in a fully buffered environment, but its effect is consistent. By contrast, the output from a parser function can be modified by other parser functions and templates.
On 13/04/12 22:33, Happy Melon wrote:
Having a print() function would be very useful for debugging; you could turn on 'debug mode' on sandbox pages with an input arg (I assume #invoke and friends can take arguments?) and something like {{#invoke:MyModule|MyFunction|debug={{#ifeq:{{SUBPAGENAME}}|Sandbox|true|false}}}}, and output debugging data with more flexibility if you had a second channel for printing.
I think it's probably best if we have separate support for debug messages -- something that can be used without changing the output. Maybe an mw.log() function like we have in JavaScript.
-- Tim Starling
wikitech-l@lists.wikimedia.org