Brion Vibber wrote:
There's been talk of Lua as an embedded templating language for a while, and there's even an extension implementation.
One advantage of Lua over other languages is that its implementation is optimized for use as an embedded language, and it looks kind of pretty.
An _inherent_ disadvantage is that it's a fairly rarely-used language, so still requires special learning on potential template programmers' part.
An _implementation_ disadvantage is that it currently is dependent on an external Lua binary installation -- something that probably won't be present on third-party installs, meaning Lua templates couldn't be easily copied to non-Wikimedia wikis.
There are problems with all the shell-based solutions. MediaWiki callbacks, like template expansion, {{VARIABLES}} and ifexist, are commonly used in templates on Wikipedia, and a scripting language without these would suffer from poor community buy-in. You could implement them from the shell using IPC, but IPC in PHP is rather cumbersome. The interface between the parser and the scripting engine would be performance-sensitive, because users would write templates that invoked the scripting engine hundreds of times in the course of rendering an article. So there's a case there for a persistent scripting engine with a command-based interface over a pipe.
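A rough sketch of what such a persistent engine could look like, in Python for brevity rather than PHP. One child process stays alive for the whole page render, commands and replies are one JSON object per line, and the engine can call back into the host mid-command. The message fields, the Engine class and the toy "ifexist" handler are all assumptions made for illustration; nothing here is an existing MediaWiki or extension interface.

```python
import json
import subprocess
import sys

# Toy "scripting engine", run as a long-lived child process. It reads one JSON
# command per line and writes one JSON message per line. Hypothetical protocol.
ENGINE_SOURCE = r'''
import json, sys

def send(msg):
    sys.stdout.write(json.dumps(msg) + "\n")
    sys.stdout.flush()

def callback(name, arg):
    # Ask the host (standing in for MediaWiki) and block until it answers.
    send({"type": "callback", "name": name, "arg": arg})
    return json.loads(sys.stdin.readline())["value"]

while True:
    line = sys.stdin.readline()
    if not line:          # host closed the pipe: shut down
        break
    title = json.loads(line)["title"]
    exists = callback("ifexist", title)
    send({"type": "result", "text": "[[%s]]" % title if exists else title})
'''


class Engine:
    """Host side: starts the engine once and reuses it for every call."""

    def __init__(self, callbacks):
        self.callbacks = callbacks
        self.proc = subprocess.Popen(
            [sys.executable, "-c", ENGINE_SOURCE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            text=True, bufsize=1)

    def _send(self, msg):
        self.proc.stdin.write(json.dumps(msg) + "\n")
        self.proc.stdin.flush()

    def render(self, title):
        self._send({"title": title})
        while True:
            msg = json.loads(self.proc.stdout.readline())
            if msg["type"] == "callback":
                # Service the callback, then keep waiting for the result.
                value = self.callbacks[msg["name"]](msg["arg"])
                self._send({"value": value})
            else:
                return msg["text"]

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()


if __name__ == "__main__":
    pages = {"Main Page"}
    engine = Engine({"ifexist": lambda title: title in pages})
    for title in ["Main Page", "No Such Page"] * 3:
        print(engine.render(title))
    engine.close()
```

The process start-up cost is paid once, so hundreds of invocations per article only cost a pipe round trip each, plus one more round trip per callback.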
I like Lua because of the potential to embed it in PHP as an extension, with fast setup and fast callbacks to MediaWiki. It does all its memory allocation via a callback to the application, including VM stack space, which means that it's possible to control the memory usage without killing the process when the limit is exceeded. But its standard library is unsuitable for running untrusted scripts, since it contains all the usual process control and file read/write functions.
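To make the allocation-callback point concrete, here is a toy model in Python. It is not the Lua C API: Lua's real allocator hook is a C function with a realloc-like shape (old size, new size) that the host supplies when creating the interpreter state. The names MemoryBudget, ToyVM and run_snippet are hypothetical, and the "VM" only concatenates strings. What it is meant to show is the control flow: the host refuses an allocation, the script fails with a catchable error, and the host process carries on.

```python
class OutOfBudget(Exception):
    """Raised inside the toy VM when the host refuses an allocation."""


class MemoryBudget:
    """Host-side allocator callback with a hard cap (in abstract size units)."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def __call__(self, old_size, new_size):
        # realloc-style: (0, n) allocates, (n, 0) frees, (n, m) resizes.
        new_used = self.used - old_size + new_size
        if new_used > self.limit:
            return False  # refuse: the script must unwind, not the process
        self.used = new_used
        return True


class ToyVM:
    """Stand-in for an embedded interpreter; every unit goes through alloc."""

    def __init__(self, alloc):
        self.alloc = alloc
        self.buffers = []

    def concat(self, text, times):
        for _ in range(times):
            if not self.alloc(0, len(text)):
                raise OutOfBudget("script exceeded its memory budget")
            self.buffers.append(text)
        return "".join(self.buffers)

    def release_all(self):
        # Hand everything back to the host so the next snippet starts clean.
        for buf in self.buffers:
            self.alloc(len(buf), 0)
        self.buffers.clear()


def run_snippet(vm, text, times):
    """Run one snippet; on overrun, report an error and keep the host alive."""
    try:
        return vm.concat(text, times)
    except OutOfBudget as exc:
        return "error: %s" % exc
    finally:
        vm.release_all()


if __name__ == "__main__":
    budget = MemoryBudget(limit=1024)
    vm = ToyVM(budget)
    print(run_snippet(vm, "ab", 10))          # fits within the budget
    print(run_snippet(vm, "x" * 100, 1000))   # refused part-way through
    print(run_snippet(vm, "ok", 1))           # host is still usable
```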
The current PECL extension doesn't have any of the features that make Lua attractive: it does not have support for callbacks to PHP, or for replacing the standard library with something more sensible, or for limiting memory without killing the request when the limit is exceeded. Obviously the distributed standalone does not have these features either.
I had imagined the task of embedding Lua in MediaWiki as being primarily a C project, writing the necessary glue code between the embedded interpreter and PHP. I had hoped that banging the drum for Lua might encourage someone to look at these issues and start work on that project.
- PHP
Advantage: Lots of webbish people have some experience with PHP or can easily find references.
Advantage: we're pretty much guaranteed to have a PHP interpreter available. :)
Disadvantage: PHP is difficult to lock down for secure execution.
PHP can be secured against arbitrary execution using token_get_all(); there's a proof-of-principle validator of this kind in the master switch script project. But there are problems with attempting a single-process PHP-in-PHP sandbox:
* The poor support for signals in PHP makes it difficult to limit the execution time of a script snippet. Ticks only occur at the end of each statement, so you can defeat them by making a single statement that runs forever.
* Apart from blacklisting function definitions, there is no way to protect against infinite recursion, which exhausts the process stack and causes a segfault.
* Memory limits are implemented on a per-request basis, and there's no way to recover from exceeding the memory limit; the request is just killed.
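For readers who haven't seen token-level validation, the general shape is an allow-list check over the token stream before anything is executed. The sketch below uses Python's tokenize module as a stand-in for token_get_all(); it is not the validator mentioned above, and the allow-lists are arbitrary examples. It also does nothing about the three problems just listed, which are about runtime limits rather than what the snippet is allowed to name.

```python
import io
import keyword
import tokenize

# Example allow-lists; a real validator would be driven by the template API.
ALLOWED_NAMES = {"len", "min", "max", "x", "y"}
ALLOWED_KEYWORDS = {"if", "else", "and", "or", "not"}
ALLOWED_OPS = {"+", "-", "*", "(", ")", ",", "<", ">", "=="}
STRUCTURAL = {tokenize.NUMBER, tokenize.NEWLINE, tokenize.NL,
              tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT,
              tokenize.ENDMARKER}


def validate(source):
    """Return (line, token) pairs that are not on the allow-lists."""
    rejected = []
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type == tokenize.NAME:
                if keyword.iskeyword(tok.string):
                    ok = tok.string in ALLOWED_KEYWORDS
                else:
                    ok = tok.string in ALLOWED_NAMES
            elif tok.type == tokenize.OP:
                ok = tok.string in ALLOWED_OPS
            else:
                # Strings and anything unrecognised are rejected by default.
                ok = tok.type in STRUCTURAL
            if not ok:
                rejected.append((tok.start[0], tok.string))
    except (tokenize.TokenError, SyntaxError):
        # Source that cannot be tokenized at all is rejected wholesale.
        rejected.append((0, "<untokenizable source>"))
    return rejected


if __name__ == "__main__":
    print(validate("max(x, y) + 1\n"))
    print(validate("__import__('os').system('id')\n"))
```

The first call returns an empty list; the second reports the unknown names, the string literals and the attribute access, so the snippet would never reach an interpreter.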
- JavaScript
Advantage: Even more folks have been exposed to JavaScript programming, including Wikipedia power-users.
Disadvantage: Server-side interpreter not guaranteed to be present. Like Lua, would either restrict our portability or would require an interpreter reimplementation. :P
- Python
Advantage: A Python interpreter will be present on most web servers, though not necessarily all. (Windows-based servers especially.)
Wash: Python is probably better known than Lua, but not as well as PHP or JS.
Disadvantage: Like PHP, Python is difficult to lock down securely.
Any thoughts? Does anybody happen to have a PHP implementation of a Lua or JavaScript interpreter? ;)
SpiderMonkey and Python both lack control over memory usage. Python lacks a sandbox mode; the rexec module has been removed. SpiderMonkey isn't embedded in any useful kind of standalone, so you'd have to start with a C development project, like you would for Lua.
I think Rhino would be an easier path to JavaScript execution than SpiderMonkey. You can pass an -Xmx option to the Java VM, and it'll throw an OutOfMemoryError when it hits that limit, allowing you to implement per-snippet memory limits without killing the interpreter. You could do wall-clock time limits using java.util.Timer, or CPU time limits using a JNI hack to poll clock(). You could turn off LiveConnect by making your own ClassShutter, leaving what (on initial impressions) is a reasonably secure sandbox. You'd still need an interface between Java and PHP, but presumably that's a well-studied problem.
Running scripts in the Java VM has the advantage that you don't have to rely on the security of the collection of amateurish C code that is PHP. Remember those PCRE crash bugs that went unfixed for years, before someone finally demonstrated elevation to arbitrary execution? At a conference, I overheard Rasmus Lerdorf quip that really PHP is pretty secure, since most of the demonstrated buffer/integer/heap overflows needed arbitrary script access to exploit, and if the attacker has that then you're screwed anyway.
-- Tim Starling