Brion Vibber wrote:
There's been talk of Lua as an embedded templating language for a while,
and there's even an extension implementation.
One advantage of Lua over other languages is that its implementation is
optimized for use as an embedded language, and it looks kind of pretty.
An _inherent_ disadvantage is that it's a fairly rarely-used language,
so it still requires special learning on the part of potential template programmers.
An _implementation_ disadvantage is that it currently depends on an
external Lua binary installation -- something that probably won't be
present on third-party installs, meaning Lua templates couldn't be
easily copied to non-Wikimedia wikis.
There are problems with all the shell-based solutions. MediaWiki
callbacks, like template expansion, {{VARIABLES}} and ifexist, are
commonly used in templates on Wikipedia, and a scripting language
without these would suffer from poor community buy-in. You could
implement them from the shell using IPC, but IPC in PHP is rather
cumbersome. The interface between the parser and the scripting engine
would be performance-sensitive, because users would write templates
that invoked the scripting engine hundreds of times in the course of
rendering an article. So there's a case there for a persistent
scripting engine with a command-based interface over a pipe.
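As a rough illustration of that last idea, here is a minimal Python sketch
of a persistent engine driven over a pipe, with a callback to the host in
the middle of each call. Everything in it is hypothetical: the JSON-lines
message format, the PipeEngine class, the "ifexist" callback name, and the
toy child process standing in for a real scripting engine. It is not how
any existing extension works.

```python
import json
import subprocess
import sys

# Hypothetical child "engine". For each request it asks the host whether a
# page exists (a callback over the same pipe), then returns a result.
ENGINE_SOURCE = r'''
import json, sys

def send(msg):
    sys.stdout.write(json.dumps(msg) + "\n")
    sys.stdout.flush()

while True:
    line = sys.stdin.readline()
    if not line:          # host closed the pipe: shut down
        break
    request = json.loads(line)
    send({"type": "callback", "name": "ifexist", "arg": request["title"]})
    exists = json.loads(sys.stdin.readline())["value"]
    suffix = " exists" if exists else " is missing"
    send({"type": "result", "value": request["title"] + suffix})
'''


class PipeEngine:
    """Host side: starts the child once and reuses it for every call."""

    def __init__(self, callbacks):
        self.callbacks = callbacks
        self.proc = subprocess.Popen(
            [sys.executable, "-c", ENGINE_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def _send(self, msg):
        self.proc.stdin.write(json.dumps(msg) + "\n")
        self.proc.stdin.flush()

    def run(self, title):
        self._send({"title": title})
        while True:
            msg = json.loads(self.proc.stdout.readline())
            if msg["type"] == "callback":
                # Serve the callback from the host and keep waiting.
                value = self.callbacks[msg["name"]](msg["arg"])
                self._send({"value": value})
            else:
                return msg["value"]

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


if __name__ == "__main__":
    engine = PipeEngine({"ifexist": lambda title: title == "Main Page"})
    print(engine.run("Main Page"))  # Main Page exists
    print(engine.run("Nope"))       # Nope is missing
    engine.close()
```

The child is started once, so each invocation costs a couple of round
trips on the pipe rather than a process spawn. That is the property that
matters if a template calls the engine hundreds of times per article.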
I like Lua because of the potential to embed it in PHP
as an extension, with fast setup and fast callbacks to MediaWiki. It
does all its memory allocation via a callback to the application,
including VM stack space, which means that it's possible to control
the memory usage without killing the process when the limit is
exceeded. But its standard library is unsuitable for running untrusted
scripts, since it contains all the usual process control and file
read/write functions.
The current PECL extension doesn't have any of the features that make
Lua attractive: it does not have support for callbacks to PHP, or for
replacing the standard library with something more sensible, or for
limiting memory without killing the request when the limit is
exceeded. Obviously the standalone interpreter, as distributed, does not have these
features either.
I had imagined the task of embedding Lua in MediaWiki as being
primarily a C project, writing the necessary glue code between the
embedded interpreter and PHP. I had hoped that banging the drum for
Lua might encourage someone to look at these issues and start work on
that project.
* PHP
Advantage: Lots of webbish people have some experience with PHP or can
easily find references.
Advantage: we're pretty much guaranteed to have a PHP interpreter
available. :)
Disadvantage: PHP is difficult to lock down for secure execution.
PHP can be secured against arbitrary execution using token_get_all();
there's a proof-of-principle validator of this kind in the master
switch script project. But there are problems with attempting a
single-process PHP-in-PHP sandbox:
* The poor support for signals in PHP makes it difficult to limit the
execution time of a script snippet. Ticks only occur at the end of
each statement, so you can defeat them by making a single statement
that runs forever.
* Apart from blacklisting function definitions, there is no way to
protect against infinite recursion, which exhausts the process stack
and causes a segfault.
* Memory limits are implemented on a per-request basis, and there's no
way to recover from exceeding the memory limit; the request is just
killed.
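To make the token-based validation idea concrete, here is a small sketch
of the same technique in Python, using the standard tokenize module as a
stand-in for PHP's token_get_all(). It is not the validator from the
master switch script project; the allow-lists and the validate() function
are made up for illustration. The snippet is tokenised without being
executed, and any identifier or keyword outside the allow-lists, or any
attribute access, is rejected.

```python
import io
import keyword
import tokenize

# Hypothetical allow-lists: only these identifiers and keywords may appear.
ALLOWED_NAMES = {"len", "min", "max", "x", "y"}
ALLOWED_KEYWORDS = {"if", "else", "and", "or", "not"}


def validate(snippet):
    """Return the rejected tokens; an empty list means the snippet passed."""
    rejected = []
    for tok in tokenize.generate_tokens(io.StringIO(snippet).readline):
        if tok.type == tokenize.NAME:
            # Keywords and plain identifiers arrive as the same token type.
            if keyword.iskeyword(tok.string):
                allowed = ALLOWED_KEYWORDS
            else:
                allowed = ALLOWED_NAMES
            if tok.string not in allowed:
                rejected.append(tok.string)
        elif tok.type == tokenize.OP and tok.string == ".":
            # Attribute access could reach objects outside the allow-list.
            rejected.append(".")
    return rejected


if __name__ == "__main__":
    print(validate("max(x, y) if x else len(y)"))
    print(validate("__import__('os').system('ls')"))
```

A check like this only restricts what a snippet can name. It does nothing
about the run-time problems listed above: execution time, recursion depth
and memory.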
* JavaScript
Advantage: Even more folks have been exposed to JavaScript programming,
including Wikipedia power-users.
Disadvantage: Server-side interpreter not guaranteed to be present. Like
Lua, would either restrict our portability or would require an
interpreter reimplementation. :P
* Python
Advantage: A Python interpreter will be present on most web servers,
though not necessarily all. (Windows-based servers especially.)
Wash: Python is probably better known than Lua, but not as well as PHP
or JS.
Disadvantage: Like PHP, Python is difficult to lock down securely.
Any thoughts? Does anybody happen to have a PHP implementation of a Lua
or JavaScript interpreter? ;)
SpiderMonkey and Python both lack control over memory usage. Python
lacks a sandbox mode; the rexec module has been removed. SpiderMonkey
isn't embedded in any useful kind of standalone, so you'd have to
start with a C development project, like you would for Lua.
I think Rhino would be an easier path to JavaScript execution than
SpiderMonkey. You can pass an -Xmx option to the Java VM, and it'll
throw an OutOfMemoryError when it hits that limit, allowing you
to implement per-snippet memory limits without killing the
interpreter. You could do wall-clock time limits using
java.util.Timer, or CPU time limits using a JNI hack to poll clock().
You could turn off LiveConnect by making your own ClassShutter,
leaving what (on initial impressions) is a reasonably secure sandbox.
You'd still need an interface between Java and PHP, but presumably
that's a well-studied problem.
Running scripts in the Java VM has the advantage that you don't have
to rely on the security of the collection of amateurish C code that is
PHP. Remember those PCRE crash bugs that went unfixed for years,
before someone finally demonstrated elevation to arbitrary execution?
At a conference, I overheard Rasmus Lerdorf quip that really PHP is
pretty secure, since most of the demonstrated buffer/integer/heap
overflows needed arbitrary script access to exploit, and if the
attacker has that then you're screwed anyway.
-- Tim Starling