On Tue, Jun 30, 2009 at 12:16 PM, Brion Vibber<brion(a)wikimedia.org> wrote:
> Advantage: Lots of webbish people have some experience with PHP or can
> easily find references.
> Advantage: we're pretty much guaranteed to have a PHP interpreter
> Disadvantage: PHP is difficult to lock down for secure execution.

I think it would be easy to provide a very simple locked-down version,
with most of the features gone. You could, for instance, only permit
variable assignment, use of built-in operators, a small whitelist of
functions, and conditionals. You could omit loops, function
definitions, and abusable functions like str_repeat() (let alone
exec(), eval(), etc.) from a first pass. This would still be vastly
more powerful, more readable, and faster than ParserFunctions.
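
A rough sketch of what such a whitelist check could look like. It is written in Python against Python's own `ast` module purely for illustration; it is not PHP and not an existing MediaWiki component. The names `ALLOWED_FUNCTIONS`, `ALLOWED_NODES` and `check_script` are hypothetical. The idea is to parse the script and reject any syntax node that is not assignment, an operator, a conditional, or a call to a whitelisted function.

```python
import ast

# Hypothetical whitelist of callable names; chosen only for illustration.
ALLOWED_FUNCTIONS = {"len", "abs", "min", "max"}

# Syntax nodes that are permitted: assignment, operators, conditionals,
# constants, variable names, and (separately checked) function calls.
ALLOWED_NODES = (
    ast.Module, ast.Assign, ast.Expr, ast.If, ast.IfExp,
    ast.Name, ast.Load, ast.Store, ast.Constant,
    ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.operator, ast.unaryop, ast.boolop, ast.cmpop,
    ast.Call,
)


def check_script(source):
    """Return a list of reasons to reject the script (empty if accepted)."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ALLOWED_NODES):
            # Loops, function definitions, imports, attribute access,
            # keyword arguments, etc. all end up here.
            problems.append("disallowed construct: " + type(node).__name__)
        elif isinstance(node, ast.Call):
            # Only direct calls to whitelisted names are accepted.
            func = node.func
            if not isinstance(func, ast.Name) or func.id not in ALLOWED_FUNCTIONS:
                problems.append("disallowed call")
    return problems


if __name__ == "__main__":
    print(check_script("x = 1\nif x > 0:\n    y = len('abc') + x\n"))
    print(check_script("while True:\n    pass\n"))
```

Note that this only restricts syntax. Built-in operators can still be abused (repeating a string with `*` has the same problem as str_repeat()), so memory and CPU limits would still be needed on top of it.
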
Hopefully, we could make this secure enough for your average
shared-host website to run it by default with no special measures
taken and without much risk. Installations with more access and
higher security requirements, like Wikimedia, could shell out to a
process that's sandboxed on the OS level to be on the safe side. I'd
like to hear what Tim thinks about the possibility of securing PHP
Of course, PHP is evil, and supporting it sucks. :( But if we
*really* *really* need to support users who can't shell out to other
programs, I think it's the only real language that's a feasible
I'd encourage you to consider requiring exec() support for full use of
Wikipedia templates, though. Many really big shared hosts allow it.
Anyone big enough to include much Wikipedia content
will likely be on at least a VPS anyway. And if your host doesn't
support exec(), then at *worst* you can still get the articles in a
totally usable form -- just run Special:ExpandTemplates on all the
article's templates. You can then transclude those on a per-article
basis; we could update Special:Export to make this easier. The only
problem in this case would be that you can't easily change the
formatting of all the templates at once -- but such a small site would
likely have few enough articles to do it by hand, if they even want
I think saying that users without exec() support get to use Wikipedia
content in a somewhat less usable form would be just fine, and it
would *really* open up our options. We could support basically any
programming language in that case.

> Advantage: A Python interpreter will be present on most web servers,
> though not necessarily all. (Windows-based servers especially.)
> Wash: Python is probably better known than Lua, but not as well as PHP
> Disadvantage: Like PHP, Python is difficult to lock down securely.

It doesn't matter whether it's present, does it? If the user has
exec() support, they could download a binary interpreter for *any*
language to their webspace and run it from there regardless of whether
the language is supported on the host. So Python is on exactly the
same level as Lua here.
Much though I love Python, Lua looks like the better option. First of
all, it's *very* small. sudo apt-get install lua50 on my machine uses
up only 180 KB of disk space, and the package is 30 KB gzipped. Our
current tarballs are 10 MB; we could easily just chuck in Lua binaries
for Linux x86-32 and Windows without even noticing the size increase,
and allow users to enable it with one line in LocalSettings.php. By
contrast, python2.6 is around 10 MB uncompressed, 2.5 MB compressed.
Perl is twice that size. Windows users, or users with exec() allowed
but open_basedir preventing access to /usr/bin, would have to obtain
It looks to me like Lua would be a lot easier to sandbox. It seems
pretty simple to deny all I/O within the language itself, so you'd
(hopefully) just need memory and CPU limits. Both of those could be
implemented on Linux with hard setrlimit() values plus nice. Similar
things exist on Windows, hopefully accessible by command line somehow.
If we're shipping binaries with MediaWiki, we could even hack the
code if necessary, to use whatever sandboxing mechanisms the OS makes
available, although hopefully that would be unneeded.
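
To make the setrlimit()-plus-nice idea concrete, here is a minimal sketch, assuming a Linux host. It uses Python's standard `resource` and `subprocess` modules as a stand-in for whatever MediaWiki would actually do from PHP. The function name `run_limited` and the default limits are assumptions made up for the example, not tuned values.

```python
import os
import resource
import subprocess
import sys


def run_limited(argv, cpu_seconds=2, memory_bytes=512 * 1024 * 1024, input_text=""):
    """Run argv as a child process with hard CPU and memory limits."""

    def apply_limits():
        # Runs in the child after fork() and before exec().
        # Soft and hard limits are set to the same value, so the child
        # cannot raise them again.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))
        # Lower the scheduling priority so the script yields to other work.
        os.nice(10)

    return subprocess.run(
        argv,
        input=input_text,
        capture_output=True,
        text=True,
        preexec_fn=apply_limits,
        # Wall-clock backstop in case the child sleeps instead of burning CPU.
        timeout=cpu_seconds + 5,
    )


if __name__ == "__main__":
    # Any interpreter binary could go here; the Python interpreter is used
    # only so that the example runs without extra files.
    result = run_limited([sys.executable, "-c", "print(2 + 2)"])
    print(result.returncode, result.stdout.strip())
```

A child that exceeds the CPU limit is terminated by a signal, so the caller sees a nonzero return code and can show an error instead of the script's output. This does nothing about I/O, which is why denying I/O inside the language itself still matters.
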
I don't think we should fixate too much on how many people know the
language. It's not hard to pick up a new language if you already know
one, and Lua has the reputation of being simple (although I haven't
tried to learn it). I think Lua is the best option here.