On Tue, Jun 30, 2009 at 12:56 PM, Aryeh Gregor <Simetrical+wikilist(a)gmail.com> wrote:
On Tue, Jun 30, 2009 at 12:16 PM, Brion Vibber <brion(a)wikimedia.org> wrote:
* PHP
Advantage: Lots of webbish people have some experience with PHP or can
easily find references.
Advantage: we're pretty much guaranteed to have a PHP interpreter
available. :)
Disadvantage: PHP is difficult to lock down for secure execution.
I think it would be easy to provide a very simple locked-down version,
with most of the features gone. You could, for instance, only permit
variable assignment, use of built-in operators, a small whitelist of
functions, and conditionals. You could omit loops, function
definitions, and abusable functions like str_repeat() (let alone
exec(), eval(), etc.) from a first pass. This would still be vastly
more powerful, more readable, and faster than ParserFunctions.
Hopefully, we could make this secure enough for your average
shared-host website to run it by default with no special measures
taken and without much risk. Installations with more access and
higher security requirements, like Wikimedia, could shell out to a
process that's sandboxed on the OS level to be on the safe side. I'd
like to hear what Tim thinks about the possibility of securing PHP
like this.
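The locked-down subset described above (assignment, built-in operators, a small function whitelist, conditionals, and no loops or function definitions) amounts to checking the parse tree against a whitelist before anything runs. A minimal sketch of that idea follows, written in Python with the standard ast module as an analogy rather than in PHP; the whitelisted functions (len, min, max, str) and the node list are illustrative assumptions, and this is not a vetted sandbox.

```python
import ast

# Hypothetical whitelist of functions sandboxed code may call.
ALLOWED_FUNCTIONS = {"len": len, "min": min, "max": max, "str": str}

# Permitted syntax: assignment, built-in operators, conditionals and calls.
# Loops, function/class definitions, imports, attribute access and
# subscripting are simply absent from this tuple, so they are rejected.
ALLOWED_NODES = (
    ast.Module, ast.Assign, ast.Expr, ast.If, ast.IfExp, ast.Call,
    ast.Name, ast.Load, ast.Store, ast.Constant,
    ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod,
    ast.USub, ast.Not, ast.And, ast.Or,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)


def check(source: str) -> ast.Module:
    """Parse source and raise ValueError on any non-whitelisted construct."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed construct: {type(node).__name__}")
        if isinstance(node, ast.Call) and not (
            isinstance(node.func, ast.Name) and node.func.id in ALLOWED_FUNCTIONS
        ):
            raise ValueError("call to a function outside the whitelist")
    return tree


def run(source: str, params: dict) -> dict:
    """Execute checked code with only the whitelisted functions in scope.

    Operators alone can still be abused (e.g. "x" * 10000000 behaves like
    str_repeat()), so this complements CPU/memory limits, not replaces them.
    """
    tree = check(source)
    scope = {"__builtins__": {}, **ALLOWED_FUNCTIONS}
    env = dict(params)  # template parameters become ordinary variables
    exec(compile(tree, "<template>", "exec"), scope, env)
    return env
```

For example, run("x = a + 1\nif x > 2:\n    y = max(x, 10)", {"a": 5}) returns a dict with x = 6 and y = 10, while check("while True:\n    pass") raises ValueError.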
Of course, PHP is evil, and supporting it sucks. :( But if we
*really* *really* need to support users who can't shell out to other
programs, I think it's the only real language that's a feasible
solution.
I'd encourage you to consider requiring exec() support for full use of
Wikipedia templates, though. Many really big shared hosts allow it, like
1and1.com. Anyone big enough to include much Wikipedia content
will likely be on at least a VPS anyway. And if your host doesn't
support exec(), then at *worst* you can still get the articles in a
totally usable form -- just run Special:ExpandTemplates on all the
article's templates. You can then transclude those on a per-article
basis; we could update Special:Export to make this easier. The only
problem in this case would be that you can't easily change the
formatting of all the templates at once -- but such a small site would
likely have few enough articles to do it by hand, if they even want
to.
I think saying that users without exec() support get to use Wikipedia
content in a somewhat less usable form would be just fine, and it
would *really* open up our options. We could support basically any
programming language in that case.
* Python
Advantage: A Python interpreter will be present on most web servers,
though not necessarily all. (Windows-based servers especially may lack one.)
Wash: Python is probably better known than Lua, but not as well known as
PHP or JS.
Disadvantage: Like PHP, Python is difficult to lock down securely.
It doesn't matter whether it's present, does it? If the user has
exec() support, they could download a binary interpreter for *any*
language to their webspace and run it from there regardless of whether
the language is supported on the host. So Python is on exactly the
same level as Lua here.
Much though I love Python, Lua looks like the better option. First of
all, it's *very* small. sudo apt-get install lua50 on my machine uses
up only 180 KB of disk space, and the package is 30 KB gzipped. Our
current tarballs are 10 MB; we could easily just chuck in Lua binaries
for Linux x86-32 and Windows without even noticing the size increase,
and allow users to enable it with one line in LocalSettings.php. By
contrast, python2.6 is around 10 MB uncompressed, 2.5 MB compressed.
Perl is twice that size. Windows users, or users with exec() allowed
but open_basedir preventing access to /usr/bin, would have to obtain
Python/Perl/etc. themselves.
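Using the rounded figures above (and treating 1 MB as 1000 KB), the relative size increase to a 10 MB tarball works out roughly as follows. The 30 KB is a gzipped distribution package rather than the exact binaries MediaWiki would ship, so this is only an order-of-magnitude comparison.

```latex
\frac{30\ \text{KB}}{10\ \text{MB}} = \frac{30}{10\,000} = 0.3\%
\qquad\text{versus}\qquad
\frac{2.5\ \text{MB}}{10\ \text{MB}} = 25\%
```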
It looks to me like Lua would be a lot easier to sandbox. It seems
pretty simple to deny all I/O within the language itself, so you'd
(hopefully) just need memory and CPU limits. Both of those could be
implemented on Linux with hard setrlimit() values plus nice. Similar
things exist on Windows, hopefully accessible by command line somehow.
If we're shipping binaries with MediaWiki, we could even hack the
code, if necessary, to use whatever sandboxing mechanisms the OS makes
available, although hopefully that would be unnecessary.
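The "hard setrlimit() values plus nice" approach can be sketched as launching the external interpreter with limits applied in the child just before exec. The sketch below is Python for Unix-like systems only (MediaWiki itself would do this from PHP); the argument list, the limit values and the function names are hypothetical, and a real deployment would pass the path of the bundled lua binary instead of the Python stand-in used in the demo.

```python
import os
import resource
import subprocess
import sys


def limit_child(cpu_seconds: int, memory_bytes: int, niceness: int):
    """Return a preexec_fn that applies hard limits in the child before exec."""
    def apply_limits():
        # Hard CPU-time limit: the kernel signals the process (SIGXCPU,
        # then SIGKILL) once it has used this much CPU time.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        # Cap the address space so runaway allocations fail outright.
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))
        # Lower scheduling priority, as the nice(1) command would.
        os.nice(niceness)
    return apply_limits


def run_sandboxed(argv, stdin_text, cpu_seconds=2,
                  memory_bytes=256 * 1024 * 1024, niceness=10, wall_timeout=10):
    """Run an external interpreter under rlimits, feeding it stdin_text."""
    # Note: preexec_fn is not safe in multi-threaded parent processes.
    return subprocess.run(
        argv,
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=wall_timeout,  # wall-clock backstop; RLIMIT_CPU counts CPU time only
        preexec_fn=limit_child(cpu_seconds, memory_bytes, niceness),
        close_fds=True,
    )


if __name__ == "__main__":
    # Stand-in for a bundled Lua binary: any interpreter reading stdin works.
    result = run_sandboxed([sys.executable, "-c", "print(input()[::-1])"], "abc\n")
    print(result.returncode, result.stdout.strip())
```

A script stuck in an infinite loop is killed by the kernel after cpu_seconds of CPU time, so the caller sees a nonzero (negative, signal-based) return code rather than a hung request.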
I don't think we should fixate too much on how many people know the
language. It's not hard to pick up a new language if you already know
one, and Lua has the reputation of being simple (although I haven't
tried to learn it). I think Lua is the best option here.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
In addition to resource limits, any scheme had better make sure that
what's passed into the programming language and what's passed out makes
sense. For example, you shouldn't have it generating raw HTML, and it
probably shouldn't be allowed to mess with strip markers. Some of this
may be automatic, depending on how it's integrated into the parser. One
would probably also want to limit the size of the allowed output (e.g.
don't let it send 5 MB to the user). Depending on the integration, there
may be other control sequences in the returned output that need to be
caught as well.
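A rough sketch of such an output check, in Python: enforce a size cap first, reject anything that looks like a forged strip marker, drop stray control characters, and neutralise raw HTML. It assumes MediaWiki-style strip markers are delimited by the DEL character (\x7f); the 1 MB cap, the exception name and the blunt choice to escape every angle bracket (which would also disable legitimate tags such as <ref>) are illustrative assumptions, not how the parser actually does it.

```python
# Assumption: strip markers are delimited by DEL (\x7f), so any output
# containing that character is treated as an attempt to forge one.
STRIP_MARKER_DELIM = "\x7f"

# Hypothetical cap, chosen well below the "5 MB to the user" case.
MAX_OUTPUT_BYTES = 1024 * 1024


class OutputRejected(Exception):
    """Raised when sandboxed output must not reach the parser at all."""


def validate_output(text: str, max_bytes: int = MAX_OUTPUT_BYTES) -> str:
    """Check and sanitize text returned by sandboxed code."""
    size = len(text.encode("utf-8"))
    if size > max_bytes:
        raise OutputRejected(f"output too large: {size} > {max_bytes} bytes")
    if STRIP_MARKER_DELIM in text:
        raise OutputRejected("output contains a strip-marker delimiter")
    # Drop C0 control characters other than tab, newline and carriage return.
    cleaned = "".join(ch for ch in text if ch >= " " or ch in "\t\n\r")
    # Escape angle brackets so the result cannot inject raw HTML.
    return cleaned.replace("<", "&lt;").replace(">", "&gt;")
```

Plain wikitext such as "Hello '''world'''" passes through unchanged, while "<script>x</script>" comes back as "&lt;script&gt;x&lt;/script&gt;".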
On a separate point, one of the limitations of stand-alone sandboxes
is that they would make it harder for the code to call other
template pages. One of the few virtues of the current template code
is that it is relatively modular, with more complex templates being
built out of less complex ones. If this programming language is meant
to replace that, then it would also need to be able to reference the
results of other template pages. One solution is to pre-expand those
sections (similar to what is done now, I believe), but that can get
rather delicate once one has programming constructs like variable
assignments, looping, and recursion, since the template parameters
won't necessarily be fixed at the Preprocessor stage.
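The alternative to pre-expansion is a host-provided callback that sandboxed code invokes at run time to expand another template with arguments it has just computed. The Python sketch below models this in-process with a dictionary of "template pages" written as plain functions; the Frame class, its expand() method, the template names and the depth limit of 40 are all hypothetical. With a separate-process sandbox, each expand() call would have to cross a pipe back to the parser, which is exactly the extra difficulty described above.

```python
from typing import Callable, Dict


class TemplateDepthExceeded(Exception):
    pass


class Frame:
    """Host-side object handed to template code (hypothetical API)."""

    def __init__(self, templates: Dict[str, Callable[..., str]],
                 max_depth: int = 40, depth: int = 0):
        self.templates = templates
        self.max_depth = max_depth
        self.depth = depth

    def expand(self, name: str, args: Dict[str, str]) -> str:
        """Expand another template with arguments computed at run time."""
        if self.depth >= self.max_depth:
            raise TemplateDepthExceeded(f"depth limit {self.max_depth} hit at {name!r}")
        child = Frame(self.templates, self.max_depth, self.depth + 1)
        return self.templates[name](child, args)


def convert(frame: Frame, args: Dict[str, str]) -> str:
    # Low-level building block: kilometres to miles, one decimal place.
    km = float(args["km"])
    return f"{km} km ({km * 0.621371:.1f} mi)"


def route_table(frame: Frame, args: Dict[str, str]) -> str:
    # Higher-level template: each call's arguments only exist inside the
    # loop, so these calls could not have been pre-expanded beforehand.
    return "; ".join(frame.expand("Convert", {"km": leg})
                     for leg in args["legs"].split(","))


def runaway(frame: Frame, args: Dict[str, str]) -> str:
    # Self-recursive template: stopped by the depth limit, not by the stack.
    return frame.expand("Runaway", args)


TEMPLATES = {"Convert": convert, "RouteTable": route_table, "Runaway": runaway}
```

Frame(TEMPLATES).expand("RouteTable", {"legs": "10,5"}) yields "10.0 km (6.2 mi); 5.0 km (3.1 mi)", while expanding "Runaway" raises TemplateDepthExceeded.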
-Robert Rohde