On Tue, Jun 30, 2009 at 6:08 PM, Robert Rohde <rarohde@gmail.com> wrote:
In addition to resource limits, any scheme had better make sure that what's passed into the programming language and what's passed out make sense. For example, you shouldn't have it generating raw HTML and probably shouldn't let it mess with strip markers. Some of this may be automatic depending on how it's integrated into the parser. One would probably also want to limit the size of an allowed output (e.g. don't let it send 5 MB to the user). Depending on the integration there may be other control sequences that one needs to catch when it returns as well.
I was assuming it would just return wikitext, and that would be integrated into the page and parsed, following all limits on wikitext (including size) -- just as with current parser functions.
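As a rough illustration of the size limit mentioned here, a script's returned wikitext could be checked before it is handed back to the parser. This is only a sketch in Python: the function name, the exception, and the 2 MB figure are all made up for the example and are not MediaWiki's actual limits or API.

```python
# Hypothetical cap, for illustration only; not a real MediaWiki setting.
MAX_OUTPUT_BYTES = 2 * 1024 * 1024


class OutputTooLarge(Exception):
    """Raised when a script returns more wikitext than the cap allows."""


def check_script_output(wikitext: str, limit: int = MAX_OUTPUT_BYTES) -> str:
    """Return the wikitext unchanged if it fits within the byte limit."""
    # Measure encoded bytes, not characters, since that is what gets sent.
    size = len(wikitext.encode("utf-8"))
    if size > limit:
        raise OutputTooLarge(f"script returned {size} bytes, limit is {limit}")
    return wikitext
```

Rejecting outright, rather than truncating, avoids handing the parser half of a construct.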
On a separate point, one of the limitations of stand-alone type sandboxes is that it would make it harder for the code to call other template pages. One of the few virtues of the current template code is that it is relatively modular, with more complex templates being built out of less complex ones. If this programming language is meant to replace that then it would also need to be able to reference the results of other template pages. One solution is to pre-expand those sections (similar to what is done now, I believe), but that can get rather delicate once one has programming constructs like variable assignments, looping, and recursion since the template parameters won't necessarily be fixed at the Preprocessor stage.
I'd assume we'd support some kind of includes. One rudimentary way to do it would be to run Lua stuff after or during preprocessing, so you could just include Lua code macro-style using templates. A better way would probably be to support the include features of the language itself (I don't know how they work offhand, for Lua).
On Tue, Jun 30, 2009 at 6:12 PM, Jared Williams <jared.williams1@ntlworld.com> wrote:
Yeah, would also need time & mem use restrictions.
Which is impossible for in-process use. You'd have to shell out if you do that, which defeats the entire point of using PHP instead of something else to begin with.
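To make the "shell out" option concrete, here is a minimal sketch of running untrusted code in a child process under time and memory limits. It is in Python rather than PHP, it is POSIX-only, and `run_limited` and its default limits are invented for the example. It illustrates only the resource-limit part, not a complete sandbox.

```python
import resource
import subprocess
import sys


def run_limited(code, wall_seconds=2.0, cpu_seconds=2,
                mem_bytes=512 * 1024 * 1024):
    """Run `code` in a separate Python process; return stdout, or None on failure."""

    def apply_limits():
        # Runs in the child just before exec. Only the soft limit is lowered,
        # and it is clamped so it never exceeds an existing hard limit.
        for kind, value in ((resource.RLIMIT_CPU, cpu_seconds),
                            (resource.RLIMIT_AS, mem_bytes)):
            _soft, hard = resource.getrlimit(kind)
            if hard != resource.RLIM_INFINITY:
                value = min(value, hard)
            resource.setrlimit(kind, (value, hard))

    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=wall_seconds,      # wall-clock limit, enforced by the parent
            preexec_fn=apply_limits,   # CPU and address-space limits in the child
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout if result.returncode == 0 else None
```

Spawning a process per call is exactly the overhead the reply objects to, and it needs the ability to start processes at all, which some shared hosts disable.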
On Tue, Jun 30, 2009 at 7:16 PM, Andrew Garrett <agarrett@wikimedia.org> wrote:
That's just scary. We'd definitely want to do the validation as close as possible to the actual eval()ing, to minimise backdoors like Special:Import et al.
You'd be saving the code to a file on disk somewhere, probably named using a hash of the input. The only thing saving the code would be the code that sanitizes it. There's no way anything could go wrong unless an attacker gains filesystem write access, in which case you're hosed anyway. Parsing PHP on every page view when you could cache it in APC is crazy.
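The hash-named cache described here might look something like the following sketch. It is written in Python for brevity; the real thing would be PHP files picked up by APC. The `sanitize` callback, the `.cache` suffix, and both function names are hypothetical.

```python
import hashlib
import os
import tempfile


def cache_path(source: str, cache_dir: str) -> str:
    """Name the cache file after a hash of the unsanitized input."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest + ".cache")


def store_sanitized(source: str, cache_dir: str, sanitize) -> str:
    """Sanitize `source` once and save the result; return the cache file path.

    `sanitize` is a caller-supplied function (an assumption of this sketch)
    that returns safe code or raises to reject the input.
    """
    path = cache_path(source, cache_dir)
    if not os.path.exists(path):
        cleaned = sanitize(source)
        # Write to a temporary file and rename it into place, so a reader
        # never sees a half-written cache entry.
        fd, tmp = tempfile.mkstemp(dir=cache_dir)
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(cleaned)
        os.replace(tmp, path)
    return path
```

The only writer of these files is the code that has just run the sanitizer, which is the property the argument relies on. It does nothing about other processes that can write to the same directory.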
On Tue, Jun 30, 2009 at 7:24 PM, Hay (Husky) <huskyr@gmail.com> wrote:
That leaves us with Lua and JavaScript, which are both small and efficient languages meant to solve tasks like this. Remember, I'm talking about 'core' JavaScript here, not with all DOM methods and stuff. If you strip that all out (take a look at the 1.5 core reference at Mozilla.com: https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference) you get a pretty nice and simple language that isn't very large. Both would require a new parser and/or installed compilers on the server side. Compared to the disadvantages of other options, that seems like a pretty small loss for a great win.
Reasonable enough, yeah. Sandboxing might be easier too. What are some standalone JavaScript interpreters we could use? Ideally we'd use a heavily-optimized JIT compiler, like V8 or TraceMonkey, but I don't know if those work standalone.
On Tue, Jun 30, 2009 at 8:33 PM, Brion Vibber <brion@wikimedia.org> wrote:
That's why we want to fix it! :)
It *should* be fairly trivial to fetch a template/plugin sort of thing off of one wiki and put it on another. Consider this as one of our goals for next-gen templating.
Eh. Then that really ties our hands. If we have to have support for shared hosts without exec() support, then I don't see any viable option except sanitized PHP.
On Tue, Jun 30, 2009 at 8:37 PM, Brion Vibber <brion@wikimedia.org> wrote:
Executing PHP from apache-writable files saved on disk is also a security danger.
The original implementation of the MonoBook skin used the TAL templating language, which was compiled into executable PHP at runtime and stored in /tmp so it could be cached for the next view.
In addition to difficulties with hosts which had misconfigured /tmp directories, we found that people sharing their hosts with poorly-secured WordPress installations would end up finding their wikis hacked -- worms exploiting vulnerabilities in other PHP apps would hop around the system modifying any .php files they could write to... including the cached PHPTAL templates.
It could be eval()ed by default, but the performance wins from using APC would surely be huge. If you set it up carefully it should be safe enough.
On Tue, Jun 30, 2009 at 8:41 PM, Brian <Brian.Mingus@colorado.edu> wrote:
There is nothing in the OP that indicates that we are keeping the current template code or even that it would be desirable. Whatever facilities the language we choose has for including other files and passing arguments to functions is 100% sufficient.
We're talking about changing how templates are written, not how they're called. Changing the template call syntax is an entirely different discussion that's orthogonal to this one.
On Tue, Jun 30, 2009 at 9:02 PM, Trevor Parscal <tparscal@wikimedia.org> wrote:
Seems like JSON syntax is pretty simple and could be a big improvement to how templates are currently invoked.
I'm not sure where you'd use JSON here?