On templates and programming languages


Brion Vibber

30 Jun 2009

9:46 p.m.

As many folks have noted, our current templating system works ok for simple things, but doesn't scale well -- even moderately complex conditionals or text-munging will quickly turn your template source into what appears to be line noise.

And we all thought Perl was bad! ;)

There's been talk of Lua as an embedded templating language for a while, and there's even an extension implementation.

One advantage of Lua over other languages is that its implementation is optimized for use as an embedded language, and it looks kind of pretty.

An _inherent_ disadvantage is that it's a fairly rarely used language, so it still requires special learning on the part of potential template programmers.

An _implementation_ disadvantage is that it currently is dependent on an external Lua binary installation -- something that probably won't be present on third-party installs, meaning Lua templates couldn't be easily copied to non-Wikimedia wikis.

There are perhaps three primary alternative contenders that don't involve making up our own scripting language (something I'd dearly like to avoid):

* PHP

Advantage: Lots of webbish people have some experience with PHP or can easily find references.

Advantage: we're pretty much guaranteed to have a PHP interpreter available. :)

Disadvantage: PHP is difficult to lock down for secure execution.

* JavaScript

Advantage: Even more folks have been exposed to JavaScript programming, including Wikipedia power-users.

Disadvantage: Server-side interpreter not guaranteed to be present. Like Lua, would either restrict our portability or would require an interpreter reimplementation. :P

* Python

Advantage: A Python interpreter will be present on most web servers, though not necessarily all. (Windows-based servers especially.)

Wash: Python is probably better known than Lua, but not as well as PHP or JS.

Disadvantage: Like PHP, Python is difficult to lock down securely.

Any thoughts? Does anybody happen to have a PHP implementation of a Lua or JavaScript interpreter? ;)

-- brion


Chad

30 Jun 2009

9:57 p.m.

On Tue, Jun 30, 2009 at 12:16 PM, Brion Vibber <brion@wikimedia.org> wrote:

...


An _implementation_ disadvantage is that it currently is dependent on an external Lua binary installation -- something that probably won't be present on third-party installs, meaning Lua templates couldn't be easily copied to non-Wikimedia wikis.

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

I haven't tried it, but there seems to be a Lua Pecl extension.

-Chad

Robert Rohde

10:10 p.m.

On Tue, Jun 30, 2009 at 9:27 AM, Chad <innocentkiller@gmail.com> wrote:

...

I haven't tried it, but there seems to be a Lua Pecl extension.

The Lua Pecl says:

"We should emphasize that is still under development and is completely experimental."

That was nearly two years ago and there doesn't appear to have been any real work on it since. Someone would probably need to look at it carefully to make sure it is adequately functional before considering that path.

-Robert Rohde

Brion Vibber

10:21 p.m.

Robert Rohde wrote:

...

On Tue, Jun 30, 2009 at 9:27 AM, Chad <innocentkiller@gmail.com> wrote:

...
I haven't tried it, but there seems to be a Lua Pecl extension.

The Lua Pecl says:

"We should emphasize that is still under development and is completely experimental."

That was nearly two years ago and there doesn't appear to have been any real work on it since. Someone would probably need to look at it carefully to make sure it is adequately functional before considering that path.

A PECL extension wouldn't be a compatibility improvement over shelling out to a Lua binary; it still requires compilation and installation on the server. (Though it could be a performance win by having the Lua interpreter available in-process.)

-- brion

Amir E. Aharoni

10:04 p.m.

On Tue, Jun 30, 2009 at 19:16, Brion Vibber <brion@wikimedia.org> wrote:

...

As many folks have noted, our current templating system works ok for simple things, but doesn't scale well -- even moderately complex conditionals or text-munging will quickly turn your template source into what appears to be line noise.

And we all thought Perl was bad! ;)

I never thought that Perl is bad. There are some irresponsible Perl programmers, just as there are some irresponsible PHP and Python programmers.

You could try adding Perl to your questionnaire.

Advantage: At least as portable as Python. Anyone who can have PHP can have Perl, too.

Advantage: Anyone who knows PHP knows Perl almost completely.

Not sure whether it's an advantage: Perl has some built-in security features (taint mode), but I'm not really a security expert.

-- Amir Elisha Aharoni http://aharoni.wordpress.com "We're living in pieces, I want to live in peace." - T. Moore

Victor Vasiliev

10:17 p.m.

Brion Vibber wrote:

...


There are perhaps three primary alternative contenders that don't involve making up our own scripting language (something I'd dearly like to avoid):


I'm working on rewriting the AbuseFilter parser so it's suitable for embedding in wikitext. It's half-done and will be ready soon. --vvv

Brion Vibber

10:20 p.m.

Victor Vasiliev wrote:

...

I'm working on rewriting the AbuseFilter parser so it's suitable for embedding in wikitext. It's half-done and will be ready soon.

Eh, I'd rather replace the AbuseFilter scripting with JS/Lua/Python/whatever too. :)

-- brion

Victor Vasiliev

10:43 p.m.

Brion Vibber wrote:

...

Victor Vasiliev wrote:

...
I'm working on rewriting the AbuseFilter parser so it's suitable for embedding in wikitext. It's half-done and will be ready soon.

Eh, I'd rather replace the AbuseFilter scripting with JS/Lua/Python/whatever too. :)

-- brion


We'll have to remove for() and while() from it anyway, and restrict it in other ways. --vvv

Robert Rohde

10:20 p.m.

On Tue, Jun 30, 2009 at 9:16 AM, Brion Vibber <brion@wikimedia.org> wrote: <snip>

...

There are perhaps three primary alternative contenders that don't involve making up our own scripting language (something I'd dearly like to avoid):

<snip>

In the Lua Bugzilla thread (#19298), there was some extended discussion about using the AbuseFilter parser as the basis for a MediaWiki scripting language. From your comment, should I assume we are taking that option off the table?

There are advantages to that approach in terms of integration and flexibility, though rolling our own scripting language would obviously be a quite complex (and probably long-term) undertaking.

-Robert Rohde

Brion Vibber

10:23 p.m.

Robert Rohde wrote:

...

In the Lua Bugzilla thread (#19298), there was some extended discussion about using the AbuseFilter parser as the basis for a MediaWiki scripting language. From your comment, should I assume we are taking that option off the table?

There are advantages to that approach in terms of integration and flexibility, though rolling our own scripting language would obviously be a quite complex (and probably long-term) undertaking.

Right, that's exactly what I don't want to have to do.

I'd honestly rather implement a JS interpreter in PHP than create and maintain our own programming language, if it came to that. :)

-- brion

Trevor Parscal

10:33 p.m.

On 6/30/09 9:16 AM, Brion Vibber wrote:

...

Any thoughts? Does anybody happen to have a PHP implementation of a Lua or JavaScript interpreter? ;)

-- brion

GPL, alpha software, seems to have been abandoned in 2005: http://j4p5.sourceforge.net/

Perhaps this could be tested, considered, brought back to life, etc?

- Trevor

Robert Rohde

10:39 p.m.

On Tue, Jun 30, 2009 at 10:03 AM, Trevor Parscal <tparscal@wikimedia.org> wrote:

...

On 6/30/09 9:16 AM, Brion Vibber wrote:

...
Any thoughts? Does anybody happen to have a PHP implementation of a Lua or JavaScript interpreter? ;)

-- brion

GPL, alpha software, seems to have been abandoned in 2005: http://j4p5.sourceforge.net/

Perhaps this could be tested, considered, brought back to life, etc?

Google also turns up http://phpjs.berlios.de/

Like J4P5 it also seems to be an abandoned alpha.

For stand-alone JavaScript interpreters, there are some well-supported projects, like Jaxer: http://www.aptana.com/jaxer

However, this again gets back to separately compiled code, and would not easily be able to interact with PHP.

-Robert Rohde

Brian

10:45 p.m.

So far my favorite idea is to use a restricted subset of PHP.

I would like to broach an important topic however: How can we convert all of the existing ParserFunctions and difficult-to-read template code to this new language automatically? Are we really talking about the dream of getting rid of templates entirely? The end of {{||||||}} ?

How difficult would it be to modify the parser to spit out some of its data structures in the new language as opposed to HTML, etc.?

This seems to be the more difficult part of the project.

There is a more practical/pragmatic approach which is to deprecate the current syntax similar to the way languages sometimes deprecate language features. I fear that the conversion is a superhuman task, however.


Robert Rohde

10:50 p.m.

On Tue, Jun 30, 2009 at 10:15 AM, Brian <Brian.Mingus@colorado.edu> wrote:

...

So far my favorite idea is to use a restricted subset of PHP.

I would like to broach an important topic however: How can we convert all of the existing ParserFunctions and difficult-to-read template code to this new language automatically? Are we really talking about the dream of getting rid of templates entirely? The end of {{||||||}} ?

How difficult would it be to modify the parser to spit out some of its data structures in the new language as opposed to HTML etc.. ?

This seems to be the more difficult part of the project.

There is a more practical/pragmatic approach which is to deprecate the current syntax similar to the way languages sometimes deprecate language features. I fear that the conversion is a superhuman task, however.

You couldn't ever turn template syntax off without making old revisions unrenderable. The best one could likely do is encourage people to upgrade and provide tools to make that easier. However, given the nastiness of template syntax, I would expect no end of wiki authors willing to help convert the commonly used stuff.

-Robert Rohde

Brian

10:54 p.m.

On Tue, Jun 30, 2009 at 11:20 AM, Robert Rohde <rarohde@gmail.com> wrote:

...

You couldn't ever turn template syntax off without making old revisions unrenderable. The best one could likely do is encourage people to upgrade and provide tools to make that easier. However, given the nastiness of template syntax, I would expect no end of wiki authors willing to help convert the commonly used stuff.

-Robert Rohde

The solution (no doubt first developed on this list many years ago) is to mark revisions that still use the template/parser-function syntax as such, so they keep triggering the old parser, and to mark revisions that have moved on as using the new language.
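A rough sketch of that per-revision marking, assuming each revision carries a hypothetical `syntax` field and that both parsers stay installed. `Revision`, `render` and the stand-in parsers are illustrative names, not MediaWiki classes.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Revision:
    text: str
    syntax: str  # hypothetical marker stored with each revision: "legacy" or "new"


def render(revision: Revision, renderers: Dict[str, Callable[[str], str]]) -> str:
    """Send each revision to the parser matching the syntax it was saved with."""
    try:
        parse = renderers[revision.syntax]
    except KeyError:
        raise ValueError(f"unknown syntax marker {revision.syntax!r}") from None
    return parse(revision.text)


# Stand-in parsers for illustration only.
renderers = {"legacy": lambda text: f"legacy:{text}", "new": lambda text: f"new:{text}"}
print(render(Revision("{{fact}}", syntax="legacy"), renderers))  # legacy:{{fact}}
```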

Moreover, old revisions are already unrenderable. They may look like they render correctly, but in fact they don't, because MediaWiki has no notion that a particular revision of an article also uses particular revisions of templates, etc.
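One way a renderer could avoid that, sketched under assumptions: keep each template's edit history as sorted `(timestamp, revision_id)` pairs and, when showing an old article revision, pick the template revision that was current at the article revision's timestamp. `template_revision_as_of` is a hypothetical helper, not existing MediaWiki code, and it ignores complications such as deleted or renamed templates.

```python
from bisect import bisect_right


def template_revision_as_of(history, page_timestamp):
    """Pick the template revision that was current when a page revision was saved.

    history: list of (timestamp, revision_id) pairs sorted by timestamp,
    e.g. MediaWiki-style 14-digit timestamps stored as integers.
    Returns None if the template did not exist yet at page_timestamp.
    """
    timestamps = [ts for ts, _rev_id in history]
    # Index of the first template edit made strictly after the page revision.
    i = bisect_right(timestamps, page_timestamp)
    return history[i - 1][1] if i else None


template_history = [(20080101000000, 101), (20090301120000, 202), (20090625093000, 303)]
print(template_revision_as_of(template_history, 20090401000000))  # 202
```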

At any rate, I don't see how this nitpick is a difficult problem technically.

Brian

1 Jul 2009

4:31 a.m.

On Tue, Jun 30, 2009 at 11:20 AM, Robert Rohde <rarohde@gmail.com> wrote:

...

However, given the nastiness of template syntax, I would expect no end of wiki authors willing to help convert the commonly used stuff.

-Robert Rohde

I was curious just how terrible a task the conversion can be expected to be. This is just a heuristic I came up with:

    # Simple English parser functions
    $ bunzip2 -c simplewiki-20090623-pages-articles.xml.bz2 | grep -o '{{#' | wc -l
    22,211

    # Simple English templates
    $ bunzip2 -c simplewiki-20090623-pages-articles.xml.bz2 | grep -o '{{' | wc -l
    416,126 - 22,211 = 393,915

    # English parser functions
    $ bunzip2 -c enwiki-20090618-pages-articles.xml.bz2 | grep -o '{{#' | wc -l
    430,980

    # English templates
    $ bunzip2 -c enwiki-20090618-pages-articles.xml.bz2 | grep -o '{{' | wc -l
    44,928,358 - 430,980 = 44,497,378
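A minimal Python sketch of the same heuristic, for anyone who wants to rerun it without the shell pipeline. The function names `count_markers` and `count_dump` are made up for illustration, and like `grep -o ... | wc -l` it counts non-overlapping matches line by line, so it shares the heuristic's blind spots (a `{{{parameter}}}` opening, for instance, is counted as a template).

```python
import bz2


def count_markers(lines):
    """Return (parser_function_calls, other_double_brace_openings) for an
    iterable of text lines, mirroring `grep -o PATTERN | wc -l`."""
    parser_functions = 0
    double_braces = 0
    for line in lines:
        # str.count scans left to right for non-overlapping matches, as grep -o does.
        parser_functions += line.count("{{#")
        double_braces += line.count("{{")
    # Same subtraction as above: '{{' matches minus '{{#' matches.
    return parser_functions, double_braces - parser_functions


def count_dump(path):
    """Stream a bz2-compressed XML dump without decompressing it to disk."""
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        return count_markers(dump)
```

Usage would be, for example, `count_dump("simplewiki-20090623-pages-articles.xml.bz2")`.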

Robert Rohde

5:39 a.m.

On Tue, Jun 30, 2009 at 4:01 PM, Brian <Brian.Mingus@colorado.edu> wrote:

...

On Tue, Jun 30, 2009 at 11:20 AM, Robert Rohde <rarohde@gmail.com> wrote:

...
However, given the nastiness of template syntax, I would expect no end of wiki authors willing to help convert the commonly used stuff.

-Robert Rohde

I was curious just how terrible of a task conversion can be expected to be. This is just a heuristic I came up with..

    # Simple English parser functions
    $ bunzip2 -c simplewiki-20090623-pages-articles.xml.bz2 | grep -o '{{#' | wc -l
    22,211

    # Simple English templates
    $ bunzip2 -c simplewiki-20090623-pages-articles.xml.bz2 | grep -o '{{' | wc -l
    416,126 - 22,211 = 393,915

    # English parser functions
    $ bunzip2 -c enwiki-20090618-pages-articles.xml.bz2 | grep -o '{{#' | wc -l
    430,980

    # English templates
    $ bunzip2 -c enwiki-20090618-pages-articles.xml.bz2 | grep -o '{{' | wc -l
    44,928,358 - 430,980 = 44,497,378

I assume we are primarily talking about replacing template code and not template calls, per se.

In other words, I assume things like "{{fact}}" and "{{msg | foo is bar }}" will be basically unchanged on the article side but rewritten on the implementation side in Template: space. If that is correct, it would be more useful to simply ask how large Template: space is rather than counting all the template calls.
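Measuring Template: space directly could look like the following Python sketch, which streams a pages-articles dump and counts titles carrying the `Template:` prefix. The function name is hypothetical; it only assumes the dump's `<page>` and `<title>` elements, and the prefix has to be changed for wikis whose template namespace is localized. It counts pages, not bytes; summing text lengths would be a natural extension.

```python
import xml.etree.ElementTree as ET


def count_template_pages(xml_file, prefix="Template:"):
    """Count pages in a MediaWiki XML export whose title starts with `prefix`.

    xml_file: a path or binary file object, e.g. bz2.open(path, "rb").
    """
    count = 0
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the export schema's XML namespace
        if tag == "title" and elem.text and elem.text.startswith(prefix):
            count += 1
        elif tag == "page":
            elem.clear()  # discard the page's revision text once it has been seen
    return count
```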

-Robert Rohde

Brian

5:58 a.m.

On Tue, Jun 30, 2009 at 6:09 PM, Robert Rohde <rarohde@gmail.com> wrote:

...

In other words, I assume things like "{{fact}}" and "{{msg | foo is bar }}" will be basically unchanged on the article side but rewritten on the implementation side in Template: space. If that is correct, it would be more useful to simply ask how large Template: space is rather than counting all the template calls.

-Robert Rohde

Mixing the new language with existing wikicode? With a new language I would like to see the old language go out the door. The end of double braces.

Thomas Dalton

6:04 a.m.

2009/7/1 Brian <Brian.Mingus@colorado.edu>:

...

On Tue, Jun 30, 2009 at 6:09 PM, Robert Rohde <rarohde@gmail.com> wrote:

...
In other words, I assume things like "{{fact}}" and "{{msg | foo is bar }}" will be basically unchanged on the article side but rewritten on the implementation side in Template: space. If that is correct, it would be more useful to simply ask how large Template: space is rather than counting all the template calls.

-Robert Rohde

Mixing the new language with existing wikicode? With a new language I would like to see the old language go out the door. The end of double braces.

What would you replace them with? The wikitext used by regular editors should be as simple as possible; we don't want to require PHP or JavaScript to be used by anyone wanting to add an infobox to an article.

Brian

6:11 a.m.

On Tue, Jun 30, 2009 at 6:34 PM, Thomas Dalton <thomas.dalton@gmail.com> wrote:

...

What would you replace them with? The wikitext used by regular editors should be as simple as possible; we don't want to require PHP or JavaScript to be used by anyone wanting to add an infobox to an article.

There is nothing in the OP that indicates that we are keeping the current template code or even that it would be desirable. Whatever facilities the language we choose has for including other files and passing arguments to functions is 100% sufficient.

Thomas Dalton

6:13 a.m.

2009/7/1 Brian <Brian.Mingus@colorado.edu>:

...

On Tue, Jun 30, 2009 at 6:34 PM, Thomas Dalton <thomas.dalton@gmail.com> wrote:

...
What would you replace them with? The wikitext used by regular editors should be as simple as possible; we don't want to require PHP or JavaScript to be used by anyone wanting to add an infobox to an article.

There is nothing in the OP that indicates that we are keeping the current template code or even that it would be desirable. Whatever facilities the language we choose has for including other files and passing arguments to functions is 100% sufficient.

There is no proposal to replace wikitext with PHP (it wouldn't even work: PHP isn't a markup language, and neither are JavaScript, Python, etc.); the proposal is to replace the template code, i.e. the code on the template pages.

Brian

6:26 a.m.

On Tue, Jun 30, 2009 at 6:43 PM, Thomas Dalton <thomas.dalton@gmail.com> wrote:

...

There is no proposal to replace wikitext with PHP (it wouldn't even work: PHP isn't a markup language, and neither are JavaScript, Python, etc.); the proposal is to replace the template code, i.e. the code on the template pages.

The OP does not say it is a recommendation to replace ParserFunctions, it says, "our current templating system." In my mind that absolutely includes the use of templates in the article namespace.

There are lots of usability improvements that can be made to the templating system. First and foremost the new system should allow advanced wiki users to perform programmatic operations on article data without the requirement that the data in the article be made unreadable.

If we only focus our efforts on making the template namespace more complicated by giving it a more advanced programming language and we leave the article namespace as it is then we have not even touched the usability issue. We have just made it worse.

I do of course have some specific ideas about how to achieve this goal, but I'm kind of in "shock and awe" that it's not seen as the main reason for improving the template system!

Brion Vibber

2 Jul 2009

10:38 p.m.

Brian wrote:

...

There are lots of usability improvements that can be made to the templating system. First and foremost the new system should allow advanced wiki users to perform programmatic operations on article data without the requirement that the data in the article be made unreadable.

If we only focus our efforts on making the template namespace more complicated by giving it a more advanced programming language and we leave the article namespace as it is then we have not even touched the usability issue. We have just made it worse.

These are totally orthogonal issues, and paying attention to one doesn't mean ignoring the other.

The ideal markup situation for the article namespace is that markup shouldn't even *be* exposed to most users. A long-term goal is migration to a more WYSIWYG-like editing experience -- to which one of the potential stumbling blocks has been "but how will we do templates, which currently are built with our horrifying wiki markup?"

Most editors will never know or care about the internal implementation of templates, just as they don't know or care about it today. Cleaning them up to allow the power-users who *write* templates to make them functional and useful *and* maintainable is a win for template writers, while having no direct impact on general editors.

(Indirectly, it will mean they're provided with better tools to use in their articles.)

<not the subject of this thread> For the general article editing experience, the issues are very different, and that's the area the Wikipedia Usability Initiative is concentrating on.

In the very short term, we're working on general look & feel, workflow, and making it easier to figure out what you're supposed to do (such as making the markup cheat-sheet available without leaving the editing window).

In the medium term, we hope to be able to "fold up" things that are particularly ugly in markup such as images/media, template invocations and tables, and provide friendlier widgets for adding and editing them.

In the long term, we might hope to be able to drop the front-end markup entirely... but that's still a harder problem with several possible trade-offs. </not the subject of this thread>

-- brion

Michael Daly

3 Jul 2009

2:09 a.m.

Brion Vibber wrote:

...

The ideal markup situation for the article namespace is that markup shouldn't even *be* exposed to most users. A long-term goal is migration to a more WYSIWYG-like editing experience -- to which one of the potential stumbling blocks has been "but how will we do templates, which currently are built with our horrifying wiki markup?"

Since templates forbid looping, whatever manages it can't be considered a programming language (missing iteration in {sequence, selection, iteration}).

Perhaps we should consider this a markup problem and not a programming problem. If templates have styles (not necessarily in the CSS sense), then we just describe the template instead of programming it. {{{var}}} can become something like content: in CSS. Conditionals are... interesting. Apply style x if the condition is satisfied, else apply style y (e.g. display: "nicely"; vs. display: none;).

Just an idea...

Mike

Gregory Maxwell

3:04 a.m.

On Thu, Jul 2, 2009 at 4:39 PM, Michael Daly <michael.daly@kayakwiki.org> wrote:

...

Brion Vibber wrote:

...
The ideal markup situation for the article namespace is that markup shouldn't even *be* exposed to most users. A long-term goal is migration to a more WYSIWYG-like editing experience -- to which one of the potential stumbling blocks has been "but how will we do templates, which currently are built with our horrifying wiki markup?"

Since templates forbid looping, whatever manages it can't be considered a programming language (missing iteration in {sequence, selection, iteration}).

You can clone a template multiple times to effectively create recursion of a finite maximum depth.
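A Python analogy (not wikitext) for that trick: each hypothetical copy `Template:Loop0`, `Template:Loop1`, ... handles one item and passes the rest to the next copy, so the number of copies is the maximum depth, and anything beyond it is cut off.

```python
def make_clones(copies):
    """Emulate hypothetical pages Template:Loop0 .. Template:Loop{copies-1}.

    Each copy renders one item and hands the rest to the next copy. The last
    copy has nobody to call, so it truncates: `copies` is the maximum depth.
    Assumes copies >= 1.
    """
    clones = []
    for level in range(copies):
        def clone(items, level=level):
            if not items:
                return ""
            head = f"[{items[0]}]"
            if level + 1 == copies:  # no further copy to call
                return head + ("..." if len(items) > 1 else "")
            return head + clones[level + 1](items[1:])
        clones.append(clone)
    return clones[0]


loop = make_clones(3)
print(loop(["a", "b"]))            # [a][b]
print(loop(["a", "b", "c", "d"]))  # [a][b][c]...
```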

Brion Vibber

3:08 a.m.

Michael Daly wrote:

...

Brion Vibber wrote:

...
The ideal markup situation for the article namespace is that markup shouldn't even *be* exposed to most users. A long-term goal is migration to a more WYSIWYG-like editing experience -- to which one of the potential stumbling blocks has been "but how will we do templates, which currently are built with our horrifying wiki markup?"

Since templates forbid looping,

Since iteration over a set is frequently desired/needed, assume it will exist in a sensible programming language.

As already noted in this thread, horrible hacks for limited-depth looping are already in use.

-- brion

Steve Bennett

7:48 a.m.

On Fri, Jul 3, 2009 at 7:38 AM, Brion Vibber <brion@wikimedia.org> wrote:

...

Since iteration over a set is frequently desired/needed, assume it will exist in a sensible programming language.

As already noted in this thread, horrible hacks for limited-depth looping are already in use.

So:
1) The chosen language will support iteration over finite sets
2) Could it support general iteration, recursion etc?
3) If so, are there any good mechanisms for limiting the destructiveness of an infinite loop?

That is, is it practical to say "you can iterate all you like, but you're only getting 10ms to do it"? Sounds like it could be an interesting property of a template, where a suitably authorised person could allow certain templates longer execution times.
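A minimal sketch of a per-template time allowance, assuming template evaluation can be broken into small steps that hand control back to the host between them. The override table and `run_with_deadline` are hypothetical. Note the limitation: a check between steps cannot interrupt a single step that itself never returns.

```python
import time
from itertools import count

DEFAULT_LIMIT_SECONDS = 0.010  # the 10 ms suggested above
# Hypothetical per-template overrides that a suitably authorised user could raise.
TIME_LIMIT_OVERRIDES = {"Template:Huge table": 0.050}


def run_with_deadline(template_name, steps):
    """Drive a template evaluation, modelled as an iterator of small work steps,
    and stop it once its wall-clock allowance is spent.
    Returns (status, steps_completed)."""
    limit = TIME_LIMIT_OVERRIDES.get(template_name, DEFAULT_LIMIT_SECONDS)
    deadline = time.monotonic() + limit
    done = 0
    for _ in steps:
        done += 1
        if time.monotonic() > deadline:
            return "timeout", done
    return "ok", done


status, steps_run = run_with_deadline("Template:Runaway", count())  # never ends on its own
print(status)  # timeout
```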

Steve

Aryeh Gregor

7:52 a.m.

On Thu, Jul 2, 2009 at 10:18 PM, Steve Bennett <stevagewp@gmail.com> wrote:

...

So:
1) The chosen language will support iteration over finite sets
2) Could it support general iteration, recursion etc?
3) If so, are there any good mechanisms for limiting the destructiveness of an infinite loop?

You don't really need an infinite loop. DoS would work fine if you can have any loop. Even with just foreach:

foreach(array(1,2)as $x1)foreach(array(1,2)as $x2)....

A few dozen of those in a row will give you a nice short bit of code that may as well run forever.
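The arithmetic behind that: k nested loops over two elements run the innermost body 2^k times, so the cost grows exponentially with the length of the source. A short Python check, with a made-up helper name:

```python
from itertools import product


def nested_foreach_iterations(depth, width=2):
    """How often the innermost body runs for `depth` nested foreach loops
    that each walk an array of `width` elements."""
    return width ** depth


# Sanity-check the formula against real nested loops for small depths.
for d in range(1, 8):
    assert sum(1 for _ in product(range(2), repeat=d)) == nested_foreach_iterations(d)

print(nested_foreach_iterations(36))  # 68719476736, from three dozen two-element loops
```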

Marco Schuster

7:55 a.m.

On Fri, Jul 3, 2009 at 4:22 AM, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:

...

On Thu, Jul 2, 2009 at 10:18 PM, Steve Bennett <stevagewp@gmail.com> wrote:

...
So:
1) The chosen language will support iteration over finite sets
2) Could it support general iteration, recursion etc?
3) If so, are there any good mechanisms for limiting the destructiveness of an infinite loop?

You don't really need an infinite loop. DoS would work fine if you can have any loop. Even with just foreach:

foreach(array(1,2)as $x1)foreach(array(1,2)as $x2)....

A few dozen of those in a row will give you a nice short bit of code that may as well run forever.

You can make some kind of counter, which gets incremented each foreach/while/for loop. If it reaches 200 (or whatever), execution is stopped.
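A sketch of that counter, assuming the interpreter lets us wrap every loop construct (all names here are hypothetical). Because one counter is shared by all loops, nesting does not multiply the allowance, so the nested foreach pattern quoted above is cut off after 200 iterations.

```python
class LoopBudgetExceeded(Exception):
    pass


class LoopCounter:
    """A single counter shared by every loop in one script run."""

    def __init__(self, limit=200):
        self.limit = limit
        self.count = 0

    def tick(self):
        self.count += 1
        if self.count > self.limit:
            raise LoopBudgetExceeded(f"more than {self.limit} loop iterations")


def guarded_foreach(counter, items):
    """What a hypothetical interpreter could run in place of a plain foreach."""
    for item in items:
        counter.tick()
        yield item


def nested_bomb(counter, depth):
    """The nested two-element foreach pattern quoted above, `depth` levels deep."""
    if depth == 0:
        return
    for _ in guarded_foreach(counter, [1, 2]):
        nested_bomb(counter, depth - 1)


counter = LoopCounter(limit=200)
try:
    nested_bomb(counter, depth=40)
except LoopBudgetExceeded as err:
    print("stopped:", err)  # stopped: more than 200 loop iterations
```

Work done inside a single iteration (building huge strings, say) is still not counted by this scheme.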

Marco

-- VMSoft GbR Nabburger Str. 15 81737 München Geschäftsführer: Marco Schuster, Volker Hemmert http://vmsoft-gbr.de

Robert Rohde

8 a.m.

On Thu, Jul 2, 2009 at 7:25 PM, Marco Schuster <marco@harddisk.is-a-geek.org> wrote:

...

You can make some kind of counter, which gets incremented each foreach/while/for loop. If it reaches 200 (or whatever), execution is stopped.

Really, the ideal solution is to say the user is allowed X number of basic operations, Y amount of memory, and Z amount of execution time, and write an interpreter that is agnostic about how those resources are used. If all you do is add limits to loops, then someone will use loops of loops, or even flat stacks, to get around it.
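A sketch of that resource-agnostic approach, assuming an interpreter whose every primitive calls `charge()`. `Budget`, `repeat_double` and `outcome` are hypothetical; memory here is an accounting estimate supplied by each primitive, not a measurement of the process.

```python
import time


class BudgetExceeded(Exception):
    pass


class Budget:
    """X operations, Y bytes and Z seconds, charged by every primitive the
    interpreter executes, regardless of which language construct spends them."""

    def __init__(self, max_ops, max_bytes, max_seconds):
        self.ops_left = max_ops
        self.bytes_left = max_bytes
        self.deadline = time.monotonic() + max_seconds

    def charge(self, ops=1, nbytes=0):
        self.ops_left -= ops
        self.bytes_left -= nbytes
        if self.ops_left < 0:
            raise BudgetExceeded("operation limit")
        if self.bytes_left < 0:
            raise BudgetExceeded("memory limit")
        if time.monotonic() > self.deadline:
            raise BudgetExceeded("time limit")


def repeat_double(budget, s, times):
    """A hypothetical primitive: string doubling, charged for the characters it adds."""
    for _ in range(times):
        budget.charge(ops=1, nbytes=len(s))
        s = s + s
    return s


def outcome(fn, *args):
    try:
        fn(*args)
        return "ok"
    except BudgetExceeded as err:
        return str(err)


print(outcome(repeat_double, Budget(10_000, 1_000_000, 5.0), "x", 64))  # memory limit
```

Here the doubling trips the memory limit on its 20th step while using only 20 of its 10,000 operations, which is exactly the case a loop-only counter would miss.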

-Robert Rohde

Steve Bennett

8:04 a.m.

On Fri, Jul 3, 2009 at 12:25 PM, Marco Schuster <marco@harddisk.is-a-geek.org> wrote:

...

You can make some kind of counter, which gets incremented each foreach/while/for loop. If it reaches 200 (or whatever), execution is stopped.

Yes, but that implies: 1) We're writing an interpreter, or getting heavily involved in the codebase of an existing one 2) Thinking ahead of every possible DoS and thwarting it.

I was wondering if there was a more general solution using a black box interpreter. But without knowing the language or interpreter, that may not be a very meaningful question.
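One general, if coarse, approach is to run the interpreter as a black box in a separate OS process and kill it when its time is up. Below is a minimal Python sketch that uses the Python interpreter itself as the stand-in black box; the function name `run_black_box` is hypothetical. It limits wall-clock time only. It is not a security sandbox, and the cost of starting a process for every template call may itself be significant.

```python
import subprocess
import sys


def run_black_box(source: str, seconds: float) -> tuple:
    """Run `source` in a separate interpreter process; kill it if it exceeds `seconds`.

    Returns (finished, stdout). Only time is limited here: no memory cap and
    no restriction on file or network access.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        return False, ""  # subprocess.run has already killed and reaped the child
    return True, result.stdout
```

On POSIX systems the standard `resource` module could additionally cap CPU time and address space in the child (for example via `preexec_fn`), but file and network access would still need separate isolation.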

Steve

Aryeh Gregor

8:12 a.m.

On Thu, Jul 2, 2009 at 10:25 PM, Marco Schuster <marco@harddisk.is-a-geek.org> wrote:

...

You can make some kind of counter, which gets incremented each foreach/while/for loop. If it reaches 200 (or whatever), execution is stopped.

Sure -- if you're writing the program language interpreter yourself. I think we were hoping to avoid that.

Tei

3:19 p.m.

On Fri, Jul 3, 2009 at 4:18 AM, Steve Bennett <stevagewp@gmail.com> wrote:

...

On Fri, Jul 3, 2009 at 7:38 AM, Brion Vibber <brion@wikimedia.org> wrote:

...
Since iteration over a set is frequently desired/needed, assume it will exist in a sensible programming language.

As already noted in this thread, horrible hacks for limited-depth looping are already in use.

So:

The chosen language will support iteration over finite sets

Could it support general iteration, recursion etc?

If so, are there any good mechanisms for limiting the destructiveness of an infinite loop?

That is, is it practical to say "you can iterate all you like, but you're only getting 10ms to do it"? Sounds like it could be an interesting property of a template, where a suitably authorised person could allow certain templates longer execution times.

Another option is to compile the language to an intermediate language that is interpreted, and make the interpreter enforce an instruction limit per program. Say... a budget of 90,000 opcodes. If a script breaks that barrier, it is stopped (the interpreter does a "return;") and the script is marked as "dirty".
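A minimal Python sketch of that scheme: a tiny stack machine that counts executed opcodes, stops when the budget (90,000 by default, as in the example above) is used up, and marks the script dirty. The opcode set and the `Script` class are invented for illustration; a real design would first compile template source into this form.

```python
from dataclasses import dataclass


@dataclass
class Script:
    code: list           # (opcode, argument) pairs, as a hypothetical compiler might emit
    dirty: bool = False  # set once the script has exceeded its opcode budget


def run(script: Script, budget: int = 90_000):
    """Interpret `script`; return the top of the stack, or None if the budget ran out."""
    stack, pc, executed = [], 0, 0
    while pc < len(script.code):
        if executed >= budget:
            script.dirty = True  # remember the offender, e.g. to refuse re-running it
            return None          # the interpreter simply "return;"s
        executed += 1
        op, arg = script.code[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "jump":
            pc = arg
        elif op == "jump_if_zero":
            if stack.pop() == 0:
                pc = arg
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack[-1] if stack else None
```

For example, a countdown loop from 3 to 0 finishes in a handful of opcodes, while `Script([("jump", 0)])`, an infinite loop, returns None after 90,000 opcodes with `dirty` set. Because the limit lives in the interpreter's dispatch loop, it holds no matter how the script's loops are arranged.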

A bad example follows: QuakeC is compiled to QC (a fake assembler language), which is interpreted by the QCVM (the Quake virtual machine). The interpreter includes some limitations (on the stock QCVM, the depth of recursion).

A good side effect of this is that a Quake mod works on any OS.

Trivia: Quake3 and others have a setup like this one, but using C. It is probably not usable for Wikipedia, since C is a bad language for working with strings. Too bad, because it is fast and cross-platform, and there are lots of tools and existing programmers for it.

-- End of message.

Trevor Parscal

1 Jul

6:32 a.m.

Seems like JSON syntax is pretty simple and could be a big improvement to how templates are currently invoked.

Bottom line, a well-defined syntax like JavaScript is going to be more user-friendly than a syntax which is only defined by the behavior of a parser with no standardization at all.

- Trevor

Sent from my iPod

On Jun 30, 2009, at 5:34 PM, Thomas Dalton <thomas.dalton@gmail.com> wrote:

...

2009/7/1 Brian <Brian.Mingus@colorado.edu>:

...
On Tue, Jun 30, 2009 at 6:09 PM, Robert Rohde <rarohde@gmail.com> wrote:

...
In other words, I assume things like "{{fact}}" and "{{msg | foo is bar }}" will be basically unchanged on the article side but rewritten on the implementation side in Template: space. If that is correct, it would be more useful to simply ask how large Template: space is rather than counting all the template calls.

-Robert Rohde

Mixing the new language with existing wikicode? With a new language I would like to see the old language go out the door. The end of double braces.

What would you replace them with? The wikitext used by regular editors should be as simple as possible, we don't want to require PHP or Javascript to be used by anyone wanting to add an infobox to an article.

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

randomcoder1

6:59 a.m.

http://jtemplates.tpython.com/ ? :)


Chad

6:04 a.m.

On Tue, Jun 30, 2009 at 8:28 PM, Brian <Brian.Mingus@colorado.edu> wrote:

...

On Tue, Jun 30, 2009 at 6:09 PM, Robert Rohde <rarohde@gmail.com> wrote:

...
In other words, I assume things like "{{fact}}" and "{{msg | foo is bar }}" will be basically unchanged on the article side but rewritten on the implementation side in Template: space. If that is correct, it would be more useful to simply ask how large Template: space is rather than counting all the template calls.

-Robert Rohde

Mixing the new language with existing wikicode? With a new language I would like to see the old language go out the door. The end of double braces.


Unless we plan on trying to mass-convert not only years of old revisions but change years-old behavior that millions of users have come to expect? I would expect _any_ change to keep {{sometemplate}} always working, even if the mechanics behind it change.

-Chad

Michael Daly

7:57 a.m.

Chad wrote:

...

Unless we plan on trying to mass-convert not only years of old revisions but change years-old behavior that millions of users have come to expect? I would expect _any_ change to keep {{sometemplate}} always working, even if the mechanics behind it change.

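Why not switch the template syntax for articles to match the syntax for tags (which in turn is based on XML or whatever syntax that comes from ultimately)?

{{sometemplate|var1=foo|var2=bar}}

becomes

<sometemplate>var1=foo; var2=bar;</sometemplate>

or:

<sometemplate var1="foo" var2="bar"/>

That means that the tag namespace and the Template namespace (where namespace is more generic than just the concept of MW namespaces) will potentially clash. This could be handled with something like:

<template name="sometemplate"> var1=foo; var2=bar; </template>

or:

<template name="sometemplate" var1="foo" var2="bar"/>

which is a tad more verbose but more explicit.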

Mike

Robert Rohde

8:15 a.m.

On Tue, Jun 30, 2009 at 7:27 PM, Michael Daly <michael.daly@kayakwiki.org> wrote:

...

Chad wrote:

...
Unless we plan on trying to mass-convert not only years of old revisions but change years-old behavior that millions of users have come to expect? I would expect _any_ change to keep {{sometemplate}} always working, even if the mechanics behind it change.

Why not switch the template syntax for articles to match the syntax for tags (which in turn is based on XML or whatever syntax that comes from ultimately)?

{{sometemplate|var1=foo|var2=bar}}

becomes

<sometemplate>var1=foo; var2=bar;</sometemplate>

or:

<sometemplate var1="foo" var2="bar"/>

That means that the tag namespace and the Template namespace (where namespace is more generic than just the concept of MW namespaces) will potentially clash. This could be handled with something like:

<template name="sometemplate"> var1=foo; var2=bar; </template>

or:

<template name="sometemplate" var1="foo" var2="bar"/>

which is a tad more verbose but more explicit.

Makes it awfully ugly to pass the result of one template to another template if your syntax is:

<template name="sometemplate" var1="<template name="birthday" val="May 24" />" var2="bar"/>

-Robert Rohde

Michael Daly

9:23 a.m.

Robert Rohde wrote:

...

Makes it awfully ugly to pass the result of one template to another template if your syntax is:

<template name="sometemplate" var1="<template name="birthday" val="May 24" />" var2="bar"/>

Eww! - hadn't thought of that one. Back to the other style:

or unnamed:

Recursive template processing should be default. Obviously, this is a work in progress...
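Robert's nesting objection can be made concrete with Python's standard `xml.etree.ElementTree`. If one template call is built as an element and then passed as an attribute of another, the serializer has to escape every `<` and `"` inside it. The `<template name=...>` element is the hypothetical syntax proposed in this thread, not an existing MediaWiki feature, and `template_tag` is an illustrative helper.

```python
import xml.etree.ElementTree as ET


def template_tag(name: str, **params: str) -> str:
    """Serialize a call in the proposed <template name="..." .../> syntax."""
    return ET.tostring(ET.Element("template", name=name, **params), encoding="unicode")


inner = template_tag("birthday", val="May 24")
outer = template_tag("sometemplate", var1=inner, var2="bar")
print(outer)  # var1 comes out full of &lt; and &quot; entities
```

Parsing `outer` back recovers `inner` intact, so nesting works mechanically, but a human editor would have to type those entities by hand. Compare the existing wikitext form {{sometemplate|var1={{birthday|val=May 24}}|var2=bar}}.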

Mike

Thomas Dalton

9:48 a.m.

2009/7/1 Michael Daly <michael.daly@kayakwiki.org>:

...

Why not switch the template syntax for articles to match the syntax for tags (which in turn is based on XML or whatever syntax that comes from ultimately)?

What is wrong with the current syntax for calling templates? At least, what is wrong with it that would be improved by that change?

Brion Vibber

2 Jul

10:41 p.m.

Michael Daly wrote:

...

Chad wrote:

...
Unless we plan on trying to mass-convert not only years of old revisions but change years-old behavior that millions of users have come to expect? I would expect _any_ change to keep {{sometemplate}} always working, even if the mechanics behind it change.

Why not switch the template syntax for articles to match the syntax for tags (which in turn is based on XML or whatever syntax that comes from ultimately)?

For the meantime, assume there will be no changes whatsoever in how markup in article space is written. A hypothetical change to template invocation syntax is unrelated to how templates are implemented, and clouds the current discussion.

-- brion

Bryan Tong Minh

30 Jun

10:52 p.m.

On Tue, Jun 30, 2009 at 6:16 PM, Brion Vibber <brion@wikimedia.org> wrote:

...

Python

Advantage: A Python interpreter will be present on most web servers, though not necessarily all. (Windows-based servers especially.)

Wash: Python is probably better known than Lua, but not as well as PHP or JS.

Disadvantage: Like PHP, Python is difficult to lock down securely.

Also for Python you really will want an editor that supports indenting. Web browsers are not suitable for programming Python.

Bryan

Robert Rohde

11:12 p.m.

On Tue, Jun 30, 2009 at 10:22 AM, Bryan Tong Minh <bryan.tongminh@gmail.com> wrote:

...

On Tue, Jun 30, 2009 at 6:16 PM, Brion Vibber <brion@wikimedia.org> wrote:

...

Python

Advantage: A Python interpreter will be present on most web servers, though not necessarily all. (Windows-based servers especially.)

Wash: Python is probably better known than Lua, but not as well as PHP or JS.

Disadvantage: Like PHP, Python is difficult to lock down securely.

Also for Python you really will want an editor that supports indenting. Web browsers are not suitable for programming Python.

Though indenting is mandatory for Python, the use of reasonable indenting is pretty much necessary to produce readable code in any language.

That said, I don't see any reason we couldn't use two or three consecutive spaces to indicate indentations.

-Robert Rohde

Amir E. Aharoni

11:15 p.m.

On Tue, Jun 30, 2009 at 20:42, Robert Rohde <rarohde@gmail.com> wrote:

...

On Tue, Jun 30, 2009 at 10:22 AM, Bryan Tong

...
Also for Python you really will want an editor that supports indenting. Web browsers are not suitable for programming Python.

Though indenting is mandatory for Python, the use of reasonable indenting is pretty much necessary to produce readable code in any language.

That said, I don't see any reason we couldn't use two or three consecutive spaces to indicate indentations.

Four!

-- Amir Elisha Aharoni http://aharoni.wordpress.com "We're living in pieces, I want to live in peace." - T. Moore

Robert Rohde

11:27 p.m.

On Tue, Jun 30, 2009 at 10:45 AM, Amir E. Aharoni <amir.aharoni@gmail.com> wrote:

...

On Tue, Jun 30, 2009 at 20:42, Robert Rohde <rarohde@gmail.com> wrote:

...
On Tue, Jun 30, 2009 at 10:22 AM, Bryan Tong

...
Also for Python you really will want an editor that supports indenting. Web browsers are not suitable for programming Python.

Though indenting is mandatory for Python, the use of reasonable indenting is pretty much necessary to produce readable code in any language.

That said, I don't see any reason we couldn't use two or three consecutive spaces to indicate indentations.

Four!

Four is the default size of a tab in Python, but indents aren't actually required to be that size. If we are required to type consecutive spaces to format code, I'd actually prefer a smaller default size.

-Robert Rohde

Steve Sanbeg

11:30 p.m.

On Tue, 30 Jun 2009 09:16:41 -0700, Brion Vibber wrote:

...

As many folks have noted, our current templating system works ok for simple things, but doesn't scale well -- even moderately complex conditionals or text-munging will quickly turn your template source into what appears to be line noise.

And we all thought Perl was bad! ;)

There's been talk of Lua as an embedded templating language for a while, and there's even an extension implementation.

One advantage of Lua over other languages is that its implementation is optimized for use as an embedded language, and it looks kind of pretty.

An _inherent_ disadvantage is that it's a fairly rarely-used language, so still requires special learning on potential template programmers' part.

An _implementation_ disadvantage is that it currently is dependent on an external Lua binary installation -- something that probably won't be present on third-party installs, meaning Lua templates couldn't be easily copied to non-Wikimedia wikis.

There are perhaps three primary alternative contenders that don't involve making up our own scripting language (something I'd dearly like to avoid):

I was thinking about something similar this weekend, although I'd thought about different languages:

1 - XSLT

Since the syntax is XML (like the extension tags) and XPath (vaguely similar to template syntax, although it's XML that calls XPath, the opposite of what we have), it would be reasonably consistent with current syntax. It should also already be fairly well locked down, and the interface seems fairly clear: present template parameters as stylesheet parameters, and other magic words as an input document. We may just need a few simplifications to make it easier to use.

2- lisp/scheme

Should be easy to write a parser for if needed, since the grammar is so simple, and it should be relatively simple to lock down or extend as needed.
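To give a sense of how small such a parser can be, here is a minimal Python sketch of an s-expression reader plus an evaluator that only knows a whitelisted set of primitives. The primitive names are hypothetical. Anything outside the whitelist is rejected, and extending the language means adding an entry to the table. Strings with spaces, quoting and variables are left out.

```python
import operator
from collections import deque
from functools import reduce


def read(source: str):
    """Parse one s-expression into nested Python lists of ints and symbol strings."""
    tokens = deque(source.replace("(", " ( ").replace(")", " ) ").split())
    expr = _read(tokens)
    if tokens:
        raise SyntaxError(f"unexpected trailing input: {' '.join(tokens)}")
    return expr


def _read(tokens: deque):
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.popleft()
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(_read(tokens))
        if not tokens:
            raise SyntaxError("missing )")
        tokens.popleft()  # discard the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected )")
    try:
        return int(token)
    except ValueError:
        return token  # a symbol


# The only operations a script can reach; locking down or extending means editing this table.
PRIMITIVES = {
    "+": lambda *args: sum(args),
    "*": lambda *args: reduce(operator.mul, args, 1),
    "concat": lambda *args: "".join(str(a) for a in args),
}


def evaluate(expr):
    if isinstance(expr, list):
        if not expr:
            raise SyntaxError("empty application ()")
        head, *args = expr
        if not isinstance(head, str) or head not in PRIMITIVES:
            raise NameError(f"not an allowed primitive: {head!r}")
        return PRIMITIVES[head](*(evaluate(a) for a in args))
    return expr  # ints and bare symbols evaluate to themselves in this sketch
```

For instance, `evaluate(read("(+ 1 (* 2 3))"))` gives 7, while `(system rm)` is refused with a `NameError` before any argument is evaluated.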

Of course, those are both a bit more esoteric than your recommendations. Perl is nice for getting useful results from short code, if we're not bothered by one parser with no grammar specification calling another one. Tcl may be a reasonable compromise: a less esoteric, imperative language which is often used as an extension language.

Dmitriy Sintsov

1 Jul

11:12 a.m.

...

1 - XSLT

Since the syntax is XML (like the extension tags) and XPath (vaguely similar to template syntax, although it's XML that calls XPath, the opposite of what we have), it would be reasonably consistent with current syntax. It should also already be fairly well locked down, and the interface seems fairly clear: present template parameters as stylesheet parameters, and other magic words as an input document. We may just need a few simplifications to make it easier to use.

XSLT itself is way too locked down - even simple things like substring manipulation and loops aren't so easy to perform. Well, maybe I am too stupid for XSLT, but in my experience bringing tag syntax into a programming language makes the code poorly readable and bloated. I've used XSLT in just one of my projects.

...

2- lisp/scheme

Should be easy to write a parser for if needed, since the grammar is so simple, and it should be relatively simple to lock down or extend as needed.

Deeply nested braces of lisp remind me of current MediaWiki parser.

...

Of course, those are both a bit more esoteric than your recommendations.

...

Perl is nice for getting useful results from short code, if we're not bothered by one parser with no grammar specification calling another one. Tcl may be a reasonable compromise: a less esoteric, imperative language which is often used as an extension language.

Lua was highly valued here at the computer lab, and so was OCaml. Dmitriy

Gregory Maxwell

11:47 a.m.

On Wed, Jul 1, 2009 at 1:42 AM, Dmitriy Sintsov <questpc@rambler.ru> wrote:

...

XSLT itself is way too locked down - even simple things like substring manipulation and loops aren't so easy to perform. Well, maybe I am too stupid for XSLT, but in my experience bringing tag syntax into a programming language makes the code poorly readable and bloated. I've used XSLT in just one of my projects.

Juniper Networks (my day job) uses XSLT as the primary scripting language on their routing devices, and chose to do so primarily because of sandboxing and the ease of XML tree manipulation with xpath (JunOS configuration has a complete and comprehensive XML representation). To facilitate that usage we defined an alternative syntax for XSLT called SLAX (http://code.google.com/p/libslax/), though it hasn't seen widespread adoption outside of Juniper yet. (Slax can be mechanically converted to XSLT and vice versa)

SLAX pretty much resolves your readability concern, although the conceptual barriers for people coming from procedural languages to any strongly functional programming language still remain.

You don't loop in XSLT, you recurse or iterate over a structure (i.e. map/reduce).
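For readers coming from procedural languages, a small Python illustration of the difference (an analogy, not XSLT itself): the same bullet-list rendering written once as recursion over the list and once as a map followed by a reduce. Both functions are invented for illustration.

```python
import operator
from functools import reduce


def bullets_recursive(items: list) -> str:
    """No loop: handle the first item, then recurse on the rest."""
    if not items:
        return ""
    head, *rest = items
    return f"* {head}\n" + bullets_recursive(rest)


def bullets_map_reduce(items: list) -> str:
    """Map each item to a line, then reduce the lines into one string."""
    lines = map(lambda item: f"* {item}\n", items)
    return reduce(operator.add, lines, "")
```

Neither version mutates a loop variable, which is the habit a strongly functional language asks procedural programmers to give up.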

I've grown rather fond of XSLT but wouldn't personally recommend it for this application. It lacks the high-speed bytecoded execution environments available for other languages, and I don't see many scripts on the site doing extensive document tree manipulation (it's hard for me to express how awesome XPath is at that)... and I would also guess that there are probably more adept MediaWiki template language coders today than there are people who are really fluent in XSLT.

Dmitriy Sintsov

12:10 p.m.

* Gregory Maxwell <gmaxwell@gmail.com> [Wed, 1 Jul 2009 02:17:24 -0400]:

...

Juniper Networks (my day job) uses XSLT as the primary scripting language on their routing devices, and chose to do so primarily because of sandboxing and the ease of XML tree manipulation with xpath (JunOS configuration has a complete and comprehensive XML representation). To facilitate that usage we defined an alternative syntax for XSLT called SLAX (http://code.google.com/p/libslax/), though it hasn't seen widespread adoption outside of Juniper yet. (Slax can be mechanically converted to XSLT and vice versa)

SLAX pretty much resolves your readability concern, although the conceptual barriers for people coming from procedural languages to any strongly functional programming language still remain.

Try submitting it as a standard? That would probably make XSLT more popular.

...

You don't loop in XSLT, you recurse or iterate over a structure (i.e. map/reduce).

Yes, I've realised that. I've done enough recursion (you can also program in functional style using procedural languages), but the problem is that it forces recursion where it's not really required. Anyway, that's off-topic.

...

I've grown rather fond of XSLT but wouldn't personally recommend it for this application. It lacks the high-speed bytecoded execution environments available for other languages, and I don't see many scripts on the site doing extensive document tree manipulation (it's hard for me to express how awesome XPath is at that)... and I would also guess that there are probably more adept MediaWiki template language coders today than there are people who are really fluent in XSLT.

Ok. Dmitriy

William Allen Simpson

1:20 p.m.

Haven't read the entire thread yet, so hopefully nobody has said this:

Perl, write-once, poor choice for uncontrolled environment.

Lisp, at least the computer science type will know. Haven't used it myself since early '80s.

Lua, don't know whether it's improved in the past few years, but freeciv had serious problems with migrating to 5.1. Personally, I've given up on it, but my 14-year-old nephew seems to like it for various game modifications.

Javascript, OMG don't go there.

Everybody seems to be going the python direction lately, but I've only minimal experience with it, so cannot make a recommendation.

I'd worry less about providing extensive functionality (we certainly don't have much now, so anything more would be gravy), but rather ease of integration, scalability, and security.

Gregory Maxwell

2:05 p.m.

On Wed, Jul 1, 2009 at 3:50 AM, William Allen Simpson <william.allen.simpson@gmail.com> wrote:

...

Javascript, OMG don't go there.

Don't be so quick to dismiss JavaScript. If we were making a scorecard, it would likely meet most of the checkboxes:

* Availability of reliable, battle-tested sandboxes (and probably the only option discussed, other than x-in-JVM, meeting this criterion)
* Availability of fast execution engines
* Widely known by the existing technical userbase (JS beats the other options hands down here)
* Already used by many Mediawiki developers
* Doesn't inflate the number of languages used in the operation of the site
* Possibility of reuse between server-executed and client-executed code (only JS of the named options meets this criterion)
* Can easily write clear and readable code
* Modern high-level language features (dynamic arrays, hash tables, etc.)

There may exist great reasons why another language is a better choice, but JS is far from the first thing that should be eliminated.

Python is a fine language but it fails all the criteria I listed above except the last two.

Hay (Husky)

2:14 p.m.

Javascript might have gotten a bad name in the past because of 14-year-olds who used it to display 'Welcome to my website!' alerts on their Geocities homepages, but that's really unfair. Javascript is a very flexible and dynamic language that can be written very elegantly.

I urge everyone who still thinks Javascript is a toy language to read Douglas Crockford's excellent article:

http://javascript.crockford.com/javascript.html

-- Hay


Trevor Parscal

8:33 p.m.

I'm glad to see I'm not alone. JavaScript can indeed evoke bad memories of fragile scripts running in IE5, which were long and awkward due to limitations in browser technology at the time. However, anyone who has used a modern library like jQuery on a supported browser will tell you it's very powerful and intuitive while being simple, straightforward and actually fun. Any language capable of supporting this experience is worth seriously considering as an option for us.

- Trevor

Sent from my iPod


Alex

11 p.m.

Trevor Parscal wrote:

...

I'm glad to see I'm not alone. JavaScript can indeed evoke bad memories of fragile scripts running in IE5, which were long and awkward due to limitations in browser technology at the time. However, anyone who has used a modern library like jQuery on a supported browser will tell you it's very powerful and intuitive while being simple, straightforward and actually fun. Any language capable of supporting this experience is worth seriously considering as an option for us.

Of course, little in the jQuery library would be useful for making scripts that are executed server-side and output wikitext.

-- Alex (wikipedia:en:User:Mr.Z-man)

William Allen Simpson

8:51 p.m.

Hay (Husky) wrote:

...

Javascript might have gotten a bad name in the past because of 14-year olds who used it to display 'Welcome to my website!' alerts on their Geocities homepage, but it's really unfair. Javascript is a very flexible and dynamic language that can be written very elegantly.

I urge everyone who still thinks Javascript is a toy language to read Douglas Crockford's excellent article:

http://javascript.crockford.com/javascript.html

Not very convincing.... "There are already too many versions. This creates confusion." "Design Errors" "Lousy Implementations" "Substandard Standard"

"But many opinions of the language are based on its immature forms." Admittedly true for me. Never want to use it in production again.

...

On Wed, Jul 1, 2009 at 10:35 AM, Gregory Maxwell <gmaxwell@gmail.com> wrote:

...
...

Doesn't inflate the number of languages used in the operation of the site

This is the important checkbox, as far as integration with the project (my first criterion), but is the server side code already running JavaScript? For serving pages?

...

...

Possibility of reuse between server-executed and client-executed

(Only JS of the named options meets this criteria)

I'd actually put this down as a negative. In my experience, for security, clear division between client and server is required. I've participated in too many projects that thought it would be cool, and then spent a good part of my time building firewalls between client and server to eliminate bad assumptions about validity of the other side.

My general rule: coming over the network, presume it's bad data.

Double/quadruple/octuple that for any data that is then executed as a script. In effect, build an interpreter within the interpreter to validate the code before execution of the code. Never fun....
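A minimal Python sketch of the "interpreter within the interpreter" idea, using the standard `ast` module: parse the untrusted expression, accept only a whitelist of node types, and evaluate the tree directly instead of handing it to `eval`. The function name and the tiny whitelist are illustrative assumptions. This is not a complete sandbox; it sets no memory or time limits, so a huge repetition such as `"x" * 1000000000` would still be accepted.

```python
import ast
import operator

# Whitelisted binary operators; anything else in the tree is rejected.
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def safe_eval(source: str, variables: dict):
    """Evaluate an untrusted arithmetic/string expression without calling eval()."""
    tree = ast.parse(source, mode="eval")  # raises SyntaxError on malformed input
    return _eval(tree.body, variables)


def _eval(node: ast.AST, variables: dict):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str)):
        return node.value
    if isinstance(node, ast.Name) and node.id in variables:
        return variables[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left, variables), _eval(node.right, variables))
    # Calls, attribute access, subscripts, lambdas, etc. never reach execution.
    raise ValueError(f"disallowed syntax: {type(node).__name__}")
```

The validation and the execution are the same walk over the tree, so there is no gap between "checked" code and "run" code for an attacker to exploit.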

...

...

Can easily write clear and readable code

Not in my experience. And we have far too many examples of existing JS already being used in horrid templates, being promulgated in important areas such as large categories, that don't seem to work consistently, and don't work at all with JavaScript turned off.

I run Firefox with JS off by default for all wikimedia sites, because of serious problems in the not so recent past!

William Allen Simpson

8:56 p.m.

William Allen Simpson wrote:

...

I run Firefox with JS off by default for all wikimedia sites, because of serious problems in the not so recent past!

s/recent/distant/

Daniel Schwen

9:06 p.m.

...

...
I run Firefox with JS off by default for all wikimedia sites, because of serious problems in the not so recent past!

s/recent/distant/

Hooray JavaScript FUD!

Hay (Husky)

9:07 p.m.

On Wed, Jul 1, 2009 at 5:26 PM, William Allen Simpson <william.allen.simpson@gmail.com> wrote:

...

William Allen Simpson wrote:

...
I run Firefox with JS off by default for all wikimedia sites, because of serious problems in the not so recent past!

s/recent/distant/

I'm sorry that you seem to have had such bad experiences with JavaScript. Still, I don't think your comments are really valid in today's world. Take a look at 'web 2.0-style' applications, such as Gmail or Google Maps. Stuff like that would simply be impossible in a web browser without depending on proprietary technology such as Flash. Recent effort in all modern web browsers (including IE) has gone mostly into optimizing Javascript engines. Whether you like it or not, Javascript is here to stay.

Of course, this debate shouldn't really be about what people like or dislike in a certain programming language. It should be about what the best option is for Mediawiki template programming. A small scripting language serves that goal best, which leaves us with Lua and Javascript. Lua is pretty cool too, but isn't as well known as Javascript, and as far as I know they are pretty similar in most aspects.

-- Hay

William Allen Simpson

11:38 p.m.

Hay (Husky) wrote:

...

I'm sorry that you seem to have had such bad experiences with JavaScript. Still, I don't think your comments are really valid in today's world.

You mean like the {{hidden}} template series? How long that took to finally work?

Worse, folks trying to use the classes directly, resulting in the contents being centered, with the bullets and numbering removed: <div class="NavFrame collapsed"> <div class="NavHead">Categories</div> <div class="NavContent">

https://secure.wikimedia.org/wikipedia/en/w/index.php?title=Wikipedia:Catego... or http://en.wikipedia.org/w/index.php?title=Wikipedia:Categories_for_discussio...

Believe me, user edits relying on JS, even where the JS isn't directly accessible, are not really ready for prime time today.

...

Take a look at 'web 2.0-style' applications, such as Gmail or Google Maps. Stuff like that would simply be impossible in a web browser without depending on proprietary technology such as Flash.

Sure, and do you know how many months it took to get that to work, or how many folks from the application security group to review?

Gregory Maxwell

11:27 p.m.

On Wed, Jul 1, 2009 at 11:21 AM, William Allen Simpson <william.allen.simpson@gmail.com> wrote:

...

...
...

Doesn't inflate the number of languages used in the operation of the site

This is the important checkbox, as far as integration with the project goes (my first criterion), but is the server-side code already running JavaScript? For serving pages?

No, but MediaWiki and the sites are already chock-full of client-side code in JS.

You basically can't do advanced development for MediaWiki or the Wikimedia sites without a degree of familiarity with JavaScript, due to client compatibility considerations.

...

My general rule: coming over the network, presume it's bad data.

In this case we're not talking about the language MediaWiki is written in; we're talking about a language used for server-side content automation (templates). In that case we'd be assuming the inputs are toxic just as in the client-side case, since everything, including the code itself, came in over the network.

I'll concede that there likely wouldn't be much code reuse, but I'd attribute that more to the starkly different purpose and the fact that the server version would have a different API (no DOM, but instead functions for pulling data out of MediaWiki).

...

And we have far too many examples of existing JS already being used in horrid templates, being promulgated in important areas such as large categories, that don't seem to work consistently, and don't work at all with JavaScript turned off. I run Firefox with JS off by default for all Wikimedia sites, because of serious problems in the not-so-recent past!

Fortunately this is a non-issue here: better server-side scripting enhances the site's ability to operate without requiring scripting on the client.

Steve Sanbeg

2 Jul

1:13 a.m.

On Wed, 01 Jul 2009 09:42:31 +0400, Dmitriy Sintsov wrote:

...

XSLT itself is way too locked down - even simple things like substring manipulation and loops aren't so easy to perform. Well, maybe I am too stupid for XSLT, but in my experience bringing tag syntax into a programming language makes the code poorly readable and bloated. I've used XSLT in just one of my projects.

I'd assume we want locked down. Loops would be hard in any locked-down environment; I don't recall seeing any recommendation in this thread on how that would be done. Recursion is much simpler: just track the depth, and throw an exception if it goes too deep; Emacs Lisp already uses this mechanism.

Some of those things may not be as easy as in other languages, but the string functions that this thread was started over are built into XPath 2.0, so it would solve the problem at hand.

...

The deeply nested braces of Lisp remind me of the current MediaWiki parser.

Superficially, sure; but IMHO the real problem with the current parser is the ambiguity: when you see a construct that begins like {{{{{something... you need to keep reading before you can parse it. Lisp is trivial to parse, so we could do our own parsing if needed.
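To illustrate the parsing point: an s-expression reader fits in two small functions, because every `(` unambiguously opens a list and every `)` closes one, so the reader never has to look ahead the way `{{{{{` forces the current parser to. This is a minimal Python sketch for illustration only; atoms are kept as plain strings, and string literals and comments are not handled.

```python
def tokenize(src: str) -> list[str]:
    """Split on whitespace after padding parentheses with spaces."""
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: list[str]):
    """Consume one expression from the front of tokens; lists become Python lists."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing )")
        tokens.pop(0)  # discard the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected )")
    return token  # an atom, kept as a string
```

For example, `parse(tokenize("(if (= n 0) 1 (+ n 1))"))` yields nested lists with no backtracking.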

...

Lua was highly valued here at the computer lab, as was Ocaml (not sure of the proper spelling).
Dmitriy

It seems like there are benefits there, but it's less clear how to implement that in a sufficiently locked-down way, and how it would interface with the rest of the parser for callbacks, magic words, etc.

Tim Starling

3 Jul

12:43 p.m.

Steve Sanbeg wrote:

...

I'd assume we want locked down. Loops would be hard in any locked-down environment; I don't recall seeing any recommendation in this thread on how that would be done. Recursion is much simpler: just track the depth, and throw an exception if it goes too deep; Emacs Lisp already uses this mechanism.

Loops are essential for readable code. There is no problem with allowing loops in conjunction with time limits, that we don't have already with complex templates. In fact, time limits for complex templates would be an improvement over the system of expansion limits we have at the moment.

Recursion can give a long running time even if the depth is limited. By calling the function multiple times from its own body, you can have exponential time order in the recursion depth.

-- Tim Starling

Aryeh Gregor

10:24 p.m.

On Fri, Jul 3, 2009 at 3:13 AM, Tim Starling <tstarling@wikimedia.org> wrote:

...

Loops are essential for readable code. There is no problem with allowing loops in conjunction with time limits, that we don't have already with complex templates. In fact, time limits for complex templates would be an improvement over the system of expansion limits we have at the moment.

But time limits are inconsistent. Whether a template hits the limit might depend on whether it happens to be running on an Apache with a Pentium IV, an Opteron, a Xeon, . . .

...

Recursion can give a long running time even if the depth is limited. By calling the function multiple times from its own body, you can have exponential time order in the recursion depth.

You can also have exponential time with loops.

Petr Kadlec

11:07 p.m.

2009/7/3 Aryeh Gregor <Simetrical+wikilist@gmail.com>:

...

But time limits are inconsistent. Whether a template hits the limit might depend on whether it happens to be running on an Apache with a Pentium IV, an Opteron, a Xeon, . . .

And they might depend on the server load at the time, which is especially problematic if the script produces page rendering output, which gets cached. A temporary server overload might cause broken pages (which render fine otherwise) to be cached. Some invariant measure (like instruction count etc.) would be great (and much more complicated, unless we used our own interpreter of the respective scripting language).

-- [[cs:User:Mormegil | Petr Kadlec]]
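Petr's "invariant measure" can be sketched as a step budget charged inside the interpreter itself: each evaluated node costs one unit, so whether a template fits depends only on the template and its arguments, not on the CPU model or the load at render time. A toy Python sketch; the nested-list program format, the `+`/`if` operators and all names are invented for this example.

```python
class BudgetExceeded(Exception):
    """Raised when a program uses more evaluation steps than allowed."""


class Evaluator:
    """Toy evaluator that charges one step per node evaluated.

    The step count depends only on the program and its input, so the same
    template succeeds or fails identically on every server.
    """

    def __init__(self, max_steps: int):
        self.max_steps = max_steps
        self.steps = 0

    def evaluate(self, expr):
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"exceeded {self.max_steps} steps")
        if isinstance(expr, int):
            return expr
        op, *args = expr
        if op == "+":
            return sum(self.evaluate(a) for a in args)
        if op == "if":
            cond, then, other = args
            return self.evaluate(then) if self.evaluate(cond) else self.evaluate(other)
        raise ValueError(f"unknown operator {op!r}")
```

Evaluating `["+", 1, 2, 3]` costs four steps (the `+` node plus three literals), on any hardware.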

Aryeh Gregor

4 Jul

12:46 a.m.

On Fri, Jul 3, 2009 at 1:37 PM, Petr Kadlec <petr.kadlec@gmail.com> wrote:

...

And they might depend on the server load at the time

Probably not too much, if you count user+system instead of real time. But yes, that could be an issue too (more context switches, etc.).
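Counting user+system CPU time rather than wall-clock time is straightforward in most runtimes. In Python, `time.process_time()` returns exactly that sum for the current process, so a cooperative check that an interpreter loop calls periodically might look like the sketch below. The class and exception names are hypothetical, and in a multi-threaded server the whole process's CPU time would be charged, not just the template's.

```python
import time


class TimeLimitExceeded(Exception):
    """Raised when a template uses up its CPU-time allowance."""


class CpuDeadline:
    """Cooperative CPU-time budget, checked by the interpreter loop.

    time.process_time() is user + system CPU time of this process, so time
    spent descheduled while other processes run is not charged.
    """

    def __init__(self, seconds: float):
        self.limit = time.process_time() + seconds

    def check(self) -> None:
        if time.process_time() > self.limit:
            raise TimeLimitExceeded("template ran out of CPU time")


def run_forever(deadline: CpuDeadline) -> None:
    """Stand-in for a hostile template (an infinite loop) that polls the deadline."""
    while True:
        deadline.check()
```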

Tim Starling

8 Jul

5:14 p.m.

Aryeh Gregor wrote:

...

On Fri, Jul 3, 2009 at 3:13 AM, Tim Starling <tstarling@wikimedia.org> wrote:

...
Loops are essential for readable code. There is no problem with allowing loops in conjunction with time limits, that we don't have already with complex templates. In fact, time limits for complex templates would be an improvement over the system of expansion limits we have at the moment.

But time limits are inconsistent. Whether a template hits the limit might depend on whether it happens to be running on an Apache with a Pentium IV, an Opteron, a Xeon, . . .

That's the reason I went with expansion limits when I wrote the code. But I think it was the wrong choice, because the code is complex and there are lots of ways to run over the 30s time limit set in php.ini, or to exceed the memory limit, even with the expansion limits in place. It's hard to find all the potential performance problems during code review, especially when new parser functions are constantly added.

I didn't say either method was perfect, just that time limits are better.

...

...
Recursion can give a long running time even if the depth is limited. By calling the function multiple times from its own body, you can have exponential time order in the recursion depth.

You can also have exponential time with loops.

Without the time limit, the worst case running time for a JavaScript script is infinity with finite input, so the time order is O(∞). With the time limit, it's O(1). That's the whole point, a time limit lets you ignore algorithmic complexities.

If you measure script execution times, instead of trying to guess them in advance, then you can concentrate developer effort on quotas, access control, profiling tools, etc., which I think are more tractable problems than analysing the performance of every possible thing the parser can do and limiting it in advance.

-- Tim Starling

Tei

6:05 p.m.

Another idea thrown against a wall:

Can template scripts be "pre-calculated"? I think most people are talking about scripts interpreted on demand. But what if scripts were updated every 100 uses, asynchronously? That way a script that takes 10 min to finish is not a problem: it would be updated every 10 min (or every hour, if the servers want that).

I mean, have "outdated" templates that are only updated when the server can, not every time the server uses the template.

Note: I have no idea whether this message is just more noise that makes the signal-to-noise ratio worse. Sorry if it is.

-- End of message.

Platonides

9 Jul

1:06 p.m.

Tei wrote:

...

Another idea thrown against a wall:

Can template scripts be "pre-calculated"? I think most people are talking about scripts interpreted on demand. But what if scripts were updated every 100 uses, asynchronously? That way a script that takes 10 min to finish is not a problem: it would be updated every 10 min (or every hour, if the servers want that).

I mean, have "outdated" templates that are only updated when the server can, not every time the server uses the template.

Note: I have no idea whether this message is just more noise that makes the signal-to-noise ratio worse. Sorry if it is.

That's pretty much what MediaWiki does. The template is only parsed when the article is modified. With a template calculating the age of the subject, you need to calculate it every time (or cache the output for each template with each set of parameters). When you modify a template that has many uses, the pages using it will serve the cached version of the page until the job queue gets around to updating them.
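The "cache the output for each template with each set of parameters" option might be sketched as below. This is a hypothetical Python illustration, not MediaWiki's parser cache: it assumes each template has an integer revision id, and it puts that id in the key so that an edited template simply stops matching old entries (eviction of stale entries is not shown). A template whose output depends on the current date, like the age example, would also need the date, or a short expiry, in the key.

```python
from typing import Callable, Dict, Tuple

ExpandFn = Callable[[str, int, Dict[str, str]], str]


class TemplateOutputCache:
    """Memoize expansions per (template name, template revision, parameters)."""

    def __init__(self, expand: ExpandFn):
        self._expand = expand
        self._cache: Dict[Tuple, str] = {}
        self.misses = 0

    def get(self, name: str, revision: int, params: Dict[str, str]) -> str:
        # Sort parameters so {{X|a=1|b=2}} and {{X|b=2|a=1}} share one entry.
        key = (name, revision, tuple(sorted(params.items())))
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._expand(name, revision, params)
        return self._cache[key]
```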

Steve Sanbeg

6 Jul

10:16 p.m.

On Fri, 03 Jul 2009 17:13:45 +1000, Tim Starling wrote:

...

Steve Sanbeg wrote:

...
I'd assume we want locked down. Loops would be hard in any locked-down environment; I don't recall seeing any recommendation in this thread on how that would be done. Recursion is much simpler: just track the depth, and throw an exception if it goes too deep; Emacs Lisp already uses this mechanism.

Loops are essential for readable code. There is no problem with allowing loops in conjunction with time limits, that we don't have already with complex templates. In fact, time limits for complex templates would be an improvement over the system of expansion limits we have at the moment.

In some cases they would be helpful, e.g. to loop over all of the template arguments instead of those horrible #switch things they use now. But letting people run out the clock with arbitrarily complex loops seems messy.

On the one hand, anyone can easily write code that takes up the maximum allotted time, and stuff as many copies into a page as they can, to either prevent the page from rendering at all, or cause the system to stop executing code before it gets to the parts that are supposed to be there.

On the other hand, it could make templates fail unpredictably, with a seemingly small change having just enough effect on execution time for the template to fail, at least some of the time.

...

Recursion can give a long running time even if the depth is limited. By calling the function multiple times from its own body, you can have exponential time order in the recursion depth.

All those calls still end up on the same stack; even if it could be a tree in theory, the stack only grows one way, and execution time would only be linear.

I found some documentation on the example I'd thought of emulating, which may clarify a little:

http://www.delorie.com/gnu/docs/elisp-manual-21/elisp_123.html

This variable defines the maximum depth allowed in calls to eval, apply, and funcall before an error is signaled (with error message "Lisp nesting exceeds max-lisp-eval-depth"). This limit, with the associated error when it is exceeded, is one way that Lisp avoids infinite recursion on an ill-defined function.

The depth limit counts internal uses of eval, apply, and funcall, such as for calling the functions mentioned in Lisp expressions, and recursive evaluation of function call arguments and function body forms, as well as explicit calls in Lisp code.
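The mechanism Steve wants to emulate amounts to a counter that goes up on every nested evaluation, down on return, and signals an error past a threshold. A minimal Python sketch of that idea; the class, exception and limit are hypothetical, and this is not how Emacs implements it internally.

```python
class NestingLimitExceeded(RecursionError):
    """Analogue of "Lisp nesting exceeds max-lisp-eval-depth"."""


class DepthGuard:
    """Counts active nested calls and fails once they exceed max_depth."""

    def __init__(self, max_depth: int):
        self.max_depth = max_depth
        self.depth = 0

    def __enter__(self):
        self.depth += 1
        if self.depth > self.max_depth:
            # __exit__ is not called when __enter__ raises, so undo here.
            self.depth -= 1
            raise NestingLimitExceeded("nesting exceeds max depth")
        return self

    def __exit__(self, *exc_info):
        self.depth -= 1
        return False  # never swallow exceptions


guard = DepthGuard(max_depth=50)


def ill_defined(n: int) -> int:
    # No base case: recursion continues until the guard signals an error.
    with guard:
        return ill_defined(n + 1)
```

After the error propagates, every `__exit__` has run, so the counter is back at zero for the next call.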

Tim Starling

9 Jul

9:07 a.m.

Steve Sanbeg wrote:

...

On Fri, 03 Jul 2009 17:13:45 +1000, Tim Starling wrote:

...
Recursion can give a long running time even if the depth is limited. By calling the function multiple times from its own body, you can have exponential time order in the recursion depth.

All those calls still end up on the same stack; even if it could be a tree in theory, the stack only grows one way, and execution time would only be linear.

That's an interesting theory.

...

I found some documentation on the example I'd thought of emulating, which may clarify a little:

http://www.delorie.com/gnu/docs/elisp-manual-21/elisp_123.html

I thought I would try it.

```elisp
(defun pow5 (n)
  (if (= n 0)
      1
    (+ (pow5 (1- n))
       (pow5 (1- n))
       (pow5 (1- n))
       (pow5 (1- n))
       (pow5 (1- n)))))
```

It calculates 5 to the power of n by adding 1+1+1+1+1... I found that with a stack depth limit of 25, I was able to calculate 5^6 = 15625. That's plainly not an O(N) execution time in stack depth.

-- Tim Starling
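Tim's example translates directly into Python. This sketch also counts total calls and the deepest nesting reached, which separates the two quantities: depth grows linearly with n while the number of calls grows as 5^n. By the manual's description, Emacs charges several internal eval/funcall levels per pow5 call, so its depth numbers are not directly comparable to the per-call depth counted here.

```python
def pow5(n: int, stats: dict, depth: int = 1) -> int:
    """Compute 5**n as a sum of five recursive calls, like the elisp pow5.

    stats["calls"] counts every call; stats["max_depth"] records the deepest
    nesting of pow5 calls reached.
    """
    stats["calls"] += 1
    stats["max_depth"] = max(stats["max_depth"], depth)
    if n == 0:
        return 1
    return sum(pow5(n - 1, stats, depth + 1) for _ in range(5))


stats = {"calls": 0, "max_depth": 0}
result = pow5(6, stats)
```

With n = 6 the nesting never exceeds 7 calls, yet (5^7 - 1) / 4 = 19,531 calls are made.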

Aryeh Gregor

1 Jul

1:26 a.m.

On Tue, Jun 30, 2009 at 12:16 PM, Brion Vibber <brion@wikimedia.org> wrote:

...

PHP

Advantage: Lots of webbish people have some experience with PHP or can easily find references.

Advantage: we're pretty much guaranteed to have a PHP interpreter available. :)

Disadvantage: PHP is difficult to lock down for secure execution.

I think it would be easy to provide a very simple locked-down version, with most of the features gone. You could, for instance, only permit variable assignment, use of built-in operators, a small whitelist of functions, and conditionals. You could omit loops, function definitions, and abusable functions like str_repeat() (let alone exec(), eval(), etc.) from a first pass. This would still be vastly more powerful, more readable, and faster than ParserFunctions.
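Python's standard `ast` module makes it easy to sketch the whitelisting Aryeh describes: parse once, walk every node, and reject anything not on an allow-list. The node and function lists below are hypothetical choices that loosely mirror his proposal (assignment, operators, conditionals, a few functions; no loops, definitions, imports or attribute access). It illustrates the checking step only and is not a vetted sandbox.

```python
import ast

# Hypothetical allow-list: assignment, operators, comparisons, conditionals
# and calls; no loops, def/lambda, imports, attributes or subscripts.
ALLOWED_NODES = (
    ast.Module, ast.Assign, ast.Expr, ast.If, ast.IfExp,
    ast.Name, ast.Load, ast.Store, ast.Constant, ast.Call,
    ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod, ast.USub, ast.Not,
    ast.And, ast.Or, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)
# Hypothetical function whitelist; these become the template's only builtins.
ALLOWED_FUNCTIONS = {"len": len, "min": min, "max": max, "abs": abs, "str": str}


def check(source: str) -> ast.Module:
    """Parse once (e.g. on save) and reject anything outside the allow-lists."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed construct: {type(node).__name__}")
        if isinstance(node, ast.Name) and node.id.startswith("__"):
            raise ValueError("dunder names are not allowed")
        if isinstance(node, ast.Call):
            # Only plain calls like len(x); no obj.method(), no keyword tricks.
            if not (isinstance(node.func, ast.Name) and node.func.id in ALLOWED_FUNCTIONS):
                raise ValueError("call to a non-whitelisted function")
            if node.keywords:
                raise ValueError("keyword arguments are not allowed")
    return tree


def run(source: str, params: dict) -> dict:
    """Execute checked code with params as variables; return the resulting variables."""
    tree = check(source)
    variables = dict(params)
    exec(compile(tree, "<template>", "exec"), {"__builtins__": ALLOWED_FUNCTIONS}, variables)
    return variables
```

One caveat the sketch makes visible: `"x" * 10**9` is rejected only because `**` is missing from the list, while `"x" * 1000000000` passes, so an ordinary operator can play the role of str_repeat(). A syntactic whitelist still needs CPU and memory limits behind it.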

Hopefully, we could make this secure enough for your average shared-host website to run it by default with no special measures taken and without much risk. Installations with more access and higher security requirements, like Wikimedia, could shell out to a process that's sandboxed on the OS level to be on the safe side. I'd like to hear what Tim thinks about the possibility of securing PHP like this.

Of course, PHP is evil, and supporting it sucks. :( But if we *really* *really* need to support users who can't shell out to other programs, I think it's the only real language that's a feasible solution.

I'd encourage you to consider requiring exec() support for full use of Wikipedia templates, though. Many really big shared hosts allow it, like 1and1.com. Anyone big enough to include much Wikipedia content will likely be on at least a VPS anyway. And if your host doesn't support exec(), then at *worst* you can still get the articles in a totally usable form -- just run Special:ExpandTemplates on all the article's templates. You can then transclude those on a per-article basis; we could update Special:Export to make this easier. The only problem in this case would be that you can't easily change the formatting of all the templates at once -- but such a small site would likely have few enough articles to do it by hand, if they even want to.

I think saying that users without exec() support get to use Wikipedia content in a somewhat less usable form would be just fine, and it would *really* open up our options. We could support basically any programming language in that case.

...

Python

Advantage: A Python interpreter will be present on most web servers, though not necessarily all. (Windows-based servers especially.)

Wash: Python is probably better known than Lua, but not as well as PHP or JS.

Disadvantage: Like PHP, Python is difficult to lock down securely.

It doesn't matter whether it's present, does it? If the user has exec() support, they could download a binary interpreter for *any* language to their webspace and run it from there regardless of whether the language is supported on the host. So Python is on exactly the same level as Lua here.

Much though I love Python, Lua looks like the better option. First of all, it's *very* small. sudo apt-get install lua50 on my machine uses up only 180 KB of disk space, and the package is 30 KB gzipped. Our current tarballs are 10 MB; we could easily just chuck in Lua binaries for Linux x86-32 and Windows without even noticing the size increase, and allow users to enable it with one line in LocalSettings.php. By contrast, python2.6 is around 10 MB uncompressed, 2.5 MB compressed. Perl is twice that size. Windows users, or users with exec() allowed but open_basedir preventing access to /usr/bin, would have to obtain Python/Perl/etc. themselves.

It looks to me like Lua would be a lot easier to sandbox. It seems pretty simple to deny all I/O within the language itself, so you'd (hopefully) just need memory and CPU limits. Both of those could be implemented on Linux with hard setrlimit() values plus nice. Similar things exist on Windows, hopefully accessible by command line somehow. If we're shipping binaries with MediaWiki, we could even hack the code if necessary, to use whatever sandboxing mechanisms the OS makes available, although hopefully that would be unneeded.
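Aryeh's setrlimit-plus-nice idea, sketched in Python around a child process. This assumes a Unix host (the `resource` module and `preexec_fn` are POSIX-only, and `preexec_fn` is not safe in multi-threaded parents); the limit values are placeholders, and `["lua", "-"]` would be one plausible argv if a shipped Lua binary read its script from stdin. These rlimits bound CPU and memory only; they do nothing about filesystem or network access, which is why denying I/O inside the language still matters.

```python
import os
import resource
import subprocess


def _apply_limits(cpu_seconds: int, mem_bytes: int):
    """Build a preexec_fn that sets hard limits in the child before exec()."""
    def apply() -> None:
        # The kernel signals the child once it has used cpu_seconds of CPU time.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        # Cap the address space so runaway allocations fail.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        os.nice(10)  # run at lower scheduling priority than the web server
    return apply


def run_sandboxed(argv: list[str], script: str, cpu_seconds: int = 2,
                  mem_bytes: int = 512 * 1024 * 1024) -> subprocess.CompletedProcess:
    """Run an interpreter that reads its program from stdin, under rlimits."""
    return subprocess.run(
        argv,
        input=script,
        capture_output=True,
        text=True,
        preexec_fn=_apply_limits(cpu_seconds, mem_bytes),
        timeout=cpu_seconds * 10,  # wall-clock backstop for scripts that sleep
    )
```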

I don't think we should fixate too much on how many people know the language. It's not hard to pick up a new language if you already know one, and Lua has the reputation of being simple (although I haven't tried to learn it). I think Lua is the best option here.

Brion Vibber

1:55 a.m.

Aryeh Gregor wrote:

...

On Tue, Jun 30, 2009 at 12:16 PM, Brion Vibber <brion@wikimedia.org> wrote:

...

PHP

Advantage: Lots of webbish people have some experience with PHP or can easily find references.

Advantage: we're pretty much guaranteed to have a PHP interpreter available. :)

Disadvantage: PHP is difficult to lock down for secure execution.

I think it would be easy to provide a very simple locked-down version, with most of the features gone. You could, for instance, only permit variable assignment, use of built-in operators, a small whitelist of functions, and conditionals. You could omit loops, function definitions, and abusable functions like str_repeat() (let alone exec(), eval(), etc.) from a first pass. This would still be vastly more powerful, more readable, and faster than ParserFunctions.

IMO by the time you've implemented your whitelisting parser you might as well just interpret it rather than eval()ing. (And of course, eval() might be disabled on the server. :)

Looping constructs are also extremely valuable -- at a minimum in a foreach() kind of way.
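As a concrete case of the foreach-style loop Brion mentions (and of Steve's example of looping over template arguments instead of chained #switch calls), here is a hypothetical Python sketch that walks the numbered parameters {{{1}}}, {{{2}}}, ... and stops at the first missing one. Whether a real implementation should stop at a gap or visit every numeric key is a design choice this sketch does not settle.

```python
def positional_args(args: dict[str, str]) -> list[str]:
    """Collect {{{1}}}, {{{2}}}, ... in order, stopping at the first gap."""
    values = []
    i = 1
    while str(i) in args:
        values.append(args[str(i)])
        i += 1
    return values


def bullet_list(args: dict[str, str]) -> str:
    """Render each positional argument as a wikitext bullet; named args are ignored."""
    return "\n".join(f"* {value}" for value in positional_args(args))
```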

...

I'd encourage you to consider requiring exec() support for full use of Wikipedia templates, though. Many really big shared hosts allow it, like 1and1.com. Anyone big enough to include much Wikipedia content will likely be on at least a VPS anyway.

It's not about "Wikipedia content", but about being able to grab things you see on another wiki and use or adapt them to your own needs. We get lots of questions from people trying to grab some particular template off Wikipedia to use on their own site for their own needs.

...

...

Python

Advantage: A Python interpreter will be present on most web servers, though not necessarily all. (Windows-based servers especially.)

Wash: Python is probably better known than Lua, but not as well as PHP or JS.

Disadvantage: Like PHP, Python is difficult to lock down securely.

It doesn't matter whether it's present, does it? If the user has exec() support, they could download a binary interpreter for *any* language to their webspace and run it from there regardless of whether the language is supported on the host.

Considering the amount of trouble people have getting texvc working, I wouldn't want to force that on people just to use templates.

...

Much though I love Python, Lua looks like the better option. First of all, it's *very* small. sudo apt-get install lua50 on my machine uses up only 180 KB of disk space, and the package is 30 KB gzipped.

Python "comes with batteries included", which is to say it's got a huge standard library (most of which of course wouldn't be available in a restricted environment). Lua's bare interpreter of course wins in an embedded-shipping contest. :D

...

Our current tarballs are 10 MB; we could easily just chuck in Lua binaries for Linux x86-32 and Windows without even noticing the size increase, and allow users to enable it with one line in LocalSettings.php.

Hmm... it might be interesting to experiment with something like this, if it can _really_ be compiled standalone. (Linux binary distribution is a hellhole of incompatible linked library versions!)

...

It looks to me like Lua would be a lot easier to sandbox. It seems pretty simple to deny all I/O within the language itself, so you'd (hopefully) just need memory and CPU limits.

*nod* being designed as an embedded language is a win. :D

-- brion

Marco Schuster

2:03 a.m.

On Tue, Jun 30, 2009 at 10:25 PM, Brion Vibber <brion@wikimedia.org> wrote:

...

Aryeh Gregor wrote:

...
Our current tarballs are 10 MB; we could easily just chuck in Lua binaries for Linux x86-32 and Windows without even noticing the size increase, and allow users to enable it with one line in LocalSettings.php.

Hmm... it might be interesting to experiment with something like this, if it can _really_ be compiled standalone. (Linux binary distribution is a hellhole of incompatible linked library versions!)

Static compiling the stuff? How would this affect the binary size? (And: is static linking working across different libc versions?)

BTW, what about Mac OS / FreeBSD hosts?

Marco

-- VMSoft GbR, Nabburger Str. 15, 81737 München. Managing directors: Marco Schuster, Volker Hemmert. http://vmsoft-gbr.de

Aryeh Gregor

2:15 a.m.

On Tue, Jun 30, 2009 at 4:33 PM, Marco Schuster <marco@harddisk.is-a-geek.org> wrote:

...

Static compiling the stuff? How would this affect the binary size?

Hopefully not too badly if you use the right options. libc is huge, but the linker should be able to throw out most of it if statically linking, since Lua likely doesn't use most libc functions.

Alternatively, is the libc ABI stable enough that we could dynamically link libc, and statically link everything else? The other libraries required are very small.

...

(And: is static linking working across different libc versions?)

Yes, it should work fine, AFAIK. If you statically link everything you're just using the kernel ABIs, which are supposed to be very stable (especially for reasonably common stuff).

...

BTW, what about Mac OS / FreeBSD hosts?

Are there any shared webhosts you know of that run Mac or BSD? At worst, they can fall into the same group as the no-exec() camp, able to use Wikipedia content but not 100%.

Marco Schuster

2:41 a.m.

On Tue, Jun 30, 2009 at 10:45 PM, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:

...

Alternatively, is the libc ABI stable enough that we could dynamically link libc, and statically link everything else? The other libraries required are very small.

I wouldn't count on this... at least we should provide a dyn-linked version for those wanting less storage/memory/whatever consumption.

How do statically compiled programs for x86 platforms behave on x64, btw? And what about more "exotic" platforms like ARM (which can also be multi-endian, IXP4xx is an example) / SPARC (Toolserver!!!) or PowerPC? Are they actually supported by Lua?

...

...
BTW, what about Mac OS / FreeBSD hosts?

Are there any shared webhosts you know of that run Mac or BSD? At worst, they can fall into the same group as the no-exec() camp, able to use Wikipedia content but not 100%.

The webhoster hosting our school's homepage does, for example... They host all schools in Munich, and I think they're a bit security-paranoid. We don't have any issues hosting a MediaWiki there, actually. (OK, we never imported WP content.)

Marco


Aryeh Gregor

2:50 a.m.

On Tue, Jun 30, 2009 at 5:11 PM, Marco Schuster <marco@harddisk.is-a-geek.org> wrote:

...

How do statically compiled programs for x86 platforms behave on x64, btw?

I'm pretty sure they work fine. Someone with more knowledge of Linux binaries needs to comment on how we could best do this, though.

...

And what about more "exotic" platforms like ARM (which can also be multi-endian, IXP4xx is an example) / SPARC (Toolserver!!!) or PowerPC? Are they actually supported by Lua?

Lua is designed to be extremely portable IIRC, across both architectures and compilers.

...

The webhoster hosting our school's homepage does, for example... They host all schools in Munich, and I think they're a bit security-paranoid.

That's not a shared host. They can easily install Lua themselves.

Aryeh Gregor

2:12 a.m.

On Tue, Jun 30, 2009 at 4:25 PM, Brion Vibber <brion@wikimedia.org> wrote:

...

IMO by the time you've implemented your whitelisting parser you might as well just interpret it rather than eval()ing.

I don't think so. You'd only have to do the whitelisting once, on page save. After that you could just execute with no extra overhead. Even better, you could write it to a file and include() the file; this would be a huge win if you have an opcode cache. Of course, parsing PHP within PHP should be much easier than parsing another language within PHP: just use token_get_all() to do most of the work.

...

(And of course, eval() might be disabled on the server. :)

Does anyone actually do this? It would break a lot of major web apps, surely. If anyone does do this, it would still work if you could write to a file and then include it.

...

Looping constructs are also extremely valuable -- at a minimum in a foreach() kind of way.

Right, but we could live without them in an initial version. They could be added later.

...

It's not about "Wikipedia content", but about being able to grab things you see on another wiki and use or adapt them to your own needs. We get lots of questions from people trying to grab some particular template off Wikipedia to use on their own site for their own needs.

Sure. The point still holds. Some third parties would be unable to use Wikipedia templates, yes. But given the tangle of dependencies the major ones have, and how complicated they are, I'm guessing most small third-party wikis don't bother in the end anyway. Requiring exec() for full use of content is viable IMO.

...

Considering the amount of trouble people have getting texvc working, I wouldn't want to force that on people just to use templates.

The problem with texvc is installing dependencies and compiling it. A much better analogy is things like diff3 -- which we shell out to out of the box, with zero configuration, if they exist and shelling out works. (We'd probably want scripting off by default, of course, but we could require just a single config line.)
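The "use it if it exists" behaviour Aryeh compares to diff3 could look roughly like this Python sketch, built on the standard `shutil.which()` lookup. The candidate binary names are placeholders, and a real setup would still gate this behind the single config line he mentions.

```python
import shutil
from typing import Iterable, Optional


def find_interpreter(candidates: Iterable[str]) -> Optional[str]:
    """Return the path of the first candidate found, or None to leave scripting off."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

A caller might try something like `find_interpreter(["lua5.1", "lua"])` and simply disable scripted templates when it returns `None`.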

...

Python "comes with batteries included", which is to say it's got a huge standard library (most of which of course wouldn't be available in a restricted environment). Lua's bare interpreter of course wins in an embedded-shipping contest. :D

Yep, but that's a big advantage. It means Windows users don't have to do any extra work. It also lets us ensure a specific version is reliably available. Imagine Wikimedia using Python 2.6, and someone trying to run that on some shared host running Fedora 8 or God knows what, with Python 2.2 or something. (Someone actually came into #mediawiki a few months ago for help and it turned out their VPS was something like Fedora 7 or 8. And horribly overpriced at that!)

...

Hmm... it might be interesting to experiment with something like this, if it can _really_ be compiled standalone. (Linux binary distribution is a hellhole of incompatible linked library versions!)

I hadn't thought of libraries, you're right. It should work pretty reliably on Linux (and hopefully not be too much bigger) if it's statically linked, though, right?

Andrew Garrett

4:46 a.m.

On 30/06/2009, at 9:42 PM, Aryeh Gregor wrote:

...

On Tue, Jun 30, 2009 at 4:25 PM, Brion Vibber <brion@wikimedia.org> wrote:

...
IMO by the time you've implemented your whitelisting parser you might as well just interpret it rather than eval()ing.

I don't think so. You'd only have to do the whitelisting once, on page save. After that you could just execute with no extra overhead.

That's just scary. We'd definitely want to do the validation as close as possible to the actual eval()ing, to minimise backdoors like Special:Import et al.

-- Andrew Garrett Contract Developer, Wikimedia Foundation agarrett@wikimedia.org http://werdn.us
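Andrew's point can be made concrete: if the check lives inside the function that executes the code, nothing that bypasses the save hook (an import, a direct database write) can reach execution unchecked. A minimal Python sketch, assuming a deliberately tiny arithmetic-only whitelist; the validation result is memoized by a SHA-256 hash of the exact source text, so repeated runs stay cheap while the check is still tied to the bytes actually executed. The in-memory dict is an assumption of the sketch; nothing is written to disk, which also sidesteps the writable-file problem Brion raises in his reply.

```python
import ast
import hashlib

# Deliberately tiny whitelist: arithmetic on literals only.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.USub,
)

_checked = {}  # sha256(source) -> compiled code object, kept in memory only


def _validate(source: str):
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
    return compile(tree, "<template>", "eval")


def run(source: str):
    """Validate (or reuse the validation of) exactly this source, then execute it."""
    key = hashlib.sha256(source.encode("utf-8")).hexdigest()
    code = _checked.get(key)
    if code is None:
        code = _validate(source)
        _checked[key] = code
    return eval(code, {"__builtins__": {}}, {})
```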

Brion Vibber

6:07 a.m.

Andrew Garrett wrote:

...

On 30/06/2009, at 9:42 PM, Aryeh Gregor wrote:

...
On Tue, Jun 30, 2009 at 4:25 PM, Brion Vibber <brion@wikimedia.org> wrote:

...
IMO by the time you've implemented your whitelisting parser you might as well just interpret it rather than eval()ing.

I don't think so. You'd only have to do the whitelisting once, on page save. After that you could just execute with no extra overhead.

That's just scary. We'd definitely want to do the validation as close as possible to the actual eval()ing, to minimise backdoors like Special:Import et al.

Executing PHP from apache-writable files saved on disk is also a security danger.

The original implementation of the MonoBook skin used the TAL templating language, which was compiled into executable PHP at runtime and stored in /tmp so it could be cached for the next view.

In addition to difficulties with hosts which had misconfigured /tmp directories, we found that people sharing their hosts with poorly-secured WordPress installations would end up finding their wikis hacked -- worms exploiting vulnerabilities in other PHP apps would hop around the system modifying any .php files they could write to... including the cached PHPTAL templates.

-- brion

Brion Vibber

6:03 a.m.

Aryeh Gregor wrote:

...

On Tue, Jun 30, 2009 at 4:25 PM, Brion Vibber <brion@wikimedia.org> wrote:

...
It's not about "Wikipedia content", but about being able to grab things you see on another wiki and use or adapt them to your own needs. We get lots of questions from people trying to grab some particular template off Wikipedia to use on their own site for their own needs.

Sure. The point still holds. Some third parties would be unable to use Wikipedia templates, yes. But given the tangle of dependencies the major ones have, and how complicated they are, I'm guessing most small third-party wikis don't bother in the end anyway.

That's why we want to fix it! :)

It *should* be fairly trivial to fetch a template/plugin sort of thing off of one wiki and put it on another. Consider this as one of our goals for next-gen templating.

-- brion

Jared Williams

2:22 a.m.

...

-----Original Message----- From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Aryeh Gregor Sent: 30 June 2009 20:56 To: Wikimedia developers Subject: Re: [Wikitech-l] On templates and programming languages

On Tue, Jun 30, 2009 at 12:16 PM, Brion Vibber <brion@wikimedia.org> wrote:

...

PHP

Advantage: Lots of webbish people have some experience with PHP or can easily find references.

Advantage: we're pretty much guaranteed to have a PHP interpreter available. :)

Disadvantage: PHP is difficult to lock down for secure execution.

I think it would be easy to provide a very simple locked-down version, with most of the features gone. You could, for instance, only permit variable assignment, use of built-in operators, a small whitelist of functions, and conditionals. You could omit loops, function definitions, and abusable functions like str_repeat() (let alone exec(), eval(), etc.) from a first pass. This would still be vastly more powerful, more readable, and faster than ParserFunctions.

Pity there is no way to lock code execution down to a single namespace (thinking ahead to PHP 5.3).

    <?php
    namespace Template {
        // Unqualified calls inside this namespace resolve to these
        // overrides before PHP falls back to the global functions.
        function strlen($string) { return \strlen($string) * 2; }
        function exec() { throw new \Exception(); }

        class Template {
            function paint() {
                // Wish: redirect the \ namespace to Template, so \exec() is also caught.
                echo strlen('data');
            }
        }
    }

Jared
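Aryeh's locked-down subset (assignment, built-in operators, a small function whitelist and conditionals, with no loops or function definitions) amounts to validating the parsed code against a whitelist before running it. A minimal sketch of that technique, written in Python 3.8+ because its standard `ast` module makes the check short; a PHP version would need a PHP parser, so treat this as an analogue, not the proposed implementation. The node list, the `ALLOWED_FUNCS` set and the convention that template code assigns its result to `out` are all hypothetical.

```python
import ast

# Hypothetical whitelist of callable functions exposed to template code.
ALLOWED_FUNCS = {"len": len, "str": str}

# Assignment, built-in operators, conditionals and simple expressions only.
# No loops, no function definitions, no attribute access. Mult is omitted
# because, like str_repeat(), "x" * n can allocate huge strings.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare,
    ast.If, ast.IfExp, ast.Call,
    ast.Add, ast.Sub, ast.USub, ast.Not, ast.And, ast.Or,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)


def validate(source: str) -> ast.Module:
    """Parse template code and reject anything outside the whitelist."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed construct: {type(node).__name__}")
        if isinstance(node, ast.Call):
            func = node.func
            if not (isinstance(func, ast.Name) and func.id in ALLOWED_FUNCS):
                raise ValueError("call to a function outside the whitelist")
    return tree


def run(source: str, params: dict) -> str:
    """Validate, then execute with no builtins; the template sets `out`."""
    tree = validate(source)
    env = {"__builtins__": {}, **ALLOWED_FUNCS, **params}
    exec(compile(tree, "<template>", "exec"), env)
    return str(env.get("out", ""))
```

Running with an empty `__builtins__` is a second layer: even a name such as `exec` that slips past validation as a plain value cannot be resolved at run time. Because loops and `*` are rejected, `while True: pass` never reaches the interpreter, though a real deployment would presumably still want the time and memory limits discussed below.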

Aryeh Gregor

2:46 a.m.

On Tue, Jun 30, 2009 at 4:52 PM, Jared Williams <jared.williams1@ntlworld.com> wrote:

Pity there is no way to lock code execution down to a single namespace (thinking ahead to PHP 5.3).

This is implausible, but even if it happened it wouldn't stop trivial DOSes like while (true);. We'd still need to validate the code if we wanted to run it in-process.

Jared Williams

3:42 a.m.


-----Original Message-----
From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Aryeh Gregor
Sent: 30 June 2009 22:16
To: Wikimedia developers
Subject: Re: [Wikitech-l] On templates and programming languages

On Tue, Jun 30, 2009 at 4:52 PM, Jared Williams <jared.williams1@ntlworld.com> wrote:

Pity there is no way to lock code execution down to a single namespace (thinking ahead to PHP 5.3).

This is implausible, but even if it happened it wouldn't stop trivial DOSes like while (true);. We'd still need to validate the code if we wanted to run it in-process.

Yeah, would also need time & mem use restrictions.

Jared

Robert Rohde

3:38 a.m.

On Tue, Jun 30, 2009 at 12:56 PM, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:

On Tue, Jun 30, 2009 at 12:16 PM, Brion Vibber <brion@wikimedia.org> wrote:

PHP

Advantage: Lots of webbish people have some experience with PHP or can easily find references.

Advantage: we're pretty much guaranteed to have a PHP interpreter available. :)

Disadvantage: PHP is difficult to lock down for secure execution.

I think it would be easy to provide a very simple locked-down version, with most of the features gone. You could, for instance, only permit variable assignment, use of built-in operators, a small whitelist of functions, and conditionals. You could omit loops, function definitions, and abusable functions like str_repeat() (let alone exec(), eval(), etc.) from a first pass. This would still be vastly more powerful, more readable, and faster than ParserFunctions.

Hopefully, we could make this secure enough for your average shared-host website to run it by default with no special measures taken and without much risk. Installations with more access and higher security requirements, like Wikimedia, could shell out to a process that's sandboxed on the OS level to be on the safe side. I'd like to hear what Tim thinks about the possibility of securing PHP like this.

Of course, PHP is evil, and supporting it sucks. :( But if we *really* *really* need to support users who can't shell out to other programs, I think it's the only real language that's a feasible solution.

I'd encourage you to consider requiring exec() support for full use of Wikipedia templates, though. Many really big shared hosts allow it, like 1and1.com. Anyone big enough to include much Wikipedia content will likely be on at least a VPS anyway. And if your host doesn't support exec(), then at *worst* you can still get the articles in a totally usable form -- just run Special:ExpandTemplates on all the article's templates. You can then transclude those on a per-article basis; we could update Special:Export to make this easier. The only problem in this case would be that you can't easily change the formatting of all the templates at once -- but such a small site would likely have few enough articles to do it by hand, if they even want to.

I think saying that users without exec() support get to use Wikipedia content in a somewhat less usable form would be just fine, and it would *really* open up our options. We could support basically any programming language in that case.


Python

Advantage: A Python interpreter will be present on most web servers, though not necessarily all. (Windows-based servers especially.)

Wash: Python is probably better known than Lua, but not as well as PHP or JS.

Disadvantage: Like PHP, Python is difficult to lock down securely.

It doesn't matter whether it's present, does it? If the user has exec() support, they could download a binary interpreter for *any* language to their webspace and run it from there regardless of whether the language is supported on the host. So Python is on exactly the same level as Lua here.

Much though I love Python, Lua looks like the better option. First of all, it's *very* small. sudo apt-get install lua50 on my machine uses up only 180 KB of disk space, and the package is 30 KB gzipped. Our current tarballs are 10 MB; we could easily just chuck in Lua binaries for Linux x86-32 and Windows without even noticing the size increase, and allow users to enable it with one line in LocalSettings.php. By contrast, python2.6 is around 10 MB uncompressed, 2.5 MB compressed. Perl is twice that size. Windows users, or users with exec() allowed but open_basedir preventing access to /usr/bin, would have to obtain Python/Perl/etc. themselves.

It looks to me like Lua would be a lot easier to sandbox. It seems pretty simple to deny all I/O within the language itself, so you'd (hopefully) just need memory and CPU limits. Both of those could be implemented on Linux with hard setrlimit() values plus nice. Similar things exist on Windows, hopefully accessible by command line somehow. If we're shipping binaries with MediaWiki, we could even hack the code if necessary, to use whatever sandboxing mechanisms the OS makes available, although hopefully that would be unneeded.

I don't think we should fixate too much on how many people know the language. It's not hard to pick up a new language if you already know one, and Lua has the reputation of being simple (although I haven't tried to learn it). I think Lua is the best option here.
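Aryeh's setrlimit()-plus-nice suggestion can be sketched on POSIX systems with Python's `resource` and `subprocess` modules: the child interpreter gets a hard CPU-time cap and an address-space cap, runs at lower priority, and a wall-clock timeout catches code that sleeps rather than spins. The interpreter command (for example a bundled `lua` binary) and the limit values are assumptions; Windows would need a different mechanism (such as Job Objects), which this sketch does not cover.

```python
import os
import resource
import subprocess
import sys


def _limits(cpu_seconds: int, mem_bytes: int):
    def apply() -> None:
        # Runs in the child after fork() and before exec() (POSIX only).
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        os.nice(10)  # lower scheduling priority, like running under nice
    return apply


def run_limited(argv, stdin_text="", cpu_seconds=1,
                mem_bytes=512 * 1024 * 1024, wall_seconds=5):
    """Run an interpreter with hard CPU/memory caps; return (code, stdout)."""
    try:
        proc = subprocess.run(
            argv, input=stdin_text, capture_output=True, text=True,
            preexec_fn=_limits(cpu_seconds, mem_bytes), timeout=wall_seconds,
        )
    except subprocess.TimeoutExpired:
        return None, ""  # wall-clock backstop, e.g. for code that sleeps
    # A negative return code means the child was killed by a signal,
    # e.g. SIGXCPU once the CPU limit is hit.
    return proc.returncode, proc.stdout


if __name__ == "__main__":
    # Python itself stands in for a bundled `lua` binary here.
    print(run_limited([sys.executable, "-c", "while True: pass"]))
```

Keeping the limits in the parent's launcher, rather than inside the template language, is what makes the "trivial DoS" case (`while (true);`) harmless regardless of which interpreter is chosen.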

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

In addition to resource limits, any scheme had better make sure what's passed into the programming language and what's passed out makes sense. For example, you shouldn't have it generating raw HTML and probably shouldn't let it mess with strip markers. Some of this may be automatic depending how it's integrated into the parser. One would probably also want to limit the size of an allowed output (e.g. don't let it send 5 MB to the user). Depending on the integration there may be other control sequences that one needs to catch when it returns as well.

On a separate point, one of the limitations of stand-alone type sandboxes is that it would make it harder for the code to call other template pages. One of the few virtues of the current template code is that it is relatively modular, with more complex templates being built out of less complex ones. If this programming language is meant to replace that then it would also need to be able to reference the results of other template pages. One solution is to pre-expand those sections (similar to what is done now, I believe), but that can get rather delicate once one has programming constructs like variable assignments, looping, and recursion since the template parameters won't necessarily be fixed at the Preprocessor stage.

-Robert Rohde
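Robert's checks on what comes back out of the sandbox can be sketched as a single post-processing step, in Python for illustration. The 64 KB cap is hypothetical, and so is the assumption that strip markers are delimited by the DEL character (`\x7f`); escaping every `<`, `>` and `&` is a deliberately conservative choice that would also block legitimate wikitext tags, so a real integration would likely be narrower.

```python
import html

# Hypothetical cap on template output size.
MAX_OUTPUT_BYTES = 64 * 1024

# Assumption: parser strip markers are delimited by the DEL character (\x7f).
STRIP_MARKER_DELIM = "\x7f"


def sanitize_output(text: str, max_bytes: int = MAX_OUTPUT_BYTES) -> str:
    """Check what the sandboxed code returned before it reaches the parser."""
    if len(text.encode("utf-8")) > max_bytes:
        raise ValueError("template output exceeds size limit")
    # Drop marker delimiters so template code cannot forge strip markers.
    text = text.replace(STRIP_MARKER_DELIM, "")
    # Conservative choice: escape <, > and & so no raw HTML gets through.
    return html.escape(text, quote=False)
```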

Aryeh Gregor

7:03 a.m.

On Tue, Jun 30, 2009 at 6:08 PM, Robert Rohde <rarohde@gmail.com> wrote:

In addition to resource limits, any scheme had better make sure what's passed into the programming language and what's passed out makes sense. For example, you shouldn't have it generating raw HTML and probably shouldn't let it mess with strip markers. Some of this may be automatic depending how it's integrated into the parser. One would probably also want to limit the size of an allowed output (e.g. don't let it send 5 MB to the user). Depending on the integration there may be other control sequences that one needs to catch when it returns as well.

I was assuming it would just return wikitext, and that would be integrated into the page and parsed, following all limits on wikitext (including size) -- just as with current parser functions.


On a separate point, one of the limitations of stand-alone type sandboxes is that it would make it harder for the code to call other template pages. One of the few virtues of the current template code is that it is relatively modular, with more complex templates being built out of less complex ones. If this programming language is meant to replace that then it would also need to be able to reference the results of other template pages. One solution is to pre-expand those sections (similar to what is done now, I believe), but that can get rather delicate once one has programming constructs like variable assignments, looping, and recursion since the template parameters won't necessarily be fixed at the Preprocessor stage.

I'd assume we'd support some kind of includes. One rudimentary way to do it would be to run Lua stuff after or during preprocessing, so you could just include Lua code macro-style using templates. A better way would probably be to support the include features of the language itself (I don't know how they work offhand, for Lua).
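Macro-style includes need two safeguards whatever the language: a depth cap and loop detection. A minimal Python sketch of plain-text include expansion with both; the `{{Name}}` form without arguments, the `pages` dictionary and the depth limit of 40 are illustrative assumptions, not MediaWiki's actual preprocessor.

```python
import re

# Matches an argument-less include such as {{Name}}.
INCLUDE = re.compile(r"\{\{([^{}|]+)\}\}")


def expand_includes(text, pages, max_depth=40, _stack=()):
    """Recursively replace {{Name}} with pages[Name], guarding recursion."""
    if len(_stack) > max_depth:
        raise RecursionError("include depth limit exceeded")

    def repl(m):
        name = m.group(1).strip()
        if name in _stack:
            raise RecursionError(f"include loop at {name}")
        if name not in pages:
            return m.group(0)  # leave unknown includes untouched
        return expand_includes(pages[name], pages, max_depth, _stack + (name,))

    return INCLUDE.sub(repl, text)
```

Because each level carries the chain of names that led to it, a cycle such as X including Y including X is caught immediately rather than only when the depth cap is reached.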

On Tue, Jun 30, 2009 at 6:12 PM, Jared Williams <jared.williams1@ntlworld.com> wrote:

Yeah, would also need time & mem use restrictions.

Which is impossible for in-process use. You'd have to shell out if you do that, which defeats the entire point of using PHP instead of something else to begin with.

On Tue, Jun 30, 2009 at 7:16 PM, Andrew Garrett <agarrett@wikimedia.org> wrote:

That's just scary. We'd definitely want to do the validation as close as possible to the actual eval()ing, to minimise backdoors like Special:Import et al.

You'd be saving the code to a file on disk somewhere, probably named using a hash of the input. The only thing saving the code would be the code that sanitizes it. There's no way anything could go wrong unless an attacker gains filesystem write access, in which case you're hosed anyway. Parsing PHP on every page view when you could cache it in APC is crazy.
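The hash-named cache Aryeh describes can be sketched as follows, in Python rather than PHP for brevity. The `.php` suffix, the `compile_fn` callback (standing in for the hypothetical sanitize-and-translate step) and the write-then-rename detail are assumptions. This does not by itself answer Brion's concern about worms rewriting web-server-writable PHP files; the cache directory would still need to be kept out of reach of other applications on the host.

```python
import hashlib
import os
import tempfile


def cache_path(source: str, cache_dir: str) -> str:
    """Name the compiled file after a SHA-256 hash of the template source."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest + ".php")


def get_compiled(source: str, cache_dir: str, compile_fn) -> str:
    """Return cached compiled code, compiling (and sanitizing) on a miss."""
    path = cache_path(source, cache_dir)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read()
    compiled = compile_fn(source)  # the sanitizer is the only writer
    # Write to a temp file and rename, so readers never see a partial file.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(compiled)
    os.replace(tmp, path)
    return compiled
```

Since the file name is derived from the source, editing a template simply produces a new cache entry; nothing ever needs to be invalidated in place.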

On Tue, Jun 30, 2009 at 7:24 PM, Hay (Husky) <huskyr@gmail.com> wrote:

That leaves us with Lua and Javascript, which are both small and efficient languages meant to solve tasks like this. Remember, I'm talking about 'core' Javascript here, not with all DOM methods and stuff. If you strip that all out (take a look at the 1.5 core reference at Mozilla.com: https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference) you get a pretty nice and simple language that isn't very large. Both would require a new parser and/or installed compilers on the server-side. Compared to the disadvantages of other options, that seems like a pretty small loss for a great win.

Reasonable enough, yeah. Sandboxing might be easier too. What are some standalone JavaScript interpreters we could use? Ideally we'd use a heavily-optimized JIT compiler, like V8 or TraceMonkey, but I don't know if those work standalone.

On Tue, Jun 30, 2009 at 8:33 PM, Brion Vibber <brion@wikimedia.org> wrote:

That's why we want to fix it! :)

It *should* be fairly trivial to fetch a template/plugin sort of thing off of one wiki and put it on another. Consider this as one of our goals for next-gen templating.

Eh. Then that really ties our hands. If we have to have support for shared hosts without exec() support, then I don't see any viable option except sanitized PHP.

On Tue, Jun 30, 2009 at 8:37 PM, Brion Vibber <brion@wikimedia.org> wrote:

Executing PHP from apache-writable files saved on disk is also a security danger.

The original implementation of the MonoBook skin used the TAL templating language, which was compiled into executable PHP at runtime and stored in /tmp so it could be cached for the next view.

In addition to difficulties with hosts which had misconfigured /tmp directories, we found that people sharing their hosts with poorly-secured WordPress installations would end up finding their wikis hacked -- worms exploiting vulnerabilities in other PHP apps would hop around the system modifying any .php files they could write to... including the cached PHPTAL templates.

It could be eval()ed by default, but the performance wins from using APC would surely be huge. If you set it up carefully it should be safe enough.

On Tue, Jun 30, 2009 at 8:41 PM, Brian <Brian.Mingus@colorado.edu> wrote:

There is nothing in the OP that indicates that we are keeping the current template code or even that it would be desirable. Whatever facilities the language we choose has for including other files and passing arguments to functions is 100% sufficient.

We're talking about changing how templates are written, not how they're called. Changing the template call syntax is an entirely different discussion that's orthogonal to this one.

On Tue, Jun 30, 2009 at 9:02 PM, Trevor Parscal <tparscal@wikimedia.org> wrote:

Seems like JSON syntax is pretty simple and could be a big improvement to how templates are currently invoked.

I'm not sure where you'd use JSON here?

Brion Vibber

2 Jul

10:48 p.m.

Aryeh Gregor wrote:


I was assuming it would just return wikitext, and that would be integrated into the page and parsed, following all limits on wikitext (including size) -- just as with current parser functions.

That's one simple way to implement, but we may wish to consider working with a document tree structure instead to help future-proof it against future syntax changes (or dropping out the wiki syntax entirely). Things to consider... :)

-- brion
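Brion's alternative, where template code returns a document tree rather than wikitext, can be sketched with Python's standard `xml.etree.ElementTree`. The `infobox` template and its arguments are hypothetical, and a real design would presumably use the parser's own node types. The point illustrated is that serialization, and therefore escaping, happens outside the template, so a later syntax change only needs a new serializer.

```python
import xml.etree.ElementTree as ET


def infobox(title: str, rows: dict) -> ET.Element:
    """Hypothetical template that returns a tree instead of wikitext."""
    table = ET.Element("table", {"class": "infobox"})
    ET.SubElement(table, "caption").text = title
    for label, value in rows.items():
        tr = ET.SubElement(table, "tr")
        ET.SubElement(tr, "th").text = label
        ET.SubElement(tr, "td").text = value
    return table


# The serializer, not the template author, decides how text is escaped,
# so a parameter containing markup cannot inject tags.
html_out = ET.tostring(infobox("<Kayak>", {"Length": "5 m"}), encoding="unicode")
```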

Dmitriy Sintsov

3 Jul

12:17 p.m.

Brion Vibber <brion@wikimedia.org> [Thu, 02 Jul 2009 10:18:14 -0700]:

Aryeh Gregor wrote:

...
I was assuming it would just return wikitext, and that would be integrated into the page and parsed, following all limits on wikitext (including size) -- just as with current parser functions.

That's one simple way to implement, but we may wish to consider working with a document tree structure instead to help future-proof it against future syntax changes (or dropping out the wiki syntax entirely). Things to consider... :)

SLAX (http://code.google.com/p/libslax/, link provided by Gregory Maxwell) looks like a really good fit for document tree manipulation, and as people have pointed out, XSLT is simple to limit (lock the recursion down). SLAX is compact and more readable than "normal" XSLT. I remember that PHP has a standard module for XSLT transformations; I wonder whether it would be simple to convert SLAX to XSLT and then use PHP's XSLT transformation.

Dmitriy

Jared Williams

4:11 p.m.


-----Original Message-----
From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Dmitriy Sintsov
Sent: 03 July 2009 07:48
To: Wikimedia developers
Subject: Re: [Wikitech-l] On templates and programming languages

Brion Vibber <brion@wikimedia.org> [Thu, 02 Jul 2009 10:18:14 -0700]:
Aryeh Gregor wrote:

I was assuming it would just return wikitext, and that would be integrated into the page and parsed, following all limits on wikitext (including size) -- just as with current parser functions.

That's one simple way to implement, but we may wish to consider working with a document tree structure instead to help future-proof it against future syntax changes (or dropping out the wiki syntax entirely). Things to consider... :)

SLAX (http://code.google.com/p/libslax/, link provided by Gregory Maxwell) looks like a really good fit for document tree manipulation, and as people have pointed out, XSLT is simple to limit (lock the recursion down). SLAX is compact and more readable than "normal" XSLT. I remember that PHP has a standard module for XSLT transformations; I wonder whether it would be simple to convert SLAX to XSLT and then use PHP's XSLT transformation.

Dmitriy

I think something like the ESI language (http://www.w3.org/TR/esi-lang, but without the HTTP requests) would be preferable to XSLT, if we were going the XML route.

Jared

Michael Daly

1 Jul

1:34 a.m.

Brion Vibber wrote:


Any thoughts? Does anybody happen to have a PHP implementation of a Lua or JavaScript interpreter?

Rather than reinventing the wheel, why not look at fixing the existing template syntax?

The biggest problem that I see is the excessive dependence on the curly braces { and }. In a moderately complex template, you've got a mix of double {{...}} and triple {{{...}}} braces, occasionally nested, that result in an unreadable mess.

If {{{xxx}}} were replaced with a local-variable-like syntax, say $xxx (where xxx is whatever name you wish, with $1, $2... for unnamed parameters), then something like:

{{blah|{{{xxx}}}|{{{yyy}}}}}{{#if: {{{ggg}}}|{{{h}}}|{{{4}}}}}{{{5}}}

becomes:

{{blah|$xxx|$yyy}}{{#if: $ggg|$h|$4}}$5

which is somewhat more tolerable. (Whether or not the above makes real sense is beside the point; I'm just trying to show how removing the blizzard of {{{}}} reduces visual clutter.)
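The two forms in Mike's example are mechanically interchangeable, as a few lines of Python show. The regex and the function name are illustrative only. A naive rewrite like this would also capture literal text such as a price of "$5", which is part of the motivation for declaring variables explicitly.

```python
import re

# $name, or $1, $2 ... for positional parameters.
DOLLAR_VAR = re.compile(r"\$(\w+)")


def dollar_to_braces(text: str) -> str:
    """Rewrite the proposed $xxx syntax into today's {{{xxx}}} form."""
    return DOLLAR_VAR.sub(r"{{{\1}}}", text)


proposed = "{{blah|$xxx|$yyy}}{{#if: $ggg|$h|$4}}$5"
current = dollar_to_braces(proposed)
```

Here `current` is exactly the triple-brace line from the example above.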

Since $ doesn't have a close, that makes things like {{{xxx|default value}}} slightly problematic, since "$xxx|$default_value" is slightly more awkward to parse. But that only shows how templates are also overly reliant on the pipe (|) symbol - as anyone who has tried to use tables in templates has discovered.

If parsing templates allows the semi-restricted use of a couple of symbols (unlike parsing other pages - I know... don't go there), then both {{{}}} and | could be replaced with $ and I-don't-care-what-make-a-choice. Then templates become a tad more readable and we get rid of kludges like {{!}} and other clutter or confusion in tables, parser functions, etc.

As an aside - obliging template writers to declare variables used in the template, say, as a definition of the input format at the top of the template definition, would make parsing the variables out later a tad easier. If it's declared, it's a variable; if not, it's not a variable and is treated as plain text. Thus the first line of a template would be the example of its use:

Template:foobar
----------------------------------------------------------------------
{{Foobar|$var1|$var2|$andAnotherVar}}
...(implementation)...
----------------------------------------------------------------------
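Mike's declaration rule (a `$name` is a variable only if the prototype on the first line declares it, otherwise it stays plain text) can be sketched in Python. The prototype regex, the function names and the choice to substitute an empty string for a declared but missing argument are all assumptions made for illustration.

```python
import re

# First line of a template page: {{Name|$a|$b|...}}
PROTOTYPE = re.compile(r"^\{\{\s*[^|{}]+((?:\|\$\w+)*)\s*\}\}$")
VAR = re.compile(r"\$(\w+)")


def parse_template(source: str):
    """Split a template page into its declared variables and its body."""
    first, _, body = source.partition("\n")
    m = PROTOTYPE.match(first.strip())
    if not m:
        raise ValueError("first line must be a prototype like {{Name|$a|$b}}")
    return set(VAR.findall(m.group(1))), body


def expand(body: str, declared: set, args: dict) -> str:
    """Substitute declared variables; anything else starting with $ is text."""
    def repl(m):
        name = m.group(1)
        if name not in declared:
            return m.group(0)
        return args.get(name, "")
    return VAR.sub(repl, body)
```

With the prototype `{{Foobar|$var1|$var2}}`, a body such as `Cost: $5, $var1 and $var2` keeps `$5` literally and substitutes only the two declared names.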

But what do I know, I've only implemented one OO language compiler in my life and that was 20 years ago.

Mike

Thomas Dalton

2:08 a.m.

2009/6/30 Michael Daly michael.daly@kayakwiki.org:

Brion Vibber wrote:

Any thoughts? Does anybody happen to have a PHP implementation of a Lua or JavaScript interpreter?

Rather than reinventing the wheel, why not look at fixing the existing template syntax?

I would support that. We really don't need a Turing-complete template system.


As an aside - obliging template writers to declare variables used in the template, say, as a definition of the input format at the top of the template definition, would make parsing the variables out later a tad easier. If it's declared, it's a variable; if not, it's not a variable and is treated as plain text. Thus the first line of a template would be the example of its use:

Template:foobar

{{Foobar|$var1|$var2|$andAnotherVar}} ...(implementation)...

How does that work with anonymous variables? Do all $[NUMBER]-style names count as auto-declared?

Steve Sanbeg

2:59 a.m.

On Tue, 30 Jun 2009 21:38:07 +0100, Thomas Dalton wrote:

2009/6/30 Michael Daly michael.daly@kayakwiki.org:

How does that work with anonymous variables? Do all $[NUMBER]-style names count as auto-declared?

They're not anonymous; they're just named sequentially. Most languages should have some method of accessing/declaring those, e.g.:

XSL: <xsl:param name="1">default</xsl:param> <xsl:value-of select="$1"/>

perl: my $p=$ARG{1}; print $p;

etc...

If we do roll our own, it should have similar functionality.

Thomas Dalton

3:23 a.m.

2009/6/30 Steve Sanbeg ssanbeg@ask.com:

On Tue, 30 Jun 2009 21:38:07 +0100, Thomas Dalton wrote:

2009/6/30 Michael Daly michael.daly@kayakwiki.org:

How does that work with anonymous variables? Do all $[NUMBER]-style names count as auto-declared?

They're not anonymous, they're just named sequentially.

They are anonymous when you call the template, though. The names are determined by the order in the call rather than written explicitly. They do need to be considered separately.

Steve Sanbeg

2 Jul

1:17 a.m.

On Tue, 30 Jun 2009 22:53:36 +0100, Thomas Dalton wrote:

2009/6/30 Steve Sanbeg ssanbeg@ask.com:

On Tue, 30 Jun 2009 21:38:07 +0100, Thomas Dalton wrote:

2009/6/30 Michael Daly michael.daly@kayakwiki.org:

How does that work with anonymous variables? Do all $[NUMBER]-style names count as auto-declared?

They're not anonymous, they're just named sequentially.

They are anonymous when you call the template, though. The names are determined by the order in the call rather than written explicitly. They do need to be considered separately.

Anonymous would mean they don't have names, which isn't the case. They are named, but those names may or may not be implicit. Currently they aren't handled separately: the parser names any unnamed arguments before calling the template, which has no way of knowing how they were named. To the template, they're all just named arguments: {{t|a|b}} is the same as {{t|2=b|1=a}} or even {{t|2=b|a}}.
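Steve's description of the naming step can be checked with a small Python sketch: unnamed arguments are numbered only among themselves, so an explicit `2=` does not shift the counter, and `{{t|2=a|b}}` therefore yields 1=b, 2=a. This is a simplification (no nested templates, no `|` inside links or tables); the function name is hypothetical, and trimming whitespace from named values while keeping positional values verbatim reflects MediaWiki's behaviour as assumed here.

```python
def name_arguments(call: str) -> dict:
    """Map each argument of a flat {{name|...}} call to its parameter name."""
    text = call.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected a {{...}} call")
    parts = text[2:-2].split("|")[1:]  # drop the template name
    args = {}
    position = 0
    for part in parts:
        key, sep, value = part.partition("=")
        if sep:
            # Named argument: name and value are whitespace-trimmed.
            args[key.strip()] = value.strip()
        else:
            # Unnamed argument: numbered among unnamed ones only, kept verbatim.
            position += 1
            args[str(position)] = part
    return args
```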

Michael Daly

1 Jul

7:41 a.m.

Thomas Dalton wrote:

Thus the first line of a template would be the example of its use:

Template:foobar

{{Foobar|$var1|$var2|$andAnotherVar}} ...(implementation)...

How does that work with anonymous variables? Do all $[NUMBER]-style names count as auto-declared?

Template:foobar
----------------------------------------------------------------------
{{Foobar|$1|$2|$3}}
...(implementation)...
----------------------------------------------------------------------

That would make $4 a bit of text. Exactly the same kind of "template prototype" - to borrow C's terminology. I see no reason to have multiple different ways of identifying a variable, nor any reason to have defaults or automatic declarations.

This, of course, would forbid the use of $n (where n = 1, 2, 3...) as a synonym for a named variable. That is permitted, isn't it? I can't remember.

Mike

Tim Landscheidt

2 Jul

3:44 p.m.

Michael Daly michael.daly@kayakwiki.org wrote:


[...] Since $ doesn't have a close, that makes things like {{{xxx|default value}}} slightly problematic, since "$xxx|$default_value" is slightly more awkward to parse. But that only shows how templates are also overly reliant on the pipe (|) symbol - as anyone who has tried to use tables in templates has discovered. [...]

bash (I don't know if it's standard POSIX) has:

- ${parameter}
- ${parameter:-default}
- ${parameter:?error}

and even string functions:

- ${#parameter}
- ${parameter:offset:length}
- etc.
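For comparison with template defaults, the two bash forms that map most directly onto template parameters can be sketched in Python; the function name and the use of `ValueError` for the `:?` case are illustrative choices. One difference is worth noting: bash's `${parameter:-default}` substitutes the default when the parameter is unset or empty, whereas MediaWiki's `{{{xxx|default}}}` uses the default only when the parameter is not passed at all, which is why `{{#if:}}` wrappers are so common.

```python
import re

# Matches ${name}, ${name:-word} and ${name:?word}.
EXPANSION = re.compile(r"\$\{(\w+)(?::([-?])([^}]*))?\}")


def expand_params(text: str, params: dict) -> str:
    def repl(m):
        name, op, word = m.group(1), m.group(2), m.group(3)
        value = params.get(name)
        if op == "-":
            # ${name:-word}: use word when name is unset OR empty.
            return value if value else word
        if op == "?":
            # ${name:?word}: fail loudly when name is unset or empty.
            if not value:
                raise ValueError(word or f"{name}: parameter null or not set")
            return value
        return value or ""  # plain ${name}
    return EXPANSION.sub(repl, text)
```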

Personally, whatever programming language would be chosen, I really like Aryeh's approach to sanitize and "compile" the template to PHP. It could be used everywhere MediaWiki runs, it is no hassle to set up compared to installing other interpreters and the performance should be the top of what PHP has to offer (and *all* templates could be compiled to that code). From a distance, I think it would even be easier to have the file/memory/CPU restrictions hacked into the main PHP interpreter rather than to cook our own soup.

Tim

Tim Landscheidt

10:56 p.m.

I wrote:


[...] Personally, whatever programming language would be chosen, I really like Aryeh's approach to sanitize and "compile" the template to PHP. It could be used everywhere MediaWiki runs, it is no hassle to set up compared to installing other interpreters and the performance should be the top of what PHP has to offer (and *all* templates could be compiled to that code). From a distance, I think it would even be easier to have the file/memory/CPU restrictions hacked into the main PHP interpreter rather than to cook our own soup.

Come to think of it, it would also fit very well with profiling individual templates.

Tim

Hay (Husky)

1 Jul

4:54 a.m.

I would opt for Javascript.

PHP and Python are intended for large and complex applications and come with a huge standard library people probably expect to be available. Security concerns are a problem too, so a subset would probably be necessary. So, in essence, you get a crippled-down language that isn't really useful for templates.

Making our own language, either by 'fixing' the template language or by inventing something new, would only mean we introduce a new language that'll be specific to our own platform and that nobody outside of MediaWiki developers knows.

XSLT is not meant to be written or read by humans. It's a Turing-complete language stuffed into horrendous XML statements. Let's not go down that road.

That leaves us with Lua and Javascript, which are both small and efficient languages meant to solve tasks like this. Remember, I'm talking about 'core' Javascript here, not with all DOM methods and stuff. If you strip that all out (take a look at the 1.5 core reference at Mozilla.com: https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference) you get a pretty nice and simple language that isn't very large. Both would require a new parser and/or installed compilers on the server-side. Compared to the disadvantages of other options, that seems like a pretty small loss for a great win.

Javascript is a widely understood and implemented language, with lots of efforts to get it even faster in modern browsers. Every Wikipedia user has a copy of it implemented in their browser and can start experimenting without the need for installing a compiler or a web server. Many people program in Javascript, so you have a huge potential number of people who could start programming MediaWiki templates. And it's already closely tied to the web, so you don't have to invent new ways of dealing with web-specific stuff.

So, let's choose Javascript as our new template programming language.

Regards, -- Hay

On Tue, Jun 30, 2009 at 6:16 PM, Brion Vibber <brion@wikimedia.org> wrote:


Trevor Parscal

5:14 a.m.

I personally agree entirely. Now we just need to revive J4P5 (http://j4p5.sourceforge.net) :)

- Trevor

On 6/30/09 4:24 PM, Hay (Husky) wrote:

...

I would opt for Javascript.

PHP and Python are intended for large and complex applications and come with a huge standard library people probably expect to be available. Security concerns are a problem too, so a subset would probably be necessary So, in essence you get a crippled-down language that isn't really useful for templates.

Making our own language, either by 'fixing' the template language or by inventing something new would only mean we introduce a new language that'll be specific to our own platform and nobody knows outside of Mediawiki developers.

XSLT is not meant to be written or read by humans. It's a Turing-complete language stuffed into horrendous XML statements. Let's not go down that road.

That leaves us to Lua and Javascript, which are both small and efficient languages meant to solve tasks like this. Remember, i'm talking about 'core' Javascript here, not with all DOM methods and stuff. If you strip that all out (take a look at the 1.5. core reference at Mozilla.com: https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference) you get a pretty nice and simple language that isn't very large. Both would require a new parser and/or installed compilers on the server-side. Compared to the disadvantages of other options, that seems like a pretty small loss for a great win.

Javascript is a widely understood and implemented language, with lots of efforts to get it even faster in modern browsers. Every Wikipedia user has a copy of it implemented in their browser and can start experimenting without the need for installing a compiler or a web server. Many people program in Javascript, so you have a huge potential number of people who could start programming Mediawiki templates. And it's already closely tied to the web, so you don't have to invent new ways of dealing with web-specific stuff.

So, let's choose Javascript as our new template programming language.

Regards, -- Hay

On Tue, Jun 30, 2009 at 6:16 PM, Brion Vibberbrion@wikimedia.org wrote:

...

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Sergey Chernyshev

8:15 a.m.

I don't know about scripting languages for templating; it might be overkill.

When I was picking a templating language for the MediaWiki Widgets extension, I looked at popular PHP templating systems and ended up picking Smarty (http://smarty.net/): it can be security-locked, and it has a few useful features.

You can see the widget code here: http://www.mediawikiwidgets.org/w/index.php?title=Widget:Google_Calendar&... The widget is called using a parser function like this: {{widget: Name|param=val|param2=val2}}.

Double curly braces are far from perfect, but there aren't many good alternatives. XML is probably the only good one, because it's universal and used by many tools. I can't say I'm an expert in templating languages, though, especially when we're talking about power users rather than developers.

Thank you,

Sergey

-- Sergey Chernyshev http://www.sergeychernyshev.com/

On Tue, Jun 30, 2009 at 12:16 PM, Brion Vibber brion@wikimedia.org wrote:

...


Aryeh Gregor

9:16 a.m.

On Tue, Jun 30, 2009 at 10:45 PM, Sergey Chernyshev sergey.chernyshev@gmail.com wrote:

...

I don't know about scripting languages for the templating, it might be an overkill.

People are using ParserFunctions as a scripting language already. That's not feasibly going to be removed at this point. So the only way to go is to replace it with a better scripting language, which is what we're talking about.

Sergey Chernyshev

3 Jul

10:57 p.m.

I think you're confusing the simple logic of ParserFunctions in templates with a full scripting language like PHP.

That's why I proposed looking at something simpler, like Smarty or similar.

Thank you,

Sergey

-- Sergey Chernyshev http://www.sergeychernyshev.com/

On Tue, Jun 30, 2009 at 11:46 PM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:

...


Aryeh Gregor

4 Jul

12:45 a.m.

On Fri, Jul 3, 2009 at 1:27 PM, Sergey Chernyshev sergey.chernyshev@gmail.com wrote:

...

I think you're confusing simple logic of ParserFunctions in the template with a full scripting language like PHP.

In what way is the logic of ParserFunctions "simple"? If you ignore the limitations on parse length, it's Turing-complete.

...

That's why I proposed to look at something simplified like Smarty or alike.

Hmm. Smarty looks interesting, at a quick glance. I suspect it's not designed to be secure against DoS, so it would need some kind of sandboxing. Hopefully less than some of the other solutions we're contemplating, though! I'd think it might serve okay, if we wrote enough custom functions to replace the existing ParserFunctions. I'm not sure.

Dmitriy Sintsov

12:07 p.m.

* Aryeh Gregor Simetrical+wikilist@gmail.com [Fri, 3 Jul 2009 15:15:48 -0400]:

...

Hmm. Smarty looks interesting, at a quick glance. I suspect it's not designed to be secure against DoS, so it would need some kind of sandboxing. Hopefully less than some of the other solutions we're contemplating, though! I'd think it might serve okay, if we wrote enough custom functions to replace the existing ParserFunctions. I'm not sure.

http://en.wikipedia.org/wiki/Template_engine_%28web%29#Comparison

There are some template engines implemented in multiple languages.

This one (from the list above) combines DOM-like manipulation via PHP and JavaScript: http://code.google.com/p/querytemplates/ It's closer to XSLT, but simpler, and it has loops.

Dmitriy

Tim Starling

1 Jul

9:16 a.m.

Brion Vibber wrote:

...

There's been talk of Lua as an embedded templating language for a while, and there's even an extension implementation.

One advantage of Lua over other languages is that its implementation is optimized for use as an embedded language, and it looks kind of pretty.

An _inherent_ disadvantage is that it's a fairly rarely-used language, so still requires special learning on potential template programmers' part.

An _implementation_ disadvantage is that it currently is dependent on an external Lua binary installation -- something that probably won't be present on third-party installs, meaning Lua templates couldn't be easily copied to non-Wikimedia wikis.

There are problems with all the shell-based solutions. MediaWiki callbacks, like template expansion, {{VARIABLES}} and ifexist, are commonly used in templates on Wikipedia, and a scripting language without these would suffer from poor community buy-in. You could implement them from the shell using IPC, but IPC in PHP is rather cumbersome. The interface between the parser and the scripting engine would be performance-sensitive, because users would write templates that invoked the scripting engine hundreds of times in the course of rendering an article. So there's a case there for a persistent scripting engine with a command-based interface over a pipe.
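A minimal sketch of the persistent, command-based engine described above, written in Python purely for illustration. The one-command-per-line JSON protocol, the `PersistentEngine` class and the toy `upper` command are all hypothetical; a real engine would evaluate sandboxed script code in the child and would also need a way to call back into MediaWiki for template expansion and similar.

```python
import json
import subprocess
import sys

# Hypothetical child "engine": reads one JSON command per line on stdin and
# writes one JSON reply per line on stdout. It only knows a toy "upper" op.
CHILD_SOURCE = r"""
import json, sys
while True:
    line = sys.stdin.readline()
    if not line:          # parent closed the pipe: shut down
        break
    cmd = json.loads(line)
    if cmd["op"] == "upper":
        reply = {"ok": True, "result": cmd["arg"].upper()}
    else:
        reply = {"ok": False, "error": "unknown op"}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


class PersistentEngine:
    """Keeps one engine process alive and talks to it over a pair of pipes."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def call(self, op, arg):
        # One request line out, one reply line back.
        self.proc.stdin.write(json.dumps({"op": op, "arg": arg}) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())

    def close(self):
        self.proc.stdin.close()  # child sees EOF and exits its loop
        self.proc.wait()
        self.proc.stdout.close()


engine = PersistentEngine()
# Many invocations share one process, so start-up cost is paid only once.
results = [engine.call("upper", word)["result"] for word in ["a", "b", "c"]]
engine.close()
print(results)  # ['A', 'B', 'C']
```

Keeping the process alive is what makes hundreds of invocations per article affordable; the trade-off is that any per-invocation state inside the child has to be reset between commands.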

The reason I like Lua is because of the potential to embed it in PHP as an extension, with fast setup and fast callbacks to MediaWiki. It does all its memory allocation via a callback to the application, including VM stack space, which means that it's possible to control the memory usage without killing the process when the limit is exceeded. But its standard library is unsuitable for running untrusted scripts, since it contains all the usual process control and file read/write functions.

The current PECL extension doesn't have any of the features that make Lua attractive: it does not have support for callbacks to PHP, or for replacing the standard library with something more sensible, or for limiting memory without killing the request when the limit is exceeded. Obviously the distributed standalone does not have these features either.

I had imagined the task of embedding Lua in MediaWiki as being primarily a C project, writing the necessary glue code between the embedded interpreter and PHP. I had hoped that banging the drum for Lua might encourage someone to look at these issues and start work on that project.

...

PHP

Advantage: Lots of webbish people have some experience with PHP or can easily find references.

Advantage: we're pretty much guaranteed to have a PHP interpreter available. :)

Disadvantage: PHP is difficult to lock down for secure execution.

PHP can be secured against arbitrary execution using token_get_all(); there's a proof-of-principle validator of this kind in the master switch script project. But there are problems with attempting a single-process PHP-in-PHP sandbox:

* The poor support for signals in PHP makes it difficult to limit the execution time of a script snippet. Ticks only occur at the end of each statement, so you can defeat them by making a single statement that runs forever.

* Apart from blacklisting function definition, there is no way to protect against infinite recursion, which exhausts the process stack and causes a segfault.

* Memory limits are implemented on a per-request basis, and there's no way to recover from exceeding the memory limit; the request is just killed.
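The token-level validation mentioned above can be illustrated with Python's standard `tokenize` module standing in for PHP's `token_get_all()`: walk the token stream and reject anything outside a whitelist. The policy here (`BANNED_KEYWORDS`, `ALLOWED_BUILTINS`, rejecting all attribute access, dunder names and f-strings) is an assumption for illustration, not the rule set of the validator referred to, and it is not a proven sandbox. It also does nothing about the time, recursion and memory problems listed above.

```python
import builtins
import io
import keyword
import re
import tokenize

# Hypothetical policy for template scripts (an assumption, not a known rule set).
BANNED_KEYWORDS = {"import", "from", "global", "nonlocal", "lambda", "class",
                   "def", "with", "async", "await", "yield", "del"}
ALLOWED_BUILTINS = {"len", "str", "int", "min", "max", "range"}
FORBIDDEN_BUILTINS = {n for n in dir(builtins) if not keyword.iskeyword(n)} - ALLOWED_BUILTINS
FSTRING_START = getattr(tokenize, "FSTRING_START", None)  # token type added in Python 3.12


class RejectedScript(ValueError):
    pass


def validate(source):
    """Raise RejectedScript if any token falls outside the whitelist policy."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP and tok.string == ".":
            # Attribute access is the classic route to interpreter internals.
            raise RejectedScript("attribute access is not allowed")
        if tok.type == FSTRING_START or (
            tok.type == tokenize.STRING
            and "f" in re.match(r"[A-Za-z]*", tok.string).group().lower()
        ):
            # Before 3.12, expressions inside f-strings hide in one STRING token.
            raise RejectedScript("f-strings are not allowed")
        if tok.type != tokenize.NAME:
            continue
        if tok.string in BANNED_KEYWORDS:
            raise RejectedScript(f"keyword {tok.string!r} is not allowed")
        if tok.string.startswith("__"):
            raise RejectedScript(f"dunder name {tok.string!r} is not allowed")
        if tok.string in FORBIDDEN_BUILTINS:
            raise RejectedScript(f"builtin {tok.string!r} is not allowed")


validate("x = len('abc') + max(1, 2)")  # passes silently
```

Rejecting forbidden builtins by name, rather than only at call sites, matters: otherwise something like `[open][0]('f')` would slip through, since `open` is never directly followed by `(`.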

...

JavaScript

Advantage: Even more folks have been exposed to JavaScript programming, including Wikipedia power-users.

Disadvantage: Server-side interpreter not guaranteed to be present. Like Lua, would either restrict our portability or would require an interpreter reimplementation. :P

Python

Advantage: A Python interpreter will be present on most web servers, though not necessarily all. (Windows-based servers especially.)

Wash: Python is probably better known than Lua, but not as well as PHP or JS.

Disadvantage: Like PHP, Python is difficult to lock down securely.

Any thoughts? Does anybody happen to have a PHP implementation of a Lua or JavaScript interpreter? ;)

SpiderMonkey and Python both lack control over memory usage. Python lacks a sandbox mode; the rexec module has been removed. SpiderMonkey isn't embedded in any useful kind of standalone, so you'd have to start with a C development project, as you would for Lua.

I think Rhino would be an easier path to JavaScript execution than SpiderMonkey. You can pass an -Xmx option to the Java VM, and it'll throw an OutOfMemoryError when it hits that limit, allowing you to implement per-snippet memory limits without killing the interpreter. You could do wall-clock time limits using java.util.Timer, or CPU time limits using a JNI hack to poll clock(). You could turn off LiveConnect by making your own ClassShutter, leaving what (on initial impressions) is a reasonably secure sandbox. You'd still need an interface between Java and PHP, but presumably that's a well-studied problem.

Running scripts in the Java VM has the advantage that you don't have to rely on the security of the collection of amateurish C code that is PHP. Remember those PCRE crash bugs that went unfixed for years, before someone finally demonstrated elevation to arbitrary execution? At a conference, I overheard Rasmus Lerdorf quip that really PHP is pretty secure, since most of the demonstrated buffer/integer/heap overflows needed arbitrary script access to exploit, and if the attacker has that then you're screwed anyway.

-- Tim Starling

Gregory Maxwell

10:16 a.m.

On Tue, Jun 30, 2009 at 11:46 PM, Tim Starling tstarling@wikimedia.org wrote: [snip]

...

SpiderMonkey and Python both lack control over memory usage. Python lacks a sandbox mode, the rexec module has been removed. SpiderMonkey isn't embedded in any useful kind of standalone, so you'd have to start with a C development project, like you would for Lua.

CPython has about a billion ways to inject machine code; this is one reason why Rpython failed. If you were to do Python, it would probably need to be embedded in Java.

For SpiderMonkey, the model I would have envisioned is a separate script-executor daemon that spawns a thread per script (with limits to keep the peak thread count reasonable) and arbitrates communication with MediaWiki over sockets. Memory limits then become a simple exercise in providing an instrumented malloc and setting the thread stack size appropriately.

This model has the advantage for big installations that script processing can be compartmentalized and run only on certain systems or only on certain cores. It would also allow the scripting process to be more tightly confined than PHP is, since it would only need to be able to call sbrk and read/write some sockets (i.e. http://en.wikipedia.org/wiki/Seccomp).

Another reason to use a narrow pipe interface is that it would be possible to distinguish scripts that are proper functions of their inputs from ones that aren't, and a narrow pipe interface makes those limits easier to enforce:

For example, there could be three script modes:

* Function

* Function+Date

* Not-function

Functions are guaranteed to produce the same output for the same input, and their input can't include anything more volatile than page editing (i.e. no time/date as an input, no time- or PID-seeded rand(), no retrieving data from logs or other pages). The output from these could be trivially cached based on a hash of the input arguments.

Function+Date scripts are like the above, but they also have access to the current date (but not the time). These could be cached, but the cache would be invalidated every day. This could be generalized further so that the script prototype specifies the available inputs (i.e. is this a function on page-specific data, or is this just some formatting template that works universally?).

Not-function means without those limits.

The different types of script could have their own resource limits, execution priorities, and site policy controls. For example, Wikimedia might only allow function and function+revision_info scripts for performance reasons.
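The three modes translate naturally into a cache-key rule, sketched here in Python. The names `Mode`, `cache_key` and `invoke` are hypothetical, and a plain dict stands in for a shared cache. Folding the date into the key for Function+Date scripts is one simple way to get the daily invalidation described above.

```python
import datetime
import hashlib
import json
from enum import Enum


class Mode(Enum):
    FUNCTION = "function"            # output depends only on the arguments
    FUNCTION_DATE = "function+date"  # may also read the current date (not time)
    NOT_FUNCTION = "not-function"    # no guarantees; never cached


def cache_key(script_name, mode, args, today):
    """Return a cache key for one invocation, or None if it must not be cached."""
    if mode is Mode.NOT_FUNCTION:
        return None
    payload = {"script": script_name, "args": args}
    if mode is Mode.FUNCTION_DATE:
        # Putting the date in the key makes yesterday's entries unreachable,
        # i.e. the cache is effectively invalidated once a day.
        payload["date"] = today.isoformat()
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


_cache = {}  # stand-in for a real shared cache


def invoke(script_name, mode, func, args, today=None):
    today = today or datetime.date.today()
    key = cache_key(script_name, mode, args, today)
    if key is not None and key in _cache:
        return _cache[key]
    # Only date-aware scripts are handed the date; pure functions never see it.
    extra = {"date": today} if mode is Mode.FUNCTION_DATE else {}
    result = func(**args, **extra)
    if key is not None:
        _cache[key] = result
    return result
```

A real key would also need the script's revision (not modelled here), so that editing the script does not keep serving stale output.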

Brion Vibber

2 Jul

10:57 p.m.

Tim Starling wrote:

...

I think Rhino would be an easier path to JavaScript execution than SpiderMonkey. You can pass an -Xmx option to the java VM, and it'll throw an OutOfMemory exception when it hits that limit, allowing you to implement per-snippet memory limits without killing the interpreter. You could do wall-clock time limits using java.util.Timer, or CPU time limits using a JNI hack to poll clock(). You could turn off LiveConnect by making your own ClassShutter, leaving what (on initial impressions) is a reasonably secure sandbox.

Freebase is apparently doing their server-side JS work with Rhino and have actually modified their JVM to handle some of the resource limiting.

...

Running scripts in the Java VM has the advantage that you don't have to rely on the security of the collection of amateurish C code that is PHP. Remember those PCRE crash bugs that went unfixed for years, before someone finally demonstrated elevation to arbitrary execution?

*shudder*

-- brion

Platonides

3 Jul

10:30 p.m.

Tim Starling wrote:

...

...

PHP

Advantage: Lots of webbish people have some experience with PHP or can easily find references.

Advantage: we're pretty much guaranteed to have a PHP interpreter available. :)

Disadvantage: PHP is difficult to lock down for secure execution.

PHP can be secured against arbitrary execution using token_get_all(), there's a proof-of-principle validator of this kind in the master switch script project. But there are problems with attempting a single-process PHP-in-PHP sandbox:

The poor support for signals in PHP makes it difficult to limit the execution time of a script snippet. Ticks only occur at the end of each statement, so you can defeat them by making a single statement that runs forever.

Inject a check_limits() call into each looping structure. If it detects that the script has been running for more than $maxTime, time it out. Can you defeat that?

...

Apart from blacklisting function definition, there is no way to protect against infinite recursion, which exhausts the process stack and causes a segfault.

Also inject the same call into functions.

...

Memory limits are implemented on a per-request basis, and there's no way to recover from exceeding the memory limit, the request is just killed.

Call memory_get_usage() before running the script and again inside check_limits() to check that the script stays within its memory budget, aborting if it gets near PHP's memory limit (I'd expect the script's memory use to be much lower than PHP's limit). However, that check is much easier to bypass.
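The injection idea (a check at every loop iteration and function call, plus a memory comparison) can be sketched in Python by rewriting the syntax tree before execution. `run_limited`, `_check_limits`, `LimitExceeded` and the default limits are hypothetical; `ast.NodeTransformer`, `compile()` and `tracemalloc` are standard library. This is not a security sandbox: the snippet still has full access to builtins.

```python
import ast
import time
import tracemalloc


class LimitExceeded(Exception):
    pass


class InjectChecks(ast.NodeTransformer):
    """Insert `_check_limits()` at the top of every loop body and function body."""

    def _prepend_check(self, node):
        self.generic_visit(node)  # handle nested loops/functions first
        call = ast.Call(func=ast.Name(id="_check_limits", ctx=ast.Load()), args=[], keywords=[])
        node.body.insert(0, ast.Expr(value=call))
        return node

    visit_For = visit_While = visit_FunctionDef = _prepend_check


def run_limited(source, max_seconds=0.5, max_steps=1_000_000, max_bytes=10_000_000):
    """Execute `source`, checking time, step and memory budgets at each check point."""
    tree = ast.fix_missing_locations(InjectChecks().visit(ast.parse(source)))
    code = compile(tree, "<snippet>", "exec")
    deadline = time.monotonic() + max_seconds
    steps = 0

    def check_limits():
        nonlocal steps
        steps += 1
        if steps > max_steps:
            raise LimitExceeded("step limit")
        if time.monotonic() > deadline:
            raise LimitExceeded("time limit")
        current, _peak = tracemalloc.get_traced_memory()
        if current - baseline > max_bytes:
            raise LimitExceeded("memory limit")

    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    env = {"_check_limits": check_limits}
    try:
        exec(code, env)
    finally:
        tracemalloc.stop()
    return env
```

Tim's objection still applies in a weaker form: a single long-running expression, or a comprehension (which this sketch does not instrument), never reaches a check. Injecting the call into function bodies makes every call count against the step and time budgets; in CPython, runaway recursion additionally stops with RecursionError rather than a segfault.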

Gregory Maxwell

1 Jul

9:56 a.m.

On Tue, Jun 30, 2009 at 12:16 PM, Brion Vibber brion@wikimedia.org wrote:

...

As many folks have noted, our current templating system works ok for simple things, but doesn't scale well -- even moderately complex conditionals or text-munging will quickly turn your template source into what appears to be line noise.

And we all thought Perl was bad! ;)

There's been talk of Lua as an embedded templating language for a while, and there's even an extension implementation.

One advantage of Lua over other languages is that its implementation is optimized for use as an embedded language, and it looks kind of pretty.

[snip]

So, any thoughts on how you address the universal problem of the DoS attack script?

I.e. myscript:

do {
    some_expensive_operation(); /* Presumably there will be hooks to pull text from other revisions */
} while (1);

and in [[Template:Widely used]] {{myscript}}

I'm of the impression that simply setting limits on CPU and memory isn't sufficient to address this, because a reasonable limit will be high enough to be dangerous when the object is added to 100k pages, while a limit low enough to be safe everywhere will be far too constraining and likely to fail at random depending on overall system load.
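To make the scaling concern concrete, suppose (hypothetically) a cap of 1 CPU-second per invocation and a script used once on each of 100k pages. If all of those pages have to be re-rendered, for example after an edit to the widely used template, the worst case is:

```latex
\underbrace{10^{5}}_{\text{pages}} \times \underbrace{1\,\text{s}}_{\text{cap per invocation}}
  = 10^{5}\,\text{CPU-s} \approx 27.8\ \text{CPU-hours}
```

Several invocations per page multiply this further, while a cap small enough to make the product harmless (10 ms gives about 17 CPU-minutes) may be too tight for legitimate heavy scripts.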

...

Disadvantage: Like PHP, Python is difficult to lock down securely.

I don't know that "difficult" is really the right description here. People willing to spend far more effort on this than you probably are have tried to sandbox Python and failed. I don't believe there is any real production-grade support for the level of lockdown required in either PHP or Python. And I'd worry that any PHP implementations of the sandboxed languages might lose the battle-tested sandboxing.

It's acceptable for MediaWiki to fall back to lower-performing alternatives when C modules can't be used, but I doubt it's acceptable to fall back to less secure ones!

Is execution in environments where C modules are not possible actually a hard requirement? If it is, I think this is a non-starter.

Aryeh Gregor

2 Jul

12:45 a.m.

On Wed, Jul 1, 2009 at 12:26 AM, Gregory Maxwell wrote:

> Is execution in environments where C modules are not possible actually a hard requirement?

Even exec() apparently is no good, let alone C modules.

> If it is I think this is a non-starter.

Seems so.

Brion Vibber

11:02 p.m.

Gregory Maxwell wrote:

...

So, any thoughts on how you address the universal problem of the DoS attack script?

[snip]

...

I'm of the impression that simply setting a limits on CPU and memory isn't sufficient to address this, because the reasonable limit will be high enough to be dangerous when the object is added to 100k pages, while a limit low enough to be safe everywhere will be far too constraining and likely to fail at random depending on overall system load.

It's never an easy problem. :)

But there are some interesting potential things to poke at, such as having per-template limits, per-cluster limits, etc -- we could in theory shut down some template rendering while still spitting out the rest of a page on a timely basis.
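Per-template limits could look roughly like the following Python sketch. The `TemplateBudget` class, the `("tpl", name)` page representation and the error marker are all hypothetical; the point is only that time is accounted per template, so one runaway template degrades to an error while the rest of the page still renders on time.

```python
import time


class TemplateBudget:
    """Tracks wall-clock time spent per template during one page render."""

    def __init__(self, seconds_per_template):
        self.limit = seconds_per_template
        self.spent = {}  # template name -> seconds used so far on this page

    def exhausted(self, name):
        return self.spent.get(name, 0.0) >= self.limit

    def charge(self, name, seconds):
        self.spent[name] = self.spent.get(name, 0.0) + seconds


def render_page(parts, templates, budget):
    """Render a page given as literal strings and ("tpl", name) placeholders.

    A template that has used up its budget is replaced by an error marker and
    the rest of the page still renders. A template that is already running is
    not interrupted: the limit only stops further expansions of it.
    """
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(part)
            continue
        _, name = part
        if budget.exhausted(name):
            out.append(f"[{name}: time limit exceeded]")
            continue
        start = time.perf_counter()
        out.append(templates[name]())
        budget.charge(name, time.perf_counter() - start)
    return "".join(out)


def slow():
    time.sleep(0.05)  # simulates an expensive template
    return "<slow>"


page = render_page(
    ["A ", ("tpl", "Slow"), " B ", ("tpl", "Slow"), " C ", ("tpl", "Fast")],
    {"Slow": slow, "Fast": lambda: "<fast>"},
    TemplateBudget(seconds_per_template=0.01),
)
print(page)  # A <slow> B [Slow: time limit exceeded] C <fast>
```

Stopping a template mid-run would additionally need one of the interruptible execution mechanisms discussed earlier in the thread.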

...

...
Disadvantage: Like PHP, Python is difficult to lock down securely.

I don't know that difficult is really the right description here. People willing to spend far more effort on this than you probably are have tried to sandbox python and failed. I don't believe there is any real production grade support for the level of lockdown required for either PHP or Python. And I'd worry that any PHP implementations of the sandboxed languages might lose the battle tested sandboxing.

It's acceptable for mediawiki to fall back to lower performing alternatives when c modules can't be used, but I doubt its acceptable to fall back to less secure ones!

Indeed. :)

...

Is execution in enviroments where c modules are not possible actually a hard requirement? If it is I think this is a non-starter.

Since requiring custom PHP modules would rule out pretty much all casual third-party use of MediaWiki, not requiring them is definitely a hard requirement. :)

I'd really _like_ to be able to avoid having to require external executables either, if it can be managed, but that's harder since it means having a pure PHP implementation of the scripting language. (ouch!)

-- brion

Dmitriy Sintsov

11:18 p.m.

...

I'd really _like_ to be able to avoid having to require external executables either, if it can be managed, but that's harder since it means having a pure PHP implementation of the scripting language. (ouch!)

Maybe translate only a subset of JS or Lua to PHP. The engine itself is written in PHP, anyway. Moving to C/Java modules would dramatically reduce the popularity of the engine. For example, right now I am having difficulties compiling ffmpeg on an old FreeBSD host; I imagine a custom PHP module could run into similar difficulties.

Dmitriy

Victor Vasiliev

3 Jul

12:33 a.m.

Brion Vibber wrote:

...

I'd really _like_ to be able to avoid having to require external executables either, if it can be managed, but that's harder since it means having a pure PHP implementation of the scripting language. (ouch!)

-- brion

I've rewritten the AbuseFilter parser so that its scripts (the language differs, though) can now be embedded in wikitext. The extension is called InlineScripts (sorry, I haven't come up with a better name) and it's working, although many functions are not implemented yet and the test suite is missing.

Also, there's a problem for all such embedded-language proposals: we don't have an appropriate parser hook type.

* Function hooks (like {{#if}}) will have their code preprocessed (that's undesirable).

* Tag hooks don't have access to PPFrame, and therefore they don't have access to template arguments, aren't expanded by Special:ExpandTemplates, etc.

--vvv

Bryan Tong Minh

3:30 p.m.

On Thu, Jul 2, 2009 at 7:32 PM, Brion Vibber brion@wikimedia.org wrote:

...

I'd really _like_ to be able to avoid having to require external executables either, if it can be managed, but that's harder since it means having a pure PHP implementation of the scripting language. (ouch!)

-- brion

We could always have a default implementation in PHP, and optionally provide the same functionality but faster in a C module.
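The "pure default, optional fast module" arrangement is a common pattern; here it is in Python for illustration. `_fast_template_engine` is a hypothetical compiled module that does not exist, so the import fails and the pure fallback is used. The fallback's `{{{name}}}` substitution is a deliberately simplified stand-in for real template expansion.

```python
import re

try:
    # Hypothetical compiled accelerator; it does not exist, so the import fails.
    from _fast_template_engine import render
    IMPLEMENTATION = "c"
except ImportError:
    IMPLEMENTATION = "pure"

    _PARAM = re.compile(r"\{\{\{(\w+)\}\}\}")

    def render(template, args):
        """Pure fallback: substitute {{{name}}} parameters, leaving unknown ones as-is."""
        return _PARAM.sub(lambda m: args.get(m.group(1), m.group(0)), template)


print(IMPLEMENTATION)                                    # pure
print(render("Hello, {{{name}}}!", {"name": "world"}))   # Hello, world!
```

The catch, as the replies point out, is that both implementations must be written, reviewed, and kept behaviourally identical (including their security properties); running one shared test suite against both is the usual way to keep them in step.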

Bryan

Andrew Garrett

5:33 p.m.

On 03/07/2009, at 11:00 AM, Bryan Tong Minh wrote:

...

On Thu, Jul 2, 2009 at 7:32 PM, Brion Vibber brion@wikimedia.org wrote:

...
I'd really _like_ to be able to avoid having to require external executables either, if it can be managed, but that's harder since it means having a pure PHP implementation of the scripting language. (ouch!)

-- brion

We could always have a default implementation in PHP, and optionally provide the same functionality but faster in a C module.

Writing an interpreter for a language is not trivial. Writing it in C, and then porting to PHP is even worse.

Even if you could find somebody to write it, it would have to be reviewed, as well.

-- Andrew Garrett Contract Developer, Wikimedia Foundation agarrett@wikimedia.org http://werdn.us

Dmitriy Sintsov

8:26 p.m.

* Andrew Garrett agarrett@wikimedia.org [Fri, 3 Jul 2009 13:03:03 +0100]:

...

Writing an interpreter for a language is not trivial. Writing it in C, and then porting to PHP is even worse.

Many languages "resemble" C syntax (curly braces, ++/-- and so on), but with loose typing (numeric strings transparently mixed with numbers, floats "mixed" with integers). Add a dollar-sign prefix to the variable names in JS code and a lot of _simple_ JS code would look really similar to PHP. I wonder whether that would help with translation? BTW, one LOGO interpreter distribution (LOGO is an educational functional language) includes a partial (incomplete but working) Pascal interpreter in only about 20-30 KB! It's amazing! It seems that writing an interpreter in another interpreted language is much easier than in a low-level language like C. I am not sure whether Logo is suitable, though; I am not an expert in translation in any way.

Dmitriy

Jared Williams

1 Jul

4:47 p.m.

...

-----Original Message-----
From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Brion Vibber
Sent: 30 June 2009 17:17
To: Wikimedia developers
Subject: [Wikitech-l] On templates and programming languages


Any thoughts? Does anybody happen to have a PHP implementation of a Lua or JavaScript interpreter? ;)

Would you want the interpreter to translate the template into a PHP array of opcodes first, so that it could be dumped into APC/memcached?
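The compile-once-and-cache idea can be illustrated with Python's own bytecode standing in for a PHP opcode array. `load_compiled`, the `tpl:` key prefix and the dict used as a cache are hypothetical; `compile()`, `marshal.dumps()` and `marshal.loads()` are standard library calls.

```python
import hashlib
import marshal

# Stand-in for APC or memcached: any store that maps string keys to bytes.
bytecode_cache = {}


def load_compiled(source):
    """Compile `source` once, keep the serialized bytecode in the cache, and reuse it."""
    key = "tpl:" + hashlib.sha1(source.encode("utf-8")).hexdigest()
    blob = bytecode_cache.get(key)
    if blob is None:
        code = compile(source, "<template>", "exec")
        blob = marshal.dumps(code)  # plain bytes; format is Python-version specific
        bytecode_cache[key] = blob
    return marshal.loads(blob)


namespace = {}
exec(load_compiled("result = 6 * 7"), namespace)
exec(load_compiled("result = 6 * 7"), namespace)  # second call is served from the cache
print(namespace["result"], len(bytecode_cache))  # 42 1
```

Keying on a hash of the source means an edited template automatically gets a fresh entry. Whatever serves as the cache must be trusted, though: anyone able to write to it can inject arbitrary bytecode.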

Jared


wikitech-l@lists.wikimedia.org

115 comments

29 participants


participants (29)

Alex
Amir E. Aharoni
Andrew Garrett
Aryeh Gregor
Brian
Brion Vibber
Bryan Tong Minh
Chad
Daniel Schwen
Dmitriy Sintsov
Gregory Maxwell
Hay (Husky)
Jared Williams
Marco Schuster
Michael Daly
Petr Kadlec
Platonides
randomcoder1
Robert Rohde
Sergey Chernyshev
Steve Bennett
Steve Sanbeg
Tei
Thomas Dalton
Tim Landscheidt
Tim Starling
Trevor Parscal
Victor Vasiliev
William Allen Simpson