- The namespacing in Google's jsapi is very nice, with everything
being a member of a global "google" object. We would do well to emulate it, but migrating all JS to such a scheme is beyond the scope of the current project.
You somewhat contradict this approach by recommending against the "class" abstraction below... i.e. how will you cleanly load components and dependencies if not by a given name?
By module name. Each module can contain multiple files. I don't see any problem with allowing anonymous modules, as long as the caller is happy with the fact that such modules can't be used in dependencies or loaded on demand on the client side.
I agree we should move things into a global object, i.e. $j, and all our components/features should extend that object (like jQuery plugins). That is the direction we are already going in.
I think it would be better if jQuery was called window.jQuery and MediaWiki was called window.mw. Then we could share the jQuery instance with JS code that's not aware of MediaWiki, and we wouldn't need to worry about namespace conflicts between third-party jQuery plugins and MediaWiki.
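A rough sketch of that separation (illustrative only, not existing code): jQuery stays at window.jQuery, MediaWiki code lives under window.mw, and the two are tied together explicitly instead of by renaming jQuery.

// Hypothetical sketch: MediaWiki keeps its own namespace and borrows
// window.jQuery locally, so third-party plugins see an untouched jQuery.
window.mw = window.mw || {};
( function( $ ) {
    mw.example = function() {
        $( 'body' ).addClass( 'mw-example' );  // placeholder behaviour
    };
} )( window.jQuery );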
Dependency loading is not really beyond the scope... we are already supporting that. If you check out the mv_jqueryBindings function in mv_embed.js, you will see loader calls integrated into the jQuery binding. This integrates loading the high-level application interfaces into their interface call.
Your so-called dependency functions (e.g. doLoadDepMode) just seemed to be a batch-load feature; there was no actual dependency handling. Every caller was required to list the dependencies for the classes it was loading.
The idea is to move more and more of the structure of the application into that system. So right now mwLoad is a global function, but it should be refactored into the jQuery space and be called via $j.load();
That would work well until jQuery introduced its own script-loader plugin with the same name and some extension needed to use it.
[...]
We could add that convention directly into the script-loader function if desired, so that we include dependencies at the per-class level. For example, mwLoad('ui.dialog') would know to load ui.core etc.
Yes, that is what real dependency handling would do.
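Something along these lines (a sketch only, not the JS2 API; moduleDependencies and resolveDependencies are made-up names): the loader, not the caller, knows that ui.dialog needs ui.core, and resolves the full set before requesting anything.

// Hypothetical dependency table; a real one would be generated or
// registered by each module.
var moduleDependencies = {
    'ui.dialog': [ 'ui.core', 'ui.draggable', 'ui.resizable' ],
    'ui.draggable': [ 'ui.core' ],
    'ui.resizable': [ 'ui.core' ]
};

// Depth-first resolution: returns the module plus all of its dependencies,
// dependencies first, each listed only once.
function resolveDependencies( moduleName, resolved, seen ) {
    resolved = resolved || [];
    seen = seen || {};
    if ( seen[ moduleName ] ) {
        return resolved;
    }
    seen[ moduleName ] = true;
    var deps = moduleDependencies[ moduleName ] || [];
    for ( var i = 0; i < deps.length; i++ ) {
        resolveDependencies( deps[ i ], resolved, seen );
    }
    resolved.push( moduleName );
    return resolved;
}

// resolveDependencies( 'ui.dialog' ) gives
// [ 'ui.core', 'ui.draggable', 'ui.resizable', 'ui.dialog' ]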
- The "class" abstraction as implemented in JS2 has very little value
to PHP callers. It's just as easy to use filenames.
The idea with "class" abstraction is that you don't know what script set you have available at any given time. Maybe one script included ui.resizable and ui.move and now your script depends on ui.resizable and ui.move and ui.drag... your loader call will only include ui.drag (since the other are already defined).
I think you're missing the point. I'm saying it doesn't provide enough features. I want to add more, not take away some.
You can remove duplicates by filename.
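A sketch of what I mean (the function names here are hypothetical, not existing code): the client keeps track of which files it has already loaded and drops duplicates before requesting anything.

// Files that have already been loaded, keyed by filename.
var loadedFiles = {};

// Filter out files that are already present, then fetch the rest.
function loadFiles( fileNames, callback ) {
    var pending = [];
    for ( var i = 0; i < fileNames.length; i++ ) {
        if ( !loadedFiles[ fileNames[ i ] ] ) {
            loadedFiles[ fileNames[ i ] ] = true;
            pending.push( fileNames[ i ] );
        }
    }
    if ( pending.length ) {
        // A real loader would append script tags or ask the server-side
        // script loader for one concatenated, minified response.
        appendScriptTags( pending, callback );  // hypothetical helper
    } else {
        callback();
    }
}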
[...]
We want to move away from PHP code dependencies for each JavaScript module. JavaScript should just directly hit a single exposure point of the MediaWiki API. If we have PHP code generating bits and pieces of JavaScript everywhere, it quickly gets complicated, is difficult to maintain, much more resource-intensive, and requires a whole new framework to work right.
PHP's integration with the JavaScript should be minimal. PHP should supply configuration and package in localized messages.
I don't think it will be too complicated or resource intensive. JS generation in PHP is very flexible and you admit that there is a role for it. I don't think there's a problem with adding a few more features on the PHP side.
If necessary, we can split it back out to a non-MediaWiki standalone mode by generating some static JS.
What is your reason for saying this? Have you worked on some other framework where integration of PHP and JavaScript has caused problems?
- Central registration of all client-side resources in a global
variable would be onerous and should be avoided.
You can always add to the registered global. This works well by having the PHP read the JavaScript file directly to ascertain the global list. That way your JavaScript works standalone as well as integrated with a script loader that provides localization and configuration.
There's a significant CPU cost to loading and parsing JS files on every PHP request. I want to remove that behaviour. Instead, we can list client-side files in PHP. Then from the PHP list, we can generate static JS files in order to recover the standalone functionality.
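For illustration (the structure and names here are hypothetical), a generated static file might carry the configuration and localized messages that would otherwise come from the script loader:

// Hypothetical output of a PHP-side generator, so the same scripts can run
// standalone as well as through MediaWiki.
window.mw = window.mw || {};
mw.config = {
    scriptPath: '/w',
    userLanguage: 'en'
};
mw.messages = {
    'mwe-upload': 'Upload file',
    'mwe-cancel': 'Cancel'
};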
[...]
That sounds cleaner than the present OutputPage and Skin.php with their associated script-loader grafting. Having a cleaner system would be nice... but will it break skins and other stuff? Do we keep old API mappings for OutputPage and Skin, or change almost every extension and break every third-party skin out there?
I think I'll probably break most third-party skins, if they have PHP code. We break them with just about every major release so there won't be much surprise there.
On this point, I think we need:
* Easier management of non-PHP skins (i.e. CSS and images only)
* Automated CSS generation (per original post)
* Easier ways to modify the document structure, with less PHP involved. XSLT?
* An interface in PHP that we can live with, so we don't feel obliged to keep breaking it.
I should be able to retain compatibility with non-skin extensions, and I won't break interfaces unnecessarily. But we're committed to an incremental development process, rather than a sequence of rewrites, and that means that some interfaces will get old and die within the 1.X.0 sequence.
[...]
- Like Wordpress, we could store minified and concatenated files in a
public cache and then link to that cache directly in the HTML.
That seems perfectly reasonable... Is the idea that this will help small sites that don't have things behind a squid proxy?
Yes, and it also benefits Wikimedia.
Although small sites seem to work okay with MediaWiki pages being served via PHP reading cached files.
Have you looked at the profiling? On the Wikimedia app servers, even the simplest MW request takes 23ms, and gen=js takes 46ms. A static file like wikibits.js takes around 0.5ms. And that's with APC. You say MW on small sites is OK, I think it's slow and resource-intensive.
That's not to say I'm sold on the idea of a static file cache, it brings its own problems, which I listed.
- The cache invalidation scheme is tricky, there's not really an ideal
system. A combination of cache-breaking parameters (like Michael's design) and short expiry times is probably the way to go. Using cache-breaking parameters alone doesn't work because there is referring HTML cached on both the server and client side, and regenerating that HTML periodically would be much more expensive than regenerating the scripts.
An option is to write out a bit of dynamic JavaScript to a single low-expiry, statically cached core script that sets the versions for everything that could be included. But that does not work well with live hacks (hence the checking of the file-modified date)... Since version updates are generally highly correlated with localization updates anyway, I don't see too much problem with old JavaScript persisting until a page is purged and rendered with the new interface.
I don't see the benefit in hurting our cache hit rate to support "new JavaScript" with "old HTML".
The performance impact of refreshing a common file once every hour or two is not large. Your code sets the expiry time to a year, and changes the urid parameter regularly, which sounds great until you accidentally cache some buggy JS into squid and you have no way to reconstruct the URID parameters and thus purge the object. Then you'd be stuck with the choice of either waiting a month for all the referring HTML to expire, or clearing the entire squid cache.
If there's a need for the versions of the HTML and JS to match, that should be handled rigorously, with old versions retained at the origin server, instead of relying on squid to keep a record of every object it's served.
[...]
- Unsafe construction of HTML. This is ubiquitous in the mwEmbed
directory and there will be a huge potential for XSS, as soon as user input is added. HTML construction with innerHTML can be replaced by document.createElement() or its jQuery equivalent.
I build a lot of HTML as static strings because it's faster than generating every element with function calls. If you can inject arbitrary content into some JavaScript string, then I imagine you can do so with createElement as well. You don't gain much by escaping already-defined JavaScript. If you do something to get some value into someone else's JavaScript instance, then you might as well call your evilJs directly. Perhaps I am understanding this wrong? Could you illustrate how that would be exploited in one case but not the other?
Say MediaWiki emits an input box with a properly escaped attribute derived from user input:
<input type="text" id="filename" value="&lt;iframe src=&quot;http://example.com/&quot;/&gt;"/>
Then consider JS code such as:
dialog.innerHTML = "<div>" + document.getElementById( 'filename' ).value + "</div>";
This unescapes the value attribute, and puts the contents straight into HTML. The iframe will be created. This is a security vulnerability.
The alternative style used by jQuery UI is:
$j( '<div/>' ).text( $j( '#filename' )[0].value ).appendTo( dialog );
Or equivalently in plain DOM:
var div = document.createElement( 'div' );
div.appendChild( document.createTextNode( document.getElementById( 'filename' ).value ) );
dialog.appendChild( div );
The single text node contains the same literal text that was in the input box, no iframe element is created. You could think of it as implicit escaping, since if you ask for HTML back:
alert( dialog.innerHTML )
The browser will show you properly escaped HTML:
<div><iframe src"http://example.com/%22/%3E%3C/div%3E;
In OggHandler, I found that it was necessary to use innerHTML in some cases, because there were bugs involved with creating a Java applet and then changing its attributes. But I made sure that all the HTML I created was properly escaped, so that there was no possibility of arbitrary HTML being created, either from trusted or untrusted input.
It's best to escape even trusted input, for two reasons:
* Correctness. Even trusted input can contain quotation marks (see the sketch below).
* Ease of review. Reviewers should not have to determine which of your inputs are trusted and which are untrusted in order to verify the safety of the code.
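To illustrate the first point with a made-up example: even a trusted configuration value can contain a quotation mark, and unescaped concatenation breaks on it where DOM construction does not.

// Hypothetical example: a trusted value that happens to contain quotes.
var siteName = 'The "Free" Encyclopedia';

// Broken: the quotation mark terminates the title attribute early.
//   dialog.innerHTML = '<a title="' + siteName + '" href="/">home</a>';

// Safe: let the DOM handle the value as data rather than markup.
var link = document.createElement( 'a' );
link.title = siteName;
link.href = '/';
link.appendChild( document.createTextNode( 'home' ) );
dialog.appendChild( link );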
There's more on ease of review and other security issues in my article on the subject:
http://www.mediawiki.org/wiki/Security_for_developers
Security takes precedence over performance. There are better ways to improve performance than to open up your code to systematic exploit by malicious parties.
-- Tim Starling