Looking to the future of MediaWiki, the base set of JavaScript will continue to grow as client-side applications grow in complexity to address usability issues and add new features. To that end it will become increasingly necessary to A) have a better system for delivering client-side JavaScript, and B) standardize on a JavaScript helper library.
A) The improved delivery mechanism is a two-part issue: code maintainability and client-side performance. 1) To keep the code maintainable and modular as the complexity of the JavaScript libraries grows, it makes a lot of sense to split JavaScript classes and objects into their own files and folders. Likewise, we don't want additional requests just for language-message delivery. With a server-side delivery system we can cleanly add sets of JavaScript files to the page in a single request, "just in time", as the user interacts with a given set of interface components. If we don't update our JavaScript delivery mechanism, the result will be _lots_ of little JavaScript requests and less maintainable, less flexible JavaScript code.
2) Furthermore, with complex JavaScript libraries we want to be able to add verbose comments, documentation, and debugging statements to the code without paying for the larger file size in client-side performance. Minified JavaScript strips out all of those unnecessary bits.
I propose we implement or adopt something like: http://code.google.com/p/minify/ This would mean sets of JavaScript files can be grabbed in a single request, minified, grouped, cached, and gzipped (if the client supports it). This should work fine with our reverse proxy setup, resulting in a net decrease in cluster load since we deal with smaller files most of the time. A user preference could request uncompressed individual files, and/or a URL parameter like ?jsdebug=true could enable non-compressed output for debugging.
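To make that concrete, here is a minimal sketch of what such a script-loader entry point could look like in PHP. The file whitelist, the cache path, and the my_minify() helper are placeholders for illustration, not the actual minify library API:

<?php
// load.php?files=wikibits.js,ajax.js&jsdebug=true  -- hypothetical entry point
$allowed = array( 'wikibits.js', 'ajax.js', 'mwsuggest.js' );   // assumed file set
$req     = isset( $_GET['files'] ) ? explode( ',', $_GET['files'] ) : array();
$files   = array_intersect( $req, $allowed );
$debug   = isset( $_GET['jsdebug'] );

$cacheFile = '/tmp/jscache-' . md5( implode( '|', $files ) . ( $debug ? '-debug' : '' ) );
if ( !file_exists( $cacheFile ) ) {
    $js = '';
    foreach ( $files as $f ) {
        $js .= "\n/* $f */\n" . file_get_contents( "skins/common/$f" );
    }
    if ( !$debug ) {
        $js = my_minify( $js );   // placeholder for whichever minifier we adopt
    }
    file_put_contents( $cacheFile, $js );
}

header( 'Content-Type: text/javascript; charset=utf-8' );
header( 'Cache-Control: public, max-age=2592000' );   // long-lived; a version param busts it
$js = file_get_contents( $cacheFile );
if ( isset( $_SERVER['HTTP_ACCEPT_ENCODING'] )
    && strpos( $_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip' ) !== false ) {
    header( 'Content-Encoding: gzip' );
    $js = gzencode( $js );
}
echo $js;

One request, one cache entry per file set, and ?jsdebug=true skips the minify step entirely.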
A library such as minify can also group and minify all the style sheets, and minify HTML output if we wanted, although the gains there are nowhere near as dramatic or as necessary as in the JavaScript case.
If we can get some community consensus on this direction, that would be good. I will start looking at integrating the above-mentioned library, run some tests, etc.
B) We should also address converging on a JavaScript library for HTML document traversal, event handling, interface improvements, maintainability, flexibility, etc. All the sequencer/Metavid stuff uses jQuery. jQuery is a GPL/MIT-licensed JavaScript library emerging as the "winner" among script libraries, with very wide adoption (Google, apple.com, digg.com, mozilla.com, etc.) and a very small footprint. Refactoring the existing MediaWiki JavaScript as jQuery code would mean far fewer cross-browser hacks and generally shorter, more maintainable code. So it seems like a good direction to me ;)
peace, --michael
On Fri, Dec 12, 2008 at 1:43 PM, Michael Dale mdale@wikimedia.org wrote:
I propose we implement or adopt something like: http://code.google.com/p/minify/ This will mean sets of javascript files can be grabbed in a single
[snip]
Search the list for minify, this was discussed a couple weeks ago.
I am aware of the YSlow thread -- is that what you're referring to? It seemed to end without much resolution, with people mentioning it's not much of a problem given the current JavaScript library set... and it seemed to assume that we would lose debug functionality.
Just wanted to mention it again since I think it will be a problem shortly and that the solution should be thought about.
For example, it would be nice if all JavaScript modules defined, in a common way, the set of messages they use, so that the script server could parse that and package the localized messages into the same request. That way you don't have to issue separate ajax requests for language messages, and you only send out the message text for the interface components you're actually interacting with.
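As a rough sketch of what that could look like, assuming a hypothetical per-module message manifest and MediaWiki's wfMsg() for the localized lookup (the module and message names here are made up):

<?php
// Hypothetical manifest: each JS module lists the message keys it uses.
$wgJsModuleMessages = array(
    'mv_embed'     => array( 'mv_loading', 'mv_play', 'mv_pause' ),
    'edit_toolbar' => array( 'bold_sample', 'bold_tip' ),
);

function packageModuleMessages( $modules ) {
    global $wgJsModuleMessages;
    $msgs = array();
    foreach ( $modules as $module ) {
        if ( !isset( $wgJsModuleMessages[$module] ) ) {
            continue;
        }
        foreach ( $wgJsModuleMessages[$module] as $key ) {
            $msgs[$key] = wfMsg( $key );   // resolved in the user's language
        }
    }
    // Prepended to the combined JS response, so no extra ajax round trip is needed.
    return 'var gMsg = ' . json_encode( $msgs ) . ";\n";
}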
anyway as mentioned earlier I will try to experiment with a solution soonish. (if no one is working on this already)
peace, michael
Gregory Maxwell wrote:
On Fri, Dec 12, 2008 at 1:43 PM, Michael Dale mdale@wikimedia.org wrote:
I propose we implement or adopt something like: http://code.google.com/p/minify/ This will mean sets of javascript files can be grabbed in a single
[snip]
Search the list for minify, this was discussed a couple weeks ago.
On Fri, Dec 12, 2008 at 6:02 PM, Michael Dale dale@ucsc.edu wrote:
I am aware of the YSlow thread -- is that what you're referring to? It seemed to end without much resolution, with people mentioning it's not much of a problem given the current JavaScript library set... and it seemed to assume that we would lose debug functionality.
Just wanted to mention it again since I think it will be a problem shortly and that the solution should be thought about.
For example, it would be nice if all JavaScript modules defined, in a common way, the set of messages they use, so that the script server could parse that and package the localized messages into the same request. That way you don't have to issue separate ajax requests for language messages, and you only send out the message text for the interface components you're actually interacting with.
anyway as mentioned earlier I will try to experiment with a solution soonish. (if no one is working on this already)
Well: combining + gzipping alone is a really significant chunk of the total possible improvement and does no grave harm to debuggability. Most of the improvement comes from minimising server round trips; the size of the data transmitted isn't important except insofar as smaller sizes can reduce round trips to an extent.
Also, a less aggressive, line-number-preserving minimization would be less harmful for debuggability.
The prospect of a magic jsdebug option was raised in the prior thread, but I believe it was shot down because it doesn't help with rare problems, less-technical bug victims, people with stale cached copies of the JS (which flipping on debugging would avoid), etc.
Making ajax requests for message text would be pretty miserable performance wise.
The debug switch would modify the HTML output to point at the individual files with a GET seed, i.e. myscript.js?<?php= date()?> or something of that nature, bypassing the script loader altogether. The bulk of the extra content is comments, code documentation, and debug statements... line preservation does not seem worth it. Debug output should be enabled via a GET debug argument, a user preference, or a $wgConfigure variable.
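Schematically, the switch could look something like the following; $wgEnableJsDebug and the file list are made-up names, and in practice the seed would be the SVN revision ($wgStyleVersion) rather than date():

<?php
// Hypothetical: emit one <script> tag per source file with a cache-busting
// seed when debugging, otherwise a single combined script-loader request.
$jsDebug = isset( $_GET['jsdebug'] ) || $wgEnableJsDebug;   // or a user preference
$seed    = $wgStyleVersion;                                 // bumped on every deploy
$scripts = array( 'wikibits.js', 'ajax.js', 'mv_embed.js' );
$out     = '';

if ( $jsDebug ) {
    foreach ( $scripts as $f ) {
        $out .= "<script type=\"text/javascript\" src=\"$wgStylePath/common/$f?$seed\"></script>\n";
    }
} else {
    $list = urlencode( implode( ',', $scripts ) );
    $out .= "<script type=\"text/javascript\" src=\"$wgScriptPath/load.php?files=$list&amp;version=$seed\"></script>\n";
}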
--michael
Gregory Maxwell wrote:
On Fri, Dec 12, 2008 at 6:02 PM, Michael Dale dale@ucsc.edu wrote:
I am aware of the YSlow thread -- is that what you're referring to? It seemed to end without much resolution, with people mentioning it's not much of a problem given the current JavaScript library set... and it seemed to assume that we would lose debug functionality.
Just wanted to mention it again since I think it will be a problem shortly and that the solution should be thought about.
For example, it would be nice if all JavaScript modules defined, in a common way, the set of messages they use, so that the script server could parse that and package the localized messages into the same request. That way you don't have to issue separate ajax requests for language messages, and you only send out the message text for the interface components you're actually interacting with.
anyway as mentioned earlier I will try to experiment with a solution soonish. (if no one is working on this already)
Well: combining + gzipping alone is a really significant chunk of the total possible improvement and does no grave harm to debuggability. Most of the improvement comes from minimising server round trips; the size of the data transmitted isn't important except insofar as smaller sizes can reduce round trips to an extent.
Also, a less aggressive, line-number-preserving minimization would be less harmful for debuggability.
The prospect of a magic jsdebug option was raised in the prior thread, but I believe it was shot down because it doesn't help with rare problems, less-technical bug victims, people with stale cached copies of the JS (which flipping on debugging would avoid), etc.
Making ajax requests for message text would be pretty miserable performance wise.
On Fri, Dec 12, 2008 at 6:37 PM, Michael Dale mdale@wikimedia.org wrote:
The debug switch would modify the HTML output to point at the individual files with a GET seed, i.e. myscript.js?<?php= date()?> or something of that nature, bypassing the script loader altogether. The bulk of the extra content is comments, code documentation, and debug statements... line preservation does not seem worth it. Debug output should be enabled via a GET debug argument, a user preference, or a $wgConfigure variable.
So you end up with a user who says the problem goes away when they enable debug. Yet they can't provide useful debugging info with the failing version because it's all garbled minification output.
May I suggest an alternative perspective with respect to line numbering: destroying line numbers doesn't reduce the post-gzipped size by much, so it does not seem worth it.
That just means the minification is broken on their platform, no? So we have to debug the minification on their platform, not the debug output... but sure, as you mention, hundreds of single-character whitespace lines will compress nicely.
--michael
Gregory Maxwell wrote:
On Fri, Dec 12, 2008 at 6:37 PM, Michael Dale mdale@wikimedia.org wrote:
The debug switch would modify the HTML output to point at the individual files with a GET seed, i.e. myscript.js?<?php= date()?> or something of that nature, bypassing the script loader altogether. The bulk of the extra content is comments, code documentation, and debug statements... line preservation does not seem worth it. Debug output should be enabled via a GET debug argument, a user preference, or a $wgConfigure variable.
So you end up with a user who says the problem goes away when they enable debug. Yet they can't provide useful debugging info with the failing version because it's all garbled minification output.
May I suggest an alternative perspective with respect to line numbering: destroying line numbers doesn't reduce the post-gzipped size by much, so it does not seem worth it.
On Fri, Dec 12, 2008 at 6:58 PM, Michael Dale mdale@wikimedia.org wrote:
That just means the minification is broken on their platform, no? So we have to debug the minification on their platform, not the debug output... but sure, as you mention, hundreds of single-character whitespace lines will compress nicely.
No, it could mean that they are getting a corrupted copy of the mainline JS through some broken cache between the backend and their browser, which is avoided by fetching separate debugging JS.
We would have the script_server be passed a unique variable per SVN version of MediaWiki (like we currently do with the static JS includes). If their browser treats a new URL with different GET parameters as the same as some older version, resulting in a cache mismatch, then that is bad...
Ideally we don't have broken transformations on our back end, given our unique URLs matched to the SVN version of the file. Yes, different JS output means different input for the client, and yes, different input could result in new unforeseen errors client-side that don't manifest in the un-minified code. We would of course want to do lots of testing with the minified output.
--michael
Gregory Maxwell wrote:
On Fri, Dec 12, 2008 at 6:58 PM, Michael Dale mdale@wikimedia.org wrote:
That just means the minification is broken on their platform, no? So we have to debug the minification on their platform, not the debug output... but sure, as you mention, hundreds of single-character whitespace lines will compress nicely.
No, it could mean that they are getting a corrupted copy of the mainline JS through some broken cache between the backend and their browser, which is avoided by fetching separate debugging JS.
On Fri, Dec 12, 2008 at 7:31 PM, Michael Dale mdale@wikimedia.org wrote:
We would have the script_server be passed a unique variable per SVN version of MediaWiki (like we currently do with the static JS includes). If their browser treats a new URL with different GET parameters as the same as some older version, resulting in a cache mismatch, then that is bad...
Ideally we don't have broken transformations on our back end, given our unique URLs matched to the SVN version of the file. Yes, different JS output means different input for the client, and yes, different input could result in new unforeseen errors client-side that don't manifest in the un-minified code. We would of course want to do lots of testing with the minified output.
Or we could just do ordinary minification, but not touch newlines (including ones that are part of comments). The benefit for debugging seems to outweigh the very marginal incremental improvements to page load time.
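A minimal sketch of that kind of newline-preserving pass in PHP -- it only strips indentation, trailing whitespace and whole-line // comments, leaves block comments and everything else alone, and makes no attempt to tokenize strings or regex literals:

<?php
// Newline-preserving "minification": every input line stays a line, so
// reported error line numbers still match the source.  Deliberately naive.
function stripButKeepLines( $js ) {
    $out = array();
    foreach ( explode( "\n", $js ) as $line ) {
        $line = trim( $line );                   // drop indentation and trailing spaces
        if ( substr( $line, 0, 2 ) === '//' ) {
            $line = '';                          // whole-line comment becomes an empty line
        }
        $out[] = $line;
    }
    return implode( "\n", $out );
}

The runs of empty lines this leaves behind are exactly the sort of thing gzip compresses down to almost nothing.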
Erm... so you're saying we should go and turn this:
if(fooBar) {
    var baz = false;
    var x = 0;
    while(!baz) {
        if(x == 260) {
            baz = true;
        }
        if(x == 25) {
            runThis({
                a: 'b',
                c: 'd'
            });
        }
        x++;
    }
    if(x > 55)
        andRunThis(fooBar);
}
Into this:
if(fooBar){
var baz=false;
var x=0;
while(!baz){
if(x==260){
baz=true;
}
if(x==25){
runThis({
a:'b',
c:'d'
});
}
x++;
}
if(x>55)
andRunThis(fooBar);
}
Minification killing debugging isn't just because it destroys newlines. I honestly don't want to pop between places to debug things. Do remember that some of us like to debug within the browser. Firefox with Firebug, for one, very nicely lets me jump to where an error happened in the code and set a breakpoint there as well. Trying to do that with a mess like that, even if you don't touch newlines, is still troublesome: you can hardly tell where something starts and where something ends. But not only that, plenty of the optimization that comes from minification is because it completely removes comments. There's little point to minifying if you're only going to minify halfway. You get the worst of both worlds: hard-to-read code, and your filesize still hasn't changed much.
~Daniel Friesen (Dantman, Nadir-Seen-Fire)
~Profile/Portfolio: http://nadir-seen-fire.com
-The Nadir-Point Group (http://nadir-point.com)
--It's Wiki-Tools subgroup (http://wiki-tools.com)
--The ElectronicMe project (http://electronic-me.org)
-Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
--Animepedia (http://anime.wikia.com)
--Narutopedia (http://naruto.wikia.com)
Aryeh Gregor wrote:
On Fri, Dec 12, 2008 at 7:31 PM, Michael Dale mdale@wikimedia.org wrote:
We would have the script_server be passed a unique variable per SVN version of MediaWiki (like we currently do with the static JS includes). If their browser treats a new URL with different GET parameters as the same as some older version, resulting in a cache mismatch, then that is bad...
Ideally we don't have broken transformations on our back end, given our unique URLs matched to the SVN version of the file. Yes, different JS output means different input for the client, and yes, different input could result in new unforeseen errors client-side that don't manifest in the un-minified code. We would of course want to do lots of testing with the minified output.
Or we could just do ordinary minification, but not touch newlines (including ones that are part of comments). The benefit for debugging seems to outweigh the very marginal incremental improvements to page load time.
On Sun, Dec 14, 2008 at 1:58 AM, Daniel Friesen dan_the_man@telus.net wrote:
Erm... so you're saying we should go and turn this:
...
Into this:
if(fooBar){
var baz=false;
var x=0;
while(!baz){
if(x==260){
baz=true;
}
if(x==25){
runThis({
a:'b',
c:'d'
});
}
x++;
}
if(x>55)
andRunThis(fooBar);
}
Readable enough if you just need to figure out what the relevant line is doing, sure. You can then go to the actual wikibits.js or whatever, using the extra ?jsdebug=1 or whatever, and make sure it matches up if there's a problem.
Minification killing debugging isn't just because it destroys newlines. I honestly don't want to pop between places to debug things. Do remember that some of us like to debug within the browser. Firefox with Firebug, for one, very nicely lets me jump to where an error happened in the code and set a breakpoint there as well.
Remember that this would only be necessary if the problem didn't also occur for the unminified version. If it did (which should be almost always), you can just use that.
But not only that, plenty of the optimization that comes from minification is because it completely removes comments. There's little point to minifying if you're only going to minify halfway. You get the worst of both worlds: hard-to-read code, and your filesize still hasn't changed much.
Sure it has. This comment:
/**
 * Add a link to one of the portlet menus on the page, including:
 *
 * p-cactions: Content actions (shown as tabs above the main content in Monobook)
 * p-personal: Personal tools (shown at the top right of the page in Monobook)
 * p-navigation: Navigation
 * p-tb: Toolbox
 *
 * This function exists for the convenience of custom JS authors.  All
 * but the first three parameters are optional, though providing at
 * least an id and a tooltip is recommended.
 *
 * By default the new link will be added to the end of the list.  To
 * add the link before a given existing item, pass the DOM node of
 * that item (easily obtained with document.getElementById()) as the
 * nextnode parameter; to add the link _after_ an existing item, pass
 * the node's nextSibling instead.
 *
 * @param String portlet -- id of the target portlet ("p-cactions", "p-personal", "p-navigation" or "p-tb")
 * @param String href -- link URL
 * @param String text -- link text (will be automatically lowercased by CSS for p-cactions in Monobook)
 * @param String id -- id of the new item, should be unique and preferably have the appropriate prefix ("ca-", "pt-", "n-" or "t-")
 * @param String tooltip -- text to show when hovering over the link, without accesskey suffix
 * @param String accesskey -- accesskey to activate this link (one character, try to avoid conflicts)
 * @param Node nextnode -- the DOM node before which the new item should be added, should be another item in the same list
 *
 * @return Node -- the DOM node of the new item (an LI element) or null
 */
gets turned into 28 bytes or so. And the large runs of newlines will undoubtedly compress very well, too. I doubt there's a substantial difference in the size depending on whether you include newlines or not. But benchmarking's the only way to test, right? The evidence that minification helps for large amounts of JS is fairly unequivocal (see, e.g., Steve Souders' "High Performance Websites").
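A quick way to measure that on a real file -- the input path is a placeholder, stripButKeepLines() is the newline-preserving sketch above, and my_minify() stands in for whatever full minifier gets adopted:

<?php
// Compare raw and gzipped sizes for three variants of the same file.
$src = file_get_contents( 'skins/common/wikibits.js' );

$variants = array(
    'original'           => $src,
    'newline-preserving' => stripButKeepLines( $src ),
    'fully minified'     => my_minify( $src ),
);

foreach ( $variants as $name => $js ) {
    printf( "%-20s raw: %6d bytes  gzipped: %6d bytes\n",
        $name, strlen( $js ), strlen( gzencode( $js, 9 ) ) );
}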
Aryeh Gregor schreef:
I doubt there's a substantial difference in the size depending on whether you include newlines or not. But benchmarking's the only way to test, right? The evidence that minification helps for large amounts of JS is fairly unequivocal (see, e.g., Steve Souders' "High Performance Websites").
Why don't we benchmark this on the actual MediaWiki JS?
Roan Kattouw (Catrope)
On Sun, Dec 14, 2008 at 2:24 PM, Roan Kattouw roan.kattouw@home.nl wrote:
Why don't we benchmark this on the actual MediaWiki JS?
Is that the sound of you volunteering? :)
Actually, to speed up loading, one thing we might want to look at first is moving the scripts to the bottom. This should be a significant help in current browsers (although not necessarily next-gen ones), which block all further file loads while they're loading a script.
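In skin/OutputPage terms that could be as simple as queueing the script URLs and flushing them just before </body>; the class and method names below are made up for illustration:

<?php
// Hypothetical: collect script URLs during page construction and emit them
// at the bottom of <body> instead of in <head>, so content renders first.
class DeferredScripts {
    private $urls = array();

    public function add( $url ) {
        $this->urls[] = $url;
    }

    // Called by the skin right before it closes <body>.
    public function getHtml() {
        $html = '';
        foreach ( $this->urls as $url ) {
            $html .= '<script type="text/javascript" src="'
                . htmlspecialchars( $url ) . '"></script>' . "\n";
        }
        return $html;
    }
}

Inline calls in the page body (ToC toggle, edit toolbar) would then have to be attached from script after load -- Brion's point below -- since those functions won't exist yet while the HTML is still streaming in.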
Aryeh Gregor wrote:
Actually, to speed up loading, one thing we might want to look at first is moving the scripts to the bottom. This should be a significant help in current browsers (although not necessarily next-gen ones), which block all further file loads while they're loading a script.
Aye, that's been on my wishlist. :D
It will require changes to a few remaining bits of script that do inline scripting assuming wikibits.js is already loaded, such as the ToC and edit toolbar, but that's a good idea anyway.
(Attaching some event handlers through script which are currently inline would also be good, to ensure functions aren't called before they've been loaded.)
-- brion
Roan Kattouw wrote:
Aryeh Gregor schreef:
I doubt there's a substantial difference in the size depending on whether you include newlines or not. But benchmarking's the only way to test, right? The evidence that minification helps for large amounts of JS is fairly unequivocal (see, e.g., Steve Souders' "High Performance Websites").
Why don't we benchmark this on the actual MediaWiki JS?
Roan Kattouw (Catrope)
I don't think the current set of JavaScript will benefit from this effort as much as the _future_ set of JavaScript, i.e. future libraries will contain many thousands of lines distributed over dozens of files rather than hundreds of lines over a handful of JS files.
--michael
Michael Dale schreef:
Roan Kattouw wrote:
Why don't we benchmark this on the actual MediaWiki JS?
I don't think the current set of JavaScript will benefit from this effort as much as the _future_ set of JavaScript, i.e. future libraries will contain many thousands of lines distributed over dozens of files rather than hundreds of lines over a handful of JS files.
I suggest we re-evaluate our decision when we actually *reach* that situation in the future (assuming that'll even happen). Introducing minification now because we *might* need it in one *possible* future seems kind of pointless to me. We should base our decision whether or not to minimize on the *current* situation, and re-evaluate when the situation changes significantly.
Roan Kattouw (Catrope)
On 15 December 2008 16:33, Michael Dale wrote:
Roan Kattouw wrote:
Aryeh Gregor schreef:
I doubt there's a substantial difference in the size depending on whether you include newlines or not. But benchmarking's the only way to test, right? The evidence that minification helps for large amounts of JS is fairly unequivocal (see, e.g., Steve Souders' "High Performance Websites").
Why don't we benchmark this on the actual MediaWiki JS?
Roan Kattouw (Catrope)
I don't think the current set of JavaScript will benefit from this effort as much as the _future_ set of JavaScript, i.e. future libraries will contain many thousands of lines distributed over dozens of files rather than hundreds of lines over a handful of JS files.
Minification could be made pretty pointless in the future.
Chromium* has experimental tech within it, which can reduce the payload of each js/css request to something as small as 30 bytes.
Jared
* Google toolbar for IE supposedly implements it, but I've been unable to get it working.
On Mon, Dec 15, 2008 at 2:39 PM, Jared Williams jared.williams1@ntlworld.com wrote:
Minification could be made pretty pointless in the future.
Chromium* has experimental tech within it, which can reduce the payload of each js/css request to something as small as 30 bytes.
Jared
- Google toolbar for IE supposedly implements it, but I've been unable to get it working.
Link?
On 15 December 2008 22:30, Aryeh Gregor wrote:
On Mon, Dec 15, 2008 at 2:39 PM, Jared Williams jared.williams1@ntlworld.com wrote:
Minification could be made pretty pointless in the future.
Chromium* has experimental tech within it, which can reduce the payload of each js/css request to something as small as 30 bytes.
Jared
- Google toolbar for IE supposedly implements it, but I've been unable to get it working.
Link?
Here's the paper (PDF)
http://sdch.googlegroups.com/web/Shared_Dictionary_Compression_over_HTTP.pdf?gda=Cn21OV0AAADesD7oVzP2tIH3YMhCCYbwV7wKw6Y_LNfrKuXmihkMeg12alwZyuoqsE-BiY88xfLrk0HuZRJs1gcUl6mErWX6yPI8Lq4cE5IelfQO528z8OU2_747KStNgkfeVUa7Znk
The idea being you could get an SDCH-capable user agent to download the concatenated & gzipped JavaScript in a single request (called a dictionary), which from quick testing is about 15kb for en.mediawiki.org, and cache that on the client for a long period.
Then the individual requests for JavaScript can just return a diff (in RFC 3284 format) between the server version and the version the client has in its dictionary. Obviously if the diffs get too large, it can instruct the user agent to download a more up-to-date dictionary. Something around 30 bytes of body (off the top of my head) is the minimum size if the server & client versions are identical.
CSS also could be managed similarly.
It would also be possible to put the static (inline HTML) in templates into the dictionary, together with a lot of the translated messages, to try and reduce the HTML size, though I'm not sure how effective it'd be.
Jared
Jared Williams wrote:
Here's the paper (PDF)
http://sdch.googlegroups.com/web/Shared_Dictionary_Compression_over_HTTP.pdf?gda=Cn21OV0AAADesD7oVzP2tIH3YMhCCYbwV7wKw6Y_LNfrKuXmihkMeg12alwZyuoqsE-BiY88xfLrk0HuZRJs1gcUl6mErWX6yPI8Lq4cE5IelfQO528z8OU2_747KStNgkfeVUa7Znk
The idea being you could get an SDCH-capable user agent to download the concatenated & gzipped JavaScript in a single request (called a dictionary), which from quick testing is about 15kb for en.mediawiki.org, and cache that on the client for a long period.
Then the individual requests for JavaScript can just return a diff (in RFC 3284 format) between the server version and the version the client has in its dictionary. Obviously if the diffs get too large, it can instruct the user agent to download a more up-to-date dictionary. Something around 30 bytes of body (off the top of my head) is the minimum size if the server & client versions are identical.
CSS also could be managed similarly.
It would also be possible to put the static (inline HTML) in templates into the dictionary, together with a lot of the translated messages, to try and reduce the HTML size, though I'm not sure how effective it'd be.
Jared
I don't see how it will help. The client still needs to download it, be it the dictionary or the JS. What could benefit from SDCH is the skin: instead of transmitting the skin, diff borders, etc., all of that would be in the dictionary, decreasing the transfer per page.
On 15 December 2008 23:41, Platonides wrote:
Jared Williams wrote:
Here's the paper (PDF)
http://sdch.googlegroups.com/web/Shared_Dictionary_Compression_over_HTTP.pdf?gda=Cn21OV0AAADesD7oVzP2tIH3YMhCCYbwV7wKw6Y_LNfrKuXmihkMeg12alwZyuoqsE-BiY88xfLrk0HuZRJs1gcUl6mErWX6yPI8Lq4cE5IelfQO528z8OU2_747KStNgkfeVUa7Znk
The idea being you could get an SDCH-capable user agent to download the concatenated & gzipped JavaScript in a single request (called a dictionary), which from quick testing is about 15kb for en.mediawiki.org, and cache that on the client for a long period.
Then the individual requests for JavaScript can just return a diff (in RFC 3284 format) between the server version and the version the client has in its dictionary. Obviously if the diffs get too large, it can instruct the user agent to download a more up-to-date dictionary. Something around 30 bytes of body (off the top of my head) is the minimum size if the server & client versions are identical.
CSS also could be managed similarly.
It would also be possible to put the static (inline HTML) in templates into the dictionary, together with a lot of the translated messages, to try and reduce the HTML size, though I'm not sure how effective it'd be.
Jared
I don't see how it will help. The client still needs to download it, be it the dictionary or the JS. What could benefit from SDCH is the skin: instead of transmitting the skin, diff borders, etc., all of that would be in the dictionary, decreasing the transfer per page.
Gzip will compress concatenated JavaScript files a lot better than gzipping the individual files.
Here's what FF is receiving from Main_Page atm...
Size   URL
46634  http://upload.wikimedia.org/centralnotice/wikipedia/en/centralnotice.js?188
10275  http://en.wikipedia.org/w/index.php?title=-&action=raw&gen=js&us... ok
 6581  http://meta.wikimedia.org/w/index.php?title=MediaWiki:Wikiminiatlas.js&a...n=raw&ctype=text/javascript&smaxage=21600&maxage=86400
Gzipping them concatenated results in a single 15kb file.
It only needs to be downloaded once, and can be kept on the client for months, as the server can send a diff to update it to the most recent version at any time, rather than having to send a whole new JS file.
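For what it's worth, the concatenation figure above is easy to reproduce; the file names here are placeholders for whatever the page actually loads:

<?php
// Sum of individually gzipped files vs. one gzip over the concatenation.
$files = array( 'centralnotice.js', 'gen.js', 'wikiminiatlas.js' );

$separate = 0;
$concat   = '';
foreach ( $files as $f ) {
    $js = file_get_contents( $f );
    $separate += strlen( gzencode( $js, 9 ) );
    $concat   .= $js . "\n";
}

printf( "individually gzipped: %d bytes\n", $separate );
printf( "concatenated+gzipped: %d bytes\n", strlen( gzencode( $concat, 9 ) ) );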
SDCHing MediaWiki HTML would take some effort, as the page output is spread between the skin classes and OutputPage, etc.
We would also want the translation text from \languages\messages\Messages*.php in there too, I think. Handling the $1-style placeholders is easy; it's just a matter of determining which message goes through which wfMsg*() function, and whether the wikitext translations can be pre-converted to HTML.
But most of the HTML comes from article wikitext, so I wonder whether it'd beat gzip by anything significant.
Jared
Jared Williams wrote:
SDCHing MediaWiki HTML would take some effort, as the page output is spread between the skin classes and OutputPage, etc.
We would also want the translation text from \languages\messages\Messages*.php in there too, I think. Handling the $1-style placeholders is easy; it's just a matter of determining which message goes through which wfMsg*() function, and whether the wikitext translations can be pre-converted to HTML.
But most of the HTML comes from article wikitext, so I wonder whether it'd beat gzip by anything significant.
Jared
Note that SDCH output is expected to then be gzipped, as they fulfill different needs; they aren't incompatible. You would use a dictionary for common skin bits, perhaps also adding some common page features, like the TOC code, 'amp;action=edit&redlink=1" class="new"'...
Having a second dictionary for language-dependent output could also be interesting, but not all messages should be provided.
Simetrical wrote:
What happens if you have parser functions that depend on the value of $1 (allowed in some messages AFAIK)? What if $1 contains wikitext itself (I wouldn't be surprised if that were true somewhere)? How do you plan to do this substitution anyway, JavaScript? What about clients that don't support JavaScript?
/Usually/, you don't create the dictionary output by hand, but pass the page to a "dictionary compressor" (or so is expected; this is all still very experimental). If a parser function changed it completely, they will just be literals. If you have a parametrized block, the vcdiff would see, "this piece up to Foo matches this dictionary section, before $1. And this other matches the text following Foo..."
Jared wrote:
I do have working PHP code that can parse PHP templates & language strings to generate the dictionary, and a new set of templates rewritten to output the vcdiff efficiently.
Please share?
On 17 December 2008 00:20, Platonides wrote:
Jared Williams wrote:
SDCHing MediaWiki HTML would take some effort, as the page output is spread between the skin classes and OutputPage, etc.
We would also want the translation text from \languages\messages\Messages*.php in there too, I think. Handling the $1-style placeholders is easy; it's just a matter of determining which message goes through which wfMsg*() function, and whether the wikitext translations can be pre-converted to HTML.
But most of the HTML comes from article wikitext, so I wonder whether it'd beat gzip by anything significant.
Jared
Note that SDCH output is expected to then be gzipped, as they fulfill different needs; they aren't incompatible. You would use a dictionary for common skin bits, perhaps also adding some common page features, like the TOC code, 'amp;action=edit&redlink=1" class="new"'...
Having a second dictionary for language-dependent output could also be interesting, but not all messages should be provided.
Unfortunately, whilst the user agent can announce it has multiple dictionaries, the SDCH response can only indicate that it used a single dictionary.
Simetrical wrote:
What happens if you have parser functions that depend on the value of $1 (allowed in some messages AFAIK)? What if $1 contains wikitext itself (I wouldn't be surprised if that were true somewhere)? How do you plan to do this substitution anyway, JavaScript? What about clients that don't support JavaScript?
/Usually/, you don't create the dictionary output by hand, but pass the page to a "dictionary compressor" (or so is expected; this is all still very experimental). If a parser function changed it completely, they will just be literals. If you have a parametrized block, the vcdiff would see, "this piece up to Foo matches this dictionary section, before $1. And this other matches the text following Foo..."
What I have atm just traverses a directory of templates, using PHP's built-in tokenizer to extract T_INLINE_HTML tokens into the dictionary (if greater than 3 bytes long), and replacing them with a call to output the vcdiff copy opcodes.
So

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="<?php $e($this->lang); ?>">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<title><?php $e($this->title); ?>

Becomes

<?php
$this->copy(0, 53); $e($this->lang);
$this->copy(53, 91); $e($this->title);
PHP's output buffering captures the output from the PHP code within the template, which essentially becomes the data section of the vcdiff.
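Schematically, the copy()/output-buffering pattern looks something like the following; this is a conceptual sketch of the opcode stream, not the actual RFC 3284 byte encoding, and the class and its names are made up for illustration:

<?php
// copy() records "take $len bytes of the dictionary starting at $offset";
// anything the template echoes in between becomes literal ("add") data,
// captured via output buffering.
class VcdiffWriter {
    private $ops = array();

    public function __construct() {
        ob_start();
    }

    public function copy( $offset, $len ) {
        $this->captureLiteral();
        $this->ops[] = array( 'COPY', $offset, $len );
    }

    public function finish() {
        $this->captureLiteral();
        ob_end_clean();            // drop the now-empty buffer
        return $this->ops;         // would be serialized as real VCDIFF instructions
    }

    private function captureLiteral() {
        $data = ob_get_clean();
        if ( $data !== false && $data !== '' ) {
            $this->ops[] = array( 'ADD', $data );
        }
        ob_start();                // keep capturing the next segment
    }
}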
Jared wrote:
I do have working PHP code that can parse PHP templates & language strings to generate the dictionary, and a new set of templates rewritten to output the vcdiff efficiently.
Please share?
Intend to, I probably should document/add some comments first :)
Jared
On Mon, Dec 15, 2008 at 6:05 PM, Jared Williams jared.williams1@ntlworld.com wrote:
The idea being you could get an SDCH-capable user agent to download the concatenated & gzipped JavaScript in a single request (called a dictionary), which from quick testing is about 15kb for en.mediawiki.org, and cache that on the client for a long period.
Except that if you can do that, you can fetch the ordinary JS anyway and keep it until it expires. If clients actually did this, you'd be talking about single-digit JS requests per client per week, and further optimization would be pointless. The fact of the matter is that a significant percentage of views have no cached files from Wikipedia, for whatever reason. Those are the ones we're concerned with, and your idea does nothing to help them.
It would also be possible to put the static (inline HTML) in templates into the dictionary, together with a lot of the translated messages, to try and reduce the HTML size, though I'm not sure how effective it'd be.
It looks to me like you're talking about hugely increasing the size of the first page view on the optimistic assumption that first page views account for only a very small percentage of hits. Why do you think that assumption is warranted?
One possible use of SDCH is that it would allow us to refrain from re-transmitting the redundant parts of each response, most of the skin bits. But that's probably under a couple of KB gzipped, so we're not talking about much savings here. It might even take longer to compute and apply the diffs than to transmit the extra data, over typical hardware.
On Mon, Dec 15, 2008 at 7:57 PM, Jared Williams jared.williams1@ntlworld.com wrote:
It only needs to be downloaded once, and can be kept on the client for months, as the server can send a diff to update it to the most recent version at any time, rather than having to send a whole new JS file.
What leads you to believe that the client will actually *keep* it for months? Caches on clients are of finite size and things get evicted. The oldest file I have in Firefox's cache right now is from December 12, three days ago:
$ ls -lt /home/aryeh/.mozilla/firefox/7y6zjmhm.default/Cache | tail
-rw------- 1 aryeh aryeh 22766 2008-12-12 09:34 14C66B37d01
-rw------- 1 aryeh aryeh 28914 2008-12-12 09:34 20916AFCd01
-rw------- 1 aryeh aryeh 59314 2008-12-12 09:34 57BFF8A6d01
-rw------- 1 aryeh aryeh 39826 2008-12-12 09:34 9EB67039d01
-rw------- 1 aryeh aryeh 52145 2008-12-12 09:34 A2F75899d01
-rw------- 1 aryeh aryeh 18072 2008-12-12 09:34 C11C3B29d01
-rw------- 1 aryeh aryeh 25590 2008-12-12 09:34 C845BED2d01
-rw------- 1 aryeh aryeh 32173 2008-12-12 09:34 D69FA933d01
-rw------- 1 aryeh aryeh 45189 2008-12-12 09:34 F2AD5614d01
-rw------- 1 aryeh aryeh 25164 2008-12-12 09:34 DBA05B7Bd01
A fair percentage of computers might be even worse, like public computers configured to clean cache and other private data when the user logs out.
Concatenating the (user-independent) wikibits.js and so on to the (user-dependent) page HTML will prohibit intermediate proxies from caching the user-invariant stuff, too, since the entire thing will need to be marked Cache-Control: private. It will prevent our own Squids from caching the CSS and JS, in fact.
We would also want the translation text from \languages\messages\Messages*.php in there too, I think. Handling the $1-style placeholders is easy; it's just a matter of determining which message goes through which wfMsg*() function, and whether the wikitext translations can be pre-converted to HTML.
What happens if you have parser functions that depend on the value of $1 (allowed in some messages AFAIK)? What if $1 contains wikitext itself (I wouldn't be surprised if that were true somewhere)? How do you plan to do this substitution anyway, JavaScript? What about clients that don't support JavaScript?
On 16 December 2008 01:49, Aryeh Gregor wrote:
On Mon, Dec 15, 2008 at 6:05 PM, Jared Williams jared.williams1@ntlworld.com wrote:
The idea being you could get an SDCH-capable user agent to download the concatenated & gzipped JavaScript in a single request (called a dictionary), which from quick testing is about 15kb for en.mediawiki.org, and cache that on the client for a long period.
Except that if you can do that, you can fetch the ordinary JS anyway and keep it until it expires. If clients actually did this, you'd be talking about single-digit JS requests per client per week, and further optimization would be pointless. The fact of the matter is that a significant percentage of views have no cached files from Wikipedia, for whatever reason. Those are the ones we're concerned with, and your idea does nothing to help them.
Technically it is possible to remove some of the JS & CSS round trips, as inlining the JS & CSS no longer comes at a download cost; the cost of copying something from the dictionary is around 3-5 bytes.
It would also be possible to put the static (inline HTML) in templates into the dictionary, together with a lot of the translated messages, to try and reduce the HTML size, though I'm not sure how effective it'd be.
It looks to me like you're talking about hugely increasing the size of the first page view on the optimistic assumption that first page views account for only a very small percentage of hits. Why do you think that assumption is warranted?
The SDCH protocol has a time delay built in, so it wouldn't download the dictionary unless the browser has remained on a page (that has requested the client download the dictionary) for a time.
One possible use of SDCH is that it would allow us to refrain from re-transmitting the redundant parts of each response, most of the skin bits. But that's probably under a couple of KB gzipped, so we're not talking about much savings here. It might even take longer to compute and apply the diffs than to transmit the extra data, over typical hardware.
Naively tried, it was around 3kb for Monobook. Hence my pessimism in my previous post about beating plain gzip. The ratio of wikitext article output to skin bits is too high, I think.
On Mon, Dec 15, 2008 at 7:57 PM, Jared Williams jared.williams1@ntlworld.com wrote:
It only needs to be downloaded once, and can be kept on the client for months, as the server can send a diff to update it to the most recent version at any time, rather than having to send a whole new JS file.
What leads you to believe that the client will actually *keep* it for months? Caches on clients are of finite size and things get evicted. The oldest file I have in Firefox's cache right now is from December 12, three days ago:
$ ls -lt /home/aryeh/.mozilla/firefox/7y6zjmhm.default/Cache | tail
-rw------- 1 aryeh aryeh 22766 2008-12-12 09:34 14C66B37d01
-rw------- 1 aryeh aryeh 28914 2008-12-12 09:34 20916AFCd01
-rw------- 1 aryeh aryeh 59314 2008-12-12 09:34 57BFF8A6d01
-rw------- 1 aryeh aryeh 39826 2008-12-12 09:34 9EB67039d01
-rw------- 1 aryeh aryeh 52145 2008-12-12 09:34 A2F75899d01
-rw------- 1 aryeh aryeh 18072 2008-12-12 09:34 C11C3B29d01
-rw------- 1 aryeh aryeh 25590 2008-12-12 09:34 C845BED2d01
-rw------- 1 aryeh aryeh 32173 2008-12-12 09:34 D69FA933d01
-rw------- 1 aryeh aryeh 45189 2008-12-12 09:34 F2AD5614d01
-rw------- 1 aryeh aryeh 25164 2008-12-12 09:34 DBA05B7Bd01
Dictionaries don't seem to get deleted when clearing the cache, at least. They have their own mechanism to specify how long they should be kept.
A fair percentage of computers might be even worse, like public computers configured to clean cache and other private data when the user logs out.
Concatenating the (user-independent) wikibits.js and so on to the (user-dependent) page HTML will prohibit intermediate proxies from caching the user-invariant stuff, too, since the entire thing will need to be marked Cache-Control: private. It will prevent our own Squids from caching the CSS and JS, in fact.
Why would you concatenate user-independent with user-dependent HTML?
I think I only suggested a skin-dependent and a language-dependent dictionary.
We would also want the translation text from \languages\messages\Messages*.php in there too, I think. Handling the $1-style placeholders is easy; it's just a matter of determining which message goes through which wfMsg*() function, and whether the wikitext translations can be pre-converted to HTML.
What happens if you have parser functions that depend on the value of $1 (allowed in some messages AFAIK)? What if $1 contains wikitext itself (I wouldn't be surprised if that were true somewhere)? How do you plan to do this substitution anyway, JavaScript? What about clients that don't support JavaScript?
The substitution doesn't really happen, as it's done server side; translations are essentially pre-parsed, saving time in template rendering.
For example
http://www.amazon.com/exec/obidos/ISBN=$1
http://www.amazon.com/exec/obidos/ISBN= would be put into the dictionary, and server metadata for the translation becomes...
'Amazon.com' => array(
    array( offset, 39 ),  // COPY 39 bytes from dictionary offset
    1                     // Output value of parameter 1.
);
I do have working PHP code that can parse PHP templates & language strings to generate the dictionary, and a new set of templates rewritten to output the vcdiff efficiently.
Jared