I'm going to mention this here, because it might be of interest on the Wikimedia cluster (or it might not).
Last night I committed Extension:Minify, which is essentially a lightweight wrapper for the YUI CSS compressor and the JSMin JavaScript compressor. If installed, it automatically captures all content exported through action=raw and precompresses it by removing comments, formatting, and other human-readable elements. All of the helpful elements still remain on the MediaWiki: pages; they just don't get sent to users.
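To give a rough idea of what the minification step does, here is an illustrative Python sketch (the extension itself just wraps the YUI compressor and JSMin, which are far more careful than this):

import re

def naive_minify_css(css):
    # Illustrative sketch only: strip /* ... */ comments, collapse runs of
    # whitespace, and drop spaces around punctuation. The real minifiers
    # handle strings, CSS hacks, and JavaScript semantics far more carefully.
    css = re.sub(r'/\*.*?\*/', '', css, flags=re.DOTALL)  # remove comments
    css = re.sub(r'\s+', ' ', css)                        # collapse whitespace
    css = re.sub(r'\s*([{}:;,])\s*', r'\1', css)          # tighten punctuation
    return css.strip()

print(naive_minify_css("/* hide site notice */\n#siteNotice {\n  display: none;\n}"))
# -> "#siteNotice{display:none;}"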
Currently each page served to anons references 6 CSS/JS pages dynamically prepared by MediaWiki, of which 4 are needed in the most common situation of viewing content on screen (i.e. assuming the media="print" and media="handheld" stylesheets are not downloaded in the typical case).
These 4 pages (MediaWiki:Common.css, MediaWiki:Monobook.css, gen=css, and gen=js) comprise about 60 kB on the English Wikipedia. (I'm using enwiki as a benchmark, but Commons and dewiki have numbers similar to those discussed below.)
After gzip compression, which I assume is available for most HTTP transactions these days, they total 17,039 bytes. The comparable numbers with Minify applied are 35 kB raw and 9,980 bytes after gzip, for a savings of 7 kB, or about 40% of the gzipped total.
Now, in practical terms, 7 kB could shave ~1.5 s off a page load over a 36 kbps dialup connection. Or, given Erik Zachte's observation that action=raw is called 500 million times per day, and assuming savings of up to 7 kB / 4 per call, it could shave up to 900 GB off of Wikimedia's daily traffic. (In practice it would probably be somewhat less; 900 GB is slightly under 2% of Wikimedia's total daily traffic, if I am reading the charts correctly.)
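For anyone who wants to check that arithmetic, here it is spelled out (same assumptions as above, rounded):

# Back-of-the-envelope check of the figures above (assumed values, rounded):
saved_bytes = 7 * 1024                 # ~7 kB saved across the 4 CSS/JS responses
dialup_bps  = 36_000                   # 36 kbps dialup line
print(saved_bytes * 8 / dialup_bps)    # ~1.6 seconds of transfer time avoided

raw_calls_per_day = 500e6              # Erik Zachte's action=raw figure
saving_per_call   = saved_bytes / 4    # savings spread over the 4 requests
print(raw_calls_per_day * saving_per_call / 1e9)  # ~0.9, i.e. roughly 900 GB/day upper bound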
Anyway, that's the use case (such as it is): slightly faster initial downloads and a small but probably measurable impact on total bandwidth. The trade-off, of course, is that users receive CSS and JS pages from action=raw that are largely unreadable. The extension exists if Wikimedia is interested, though to be honest I primarily created it for use with my own, more tightly bandwidth-constrained sites.
-Robert Rohde
It's probably worth mentioning that this bug is still open: https://bugzilla.wikimedia.org/show_bug.cgi?id=17577
This will save not only traffic on subsequent page views (about 50 kB in this case: http://www.webpagetest.org/result/090218_132826127ab7f254499631e3e688b24b/1/...), but also improve performance dramatically.
I wonder if anything can be done to at least make it work for local files - I have a hard time understanding the File vs. LocalFile vs. FSRepo relationships well enough to enable this just for the local file system.
It's probably also wise to figure out a way to implement it for non-local repositories too, so Wikimedia projects can use it, but I'm completely out of my league here ;)
Thank you,
Sergey
-- Sergey Chernyshev http://www.sergeychernyshev.com/
The structure is LocalRepo extends FSRepo extends FileRepo. ForeignApiRepo extends FileRepo directly, and ForeignDbRepo extends LocalRepo.
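Or, as Python-style pseudocode (illustrative only; the actual classes are PHP):

# The same hierarchy, sketched as empty classes just to show the inheritance:
class FileRepo: pass
class FSRepo(FileRepo): pass
class LocalRepo(FSRepo): pass
class ForeignApiRepo(FileRepo): pass
class ForeignDbRepo(LocalRepo): pass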
-Chad
Which of all those files should I change to apply my patch only to files in the default repository? Currently my patch is applied to File.php:
http://bug-attachment.wikimedia.org/attachment.cgi?id=5833
If you just point me in the right direction, I'll update the patch and upload it myself.
Thank you,
Sergey
-- Sergey Chernyshev http://www.sergeychernyshev.com/
You're patching already-existing functionality at the File level, so it should be OK to just plop it in there. I'm not sure how this will affect the ForeignApi interface, so it'd be worth testing there too.
From what I can tell at a (very) quick glance, it shouldn't adversely affect anything from a client perspective on the API, as we just rely on whatever URL was provided to us to begin with.
-Chad
It probably depends on how getTimestamp() is implemented for non-local repos. The important thing is for it not to return new values too often, and to return the real "version" of the image.
If this is already the case, can someone apply this patch? I don't want to be responsible for such an important change ;)
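For illustration, the general cache-busting idea being discussed looks roughly like this (a sketch only, not the actual patch):

def versioned_file_url(base_url, timestamp):
    # Append the file's upload timestamp (its "version") so the URL only
    # changes when the file itself changes; such a URL can then be served
    # with a cache-forever header.
    separator = '&' if '?' in base_url else '?'
    return base_url + separator + str(timestamp)

# Hypothetical example:
print(versioned_file_url("http://upload.example.org/a/ab/Example.jpg", 20090626154500))
# -> http://upload.example.org/a/ab/Example.jpg?20090626154500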
Sergey
I would quickly add that the script-loader / new-upload branch also supports minification, along with associating unique version IDs, grouping, and gzipping.
So all your MediaWiki page includes are tied to their version numbers and can be cached forever, without 304 requests by the client or a shift-reload to get new JS.
Plus it works with all the static file-based JS includes as well. If a given set of files is constantly requested, we can group them to avoid server round trips. And finally it lets us localize messages and package them in the JS (again avoiding separate trips for JavaScript interface messages).
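As a rough sketch of the request shape (the parameter name here is invented for illustration and is not the actual ScriptLoader API):

def grouped_script_url(base, pages):
    # pages: (wiki page title, latest revision id) pairs, e.g.
    # [("MediaWiki:Common.js", 123456), ("MediaWiki:Monobook.js", 123500)]
    parts = ["%s|%d" % (title, rev_id) for title, rev_id in pages]
    # One request fetches the whole group; because the revision ids are part
    # of the URL, the combined response can carry a far-future cache header.
    return base + "?scripts=" + ",".join(parts)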
For more info, see the (slightly outdated) document: http://www.mediawiki.org/wiki/Extension:ScriptLoader
peace, michael
On Fri, Jun 26, 2009 at 4:33 PM, Michael Dale mdale@wikimedia.org wrote:
I would quickly add that the script-loader / new-upload branch also supports minification, along with associating unique version IDs, grouping, and gzipping.
So all your MediaWiki page includes are tied to their version numbers and can be cached forever, without 304 requests by the client or a shift-reload to get new JS.
Hm. Unique IDs?
Does this mean that every page on the site must be purged from the caches to cause all requests to see a new version number?
Is there also some pending Squid patch to let it jam in a new ID number on the fly for every request? Or have I misunderstood what this does?
On Fri, Jun 26, 2009 at 4:49 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
Hm. Unique IDs?
Does this mean that every page on the site must be purged from the caches to cause all requests to see a new version number?
Is there also some pending Squid patch to let it jam in a new ID number on the fly for every request? Or have I misunderstood what this does?
We already have version numbers on static CSS/JS, and we just don't bother purging the HTML. So any old Squid hits might see the old include, or the new one. It's not often noticeable in practice, even if you get the old HTML with the new scripts/styles.
Correct me if I am wrong, but that's how we presently update JS and CSS: we have $wgStyleVersion, and when that gets updated we send out fresh pages with HTML pointing to the JS with $wgStyleVersion appended.
The difference in the context of the script-loader is that we would read the version from the MediaWiki JS pages that are being included as well as from the $wgStyleVersion variable (avoiding the need to shift-reload). In the context of rendering a normal page with dozens of template lookups, I don't see this as particularly costly; it's a few extra getLatestRevID() title calls. Likewise, we should do this for images so we can send a cache-forever header (bug 17577), avoiding a bunch of 304 requests.
One part I am not completely clear on is how we avoid lots of simultaneous requests to the scriptLoader when it first generates the JavaScript to be cached on the Squids, but other stuff must be throttled too, no? For example, when we update any code, language messages, or LocalSettings, that does not result in immediately purging all of Wikipedia.
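To spell out the caching decision mentioned above (a simplified sketch, not actual MediaWiki code):

def cache_headers(url_carries_version):
    # A URL that embeds the content's version can be cached "forever",
    # because a new version produces a new URL; anything else has to be
    # revalidated, which is where the 304 round trips come from.
    if url_carries_version:
        return {"Cache-Control": "public, max-age=31536000"}
    return {"Cache-Control": "private, must-revalidate, max-age=0"}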
--michael
On Fri, Jun 26, 2009 at 5:24 PM, Michael Dale mdale@wikimedia.org wrote:
The difference in the context of the script-loader is that we would read the version from the MediaWiki JS pages that are being included as well as from the $wgStyleVersion variable (avoiding the need to shift-reload). In the context of rendering a normal page with dozens of template lookups, I don't see this as particularly costly; it's a few extra getLatestRevID() title calls.
It's not costly unless we have to purge Squid for everything, which we probably don't. People could just use old versions; it's not *that* dangerous.
Likewise, we should do this for images so we can send a cache-forever header (bug 17577), avoiding a bunch of 304 requests.
Any given image is not included on every single page on the wiki. Purging a few thousand pages from Squid on an image reupload (should be rare for such a heavily-used image) is okay. Purging every single page on the wiki is not.
One part I am not completely clear on is how we avoid lots of simultaneous requests to the scriptLoader when it first generates the JavaScript to be cached on the Squids, but other stuff must be throttled too, no? For example, when we update any code, language messages, or LocalSettings, that does not result in immediately purging all of Wikipedia.
No. We don't purge Squid on these events, we just let people see old copies. Of course, this doesn't normally apply to registered users (who usually [always?] get Squid misses), or to pages that aren't cached (edit, history, . . .).
Aryeh Gregor wrote:
Any given image is not included on every single page on the wiki. Purging a few thousand pages from Squid on an image reupload (should be rare for such a heavily-used image) is okay. Purging every single page on the wiki is not.
Yeah... we are just talking about adding image.jpg?image_revision_id to all the image src attributes at page render time; that should never purge everything on the wiki ;)
No. We don't purge Squid on these events, we just let people see old copies. Of course, this doesn't normally apply to registered users (who usually [always?] get Squid misses), or to pages that aren't cached (edit, history, . . .).
Okay, that's basically what I understood, and that makes sense... although it would be nice to think about a job or process that purges pages with outdated language messages, or pages that reference outdated scripts, stylesheets, or image URLs.
We ~do~ add jobs to purge for template updates. Are other things like language message and code updates candidates for job purge tasks? ... I guess it's not too big a deal to get an old page until someone updates it.
--michael
This sounds great, but I have a problem with making action=raw return something that is not raw. For MediaWiki, I think it would be better to add a new action=minify.
What would the pluses and minuses of that be?
Andrew Dunbar (hippietrail)
Andrew Dunbar wrote:
This sounds great, but I have a problem with making action=raw return something that is not raw. For MediaWiki, I think it would be better to add a new action=minify.
What would the pluses and minuses of that be?
Andrew Dunbar (hippietrail)
+1
There are uses that depend on action=raw to download the wikitext. The proposed new action is self-explanatory, so it seems a good idea.
On Sat, Jun 27, 2009 at 4:04 PM, Platonides Platonides@gmail.com wrote:
+1
There are uses that depend on action=raw to download the wikitext. The proposed new action is self-explanatory, so it seems a good idea.
Are there uses that depend on downloading the wikitext of CSS and JS pages?
The extension only modifies content from CSS and JS pages (e.g., MediaWiki:Monobook.css), and then only if the ctype and/or gen query parameter is also specified; it leaves the content from action=raw unchanged in all other cases. Capturing content via action=raw seems necessary since that is the request mechanism Skin.php uses when inserting CSS and JS links. Generating the same effect with "action=minify" would require significant changes to Skin.php, which would seem to go well beyond the scope of a simple extension.
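As a rough illustration of that gating (a sketch, not the extension's actual hook code):

def maybe_minify(raw_output, params, is_css_or_js_page, minifier):
    # Minify only CSS/JS page output, and only when the request carried a
    # ctype or gen parameter; everything else served through action=raw is
    # returned untouched.
    if is_css_or_js_page and ("ctype" in params or "gen" in params):
        return minifier(raw_output)
    return raw_output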
-Robert Rohde