In most cases, the vast majority of a wiki's traffic comes from non-logged-in users, so all of the caching layers should work in a way where the page does not have to be rendered again by PHP. An example is MediaWiki's file caching system: http://www.mediawiki.org/wiki/Manual:File_cache. It's commonly known that serving pages through PHP takes a lot more CPU than serving static content, and this is also the idea behind MW's file caching system.

My situation: I'm on a shared server where they don't want me to go above certain CPU limits (CPU seconds per hour). I'm not able to install Squid, APC or memcached. Lately I've been having problems with CPU usage due to traffic surges and malicious bots. I don't want to spend more money on hosting if I don't have to, but that option is open if the hosting company thinks I should upgrade. I want to be a good client and not affect other users on the server.
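For reference, here's roughly how I have the file cache enabled in LocalSettings.php. This is just a sketch of my setup; the directory path is an example, not a recommendation:

// Enable MediaWiki's file cache so parsed pages are written to disk
// and re-served to anonymous visitors without re-parsing the wikitext.
$wgUseFileCache = true;
// Directory where the cached HTML is stored (example path).
$wgFileCacheDirectory = "$IP/cache";
// Also keep gzip-compressed copies of the cached pages to save bandwidth.
$wgUseGzip = true;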
Here's a problem I see with MW's file caching system: it still processes PHP files. For example, here are some actual lines of code from my wiki's page when it is served from the file cache. Wikipedia also loads these PHP files, thus increasing CPU usage:

------------
<link rel="stylesheet" href="http://mywikisite.com/w/load.php?debug=false&lang=en&modules=site&only=styles&skin=vector&*" />
<link rel="stylesheet" href="http://mywikisite.com/w/load.php?debug=false&lang=en&modules=ext.flaggedRevs.basic%7Cmediawiki.legacy.commonPrint%2Cshared%7Cskins.vector&only=styles&skin=vector&*" />
<script src="http://mywikisite.com/w/load.php?debug=false&lang=en&modules=skins.vector&only=scripts&skin=vector&*"></script>
<script src="http://mywikisite.com/w/load.php?debug=false&lang=en&modules=site&only=scripts&skin=vector&*"></script>
------------

To confirm this, I have seen the static HTML file generated by the cache, and these lines are present in the HTML code that is viewed from the browser or an HTML editor. So load.php is being made to run at least 4 times during each page load. It may be 3 times for my site if Flagged Revisions wasn't installed, but again, Wikipedia has similar lines of code which make multiple calls to load.php. Yesterday I had a huge traffic spike and the server process list confirmed that load.php was running many times. If three pages are loaded at about the same time, that means 12 calls to load.php. I've also compared situations where I wasn't using any cache and where I was using the file cache, and I didn't see any noticeable difference in the CPU usage.
So I think MW's file caching system should be improved so that no PHP processing is required at all for non-logged-in users. After all, the exact same copy of the page is going to be served to every non-logged-in user, so it makes sense to have 100% of that content be static so it doesn't require any PHP processing at all. The only time PHP should run is when content changes; that should refresh the cache and regenerate the static content. In my case I also have a mobile skin, so I've modified LocalSettings.php to use a separate mobile cache directory when the visitor is on a mobile device, which gives me two sets of cache: one for desktop and one for mobile users.
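Roughly, the LocalSettings.php change for the mobile cache looks like the following sketch. The user-agent regex is only a simplified illustration of mobile detection, and the directory names are examples:

// Use separate file cache directories for desktop and mobile visitors,
// since they receive different HTML. The regex below is only a rough
// illustration of mobile user-agent detection.
if ( isset( $_SERVER['HTTP_USER_AGENT'] )
    && preg_match( '/Mobile|Android|iPhone/i', $_SERVER['HTTP_USER_AGENT'] ) ) {
    $wgFileCacheDirectory = "$IP/cache-mobile";
} else {
    $wgFileCacheDirectory = "$IP/cache";
}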
I know this isn't a problem for Wikipedia because they have a lot of servers and additional great caching systems (Squid, memcached, etc.), so everything is fast. But I'm thinking that if those calls to load.php were cut down, it would make it possible for Wikipedia to use fewer servers and would also make everyone else's sites run faster. In any case, PHP processing should be used minimally, only when necessary. If the page had no calls to PHP files, it would use less CPU, and again, if the same page is being served to every non-logged-in user, ideally there should be no or very little PHP processing.
Any thoughts from the developers? Is it possible to modify the file caching system to eliminate the calls to load.php, so that more of the content served is static?
thanks Dan
Hi,
On Sun, Aug 26, 2012 at 5:53 PM, Dan Fisher danfisher261@gmail.com wrote:
My situation: I'm on a shared server where they don't want me to go above certain CPU limits (CPU seconds per hour). I'm not able to install Squid, APC or memcached. Lately I've been having problems with CPU usage due to traffic surges and malicious bots. I don't want to spend more money on hosting if I don't have to, but that option is open if the hosting company thinks I should upgrade. I want to be a good client and not affect other users on the server.
Sounds like it's not a suitable host (or plan at that host) for a publicly accessible MediaWiki instance.
What's your budget? There are services like Linode where you can fairly cheaply get root access to your own virtual server (and therefore have no limits on memcached, APC, Squid, Varnish, etc.), and if you don't have the resources to manage that yourself, there are services that will host a MediaWiki instance for you.
But I'm thinking that if those calls to load.php were cut down, it would make it possible for Wikipedia to use fewer servers and would also make everyone else's sites run faster.
For Wikimedia wikis, those load.php URLs are already served by a cluster of caching proxies with a high hit rate, and PHP is run only for cache misses.
I don't know offhand what the state of the file caching feature is (or whether it's actively maintained), but any improvements to it would probably be welcomed. Even if it is improved, though, you're probably better off using a caching strategy that's already used by a significant number of existing deployments; I think it's safe to say deployments using the file caching feature are not common (but I may be wrong; please correct me if I am!).
-Jeremy
On 26/08/12 19:53, Dan Fisher wrote:
In most cases, the vast majority of a wiki's traffic comes from non-logged-in users, so all of the caching layers should work in a way where the page does not have to be rendered again by PHP. An example is MediaWiki's file caching system: http://www.mediawiki.org/wiki/Manual:File_cache. It's commonly known that serving pages through PHP takes a lot more CPU than serving static content, and this is also the idea behind MW's file caching system. My situation: I'm on a shared server where they don't want me to go above certain CPU limits (CPU seconds per hour). I'm not able to install Squid, APC or memcached.
They should have APC or another opcode cache installed themselves.
Lately I've been having problems with CPU usage due to traffic surges and malicious bots. I don't want to spend more money on hosting if I don't have to, but that option is open if the hosting company thinks I should upgrade. I want to be a good client and not affect other users on the server.
Here's a problem I see with MW's file caching system: it still processes PHP files. For example, here are some actual lines of code from my wiki's page when it is served from the file cache.
Yes. Also note that MediaWiki's file cache still runs a layer of PHP before serving what was cached there. But keep in mind that PHP hits differ a lot in cost: rendering a moderately complex article will be much more expensive than a load.php hit.
Wikipedia also loads these PHP files, thus increasing CPU usage:
<link rel="stylesheet" href="http://mywikisite.com/w/load.php?debug=false&lang=en&modules=site&only=styles&skin=vector&*" />
<link rel="stylesheet" href="http://mywikisite.com/w/load.php?debug=false&lang=en&modules=ext.flaggedRevs.basic%7Cmediawiki.legacy.commonPrint%2Cshared%7Cskins.vector&only=styles&skin=vector&*" />
<script src="http://mywikisite.com/w/load.php?debug=false&lang=en&modules=skins.vector&only=scripts&skin=vector&*"></script>
<script src="http://mywikisite.com/w/load.php?debug=false&lang=en&modules=site&only=scripts&skin=vector&*"></script>
These were introduced with ResourceLoader. MediaWiki used to link to each file separately; ResourceLoader serves all of the files in one hit instead of forcing the client to perform several requests. It should be very lightweight, though.
To confirm this, I have seen the static HTML file generated by the cache, and these lines are present in the HTML code that is viewed from the browser or an HTML editor. So load.php is being made to run at least 4 times during each page load. It may be 3 times for my site if Flagged Revisions wasn't installed, but again, Wikipedia has similar lines of code which make multiple calls to load.php. Yesterday I had a huge traffic spike and the server process list confirmed that load.php was running many times. If three pages are loaded at about the same time, that means 12 calls to load.php. I've also compared situations where I wasn't using any cache and where I was using the file cache, and I didn't see any noticeable difference in the CPU usage.
So I think MW's file caching system should be improved so that no PHP processing is required at all for non-logged-in users. After all, the exact same copy of the page is going to be served to every non-logged-in user, so it makes sense to have 100% of that content be static so it doesn't require any PHP processing at all. The only time PHP should run is when content changes; that should refresh the cache and regenerate the static content.
PHP would still need to run if you asked it for a non-cached page.
I know this isn't a problem for Wikipedia because they have a lot of servers and additional great caching systems (Squid, memcached, etc.), so everything is fast.
It has a lot of servers, but also a gazillion users, so Wikipedia is a customer with a real interest in efficient caching. It's true that it can add advanced caching layers on top, too.
But I'm thinking that if those calls to load.php were cut down, it would make it possible for Wikipedia to use fewer servers and would also make everyone else's sites run faster.
Have you measured that load.php is slow?
In any case, PHP processing should be used minimally, only when necessary. If the page had no calls to PHP files, it would use less CPU, and again, if the same page is being served to every non-logged-in user, ideally there should be no or very little PHP processing.
That should be the case.
Also note that the artificial load from bots will look very different from normal user traffic. They most likely aren't calling load.php at all (and they may perform costly requests that a well-behaved client would rarely make). What is their access pattern?
On 27/08/12 03:53, Dan Fisher wrote:
In most cases, the vast majority of a wiki's traffic comes from non-logged-in users, so all of the caching layers should work in a way where the page does not have to be rendered again by PHP. An example is MediaWiki's file caching system: http://www.mediawiki.org/wiki/Manual:File_cache. It's commonly known that serving pages through PHP takes a lot more CPU than serving static content, and this is also the idea behind MW's file caching system. My situation: I'm on a shared server where they don't want me to go above certain CPU limits (CPU seconds per hour). I'm not able to install Squid, APC or memcached. Lately I've been having problems with CPU usage due to traffic surges and malicious bots. I don't want to spend more money on hosting if I don't have to, but that option is open if the hosting company thinks I should upgrade. I want to be a good client and not affect other users on the server.
When we designed ResourceLoader, we were aware that it would increase server CPU usage for some shared-host users; however, there was no easy way to avoid that, and there were clear performance benefits to the scheme for users in less constrained server environments.
Serving JS and CSS from PHP allows MediaWiki to control the caching headers, particularly the Expires header. This gives improved performance for clients, since they no longer have to continually re-request server-side resources.
To confirm this, I have seen the static HTML file generated by the cache, and these lines are present in the HTML code that is viewed from the browser or an HTML editor. So load.php is being made to run at least 4 times during each page load.
Only with a cold client cache. Subsequent page views will not need to fetch all those load.php objects, because of the Expires header.
Some objects are served with a short (5 minute) expiry time. You may be able to reduce CPU usage slightly by increasing that expiry time, at the expense of delayed updates to JS and CSS:
$wgResourceLoaderMaxage['unversioned']['server'] = 3600;
$wgResourceLoaderMaxage['unversioned']['client'] = 3600;
Any CDN should be able to read the Expires headers MediaWiki sends, and use them to offload your webserver. You could even use CoralCDN (http://coralcdn.org/), which is a free caching proxy service:
$wgLoadScript = 'http://mywikisite.com.nyud.net/w/load.php';
-- Tim Starling