On 26/08/12 19:53, Dan Fisher wrote:
In most cases, the vast majority of a wiki's traffic comes from
non-logged-in users. Caches should therefore work in a way where the page
does not have to be rendered again by PHP. An example is MediaWiki's file
caching system:
http://www.mediawiki.org/wiki/Manual:File_cache
It's well known that serving pages through PHP takes a lot more CPU than
serving static content, and that is the concept behind MW's file caching
system.
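For reference, enabling it is only a few lines in LocalSettings.php; the
settings below follow that manual page, but check the docs for your
MediaWiki version:

# LocalSettings.php - enable MediaWiki's file cache (see Manual:File_cache).
$wgUseFileCache = true;               # serve cached HTML to anonymous users
$wgFileCacheDirectory = "$IP/cache";  # must be writable by the web server
$wgUseGzip = true;                    # store and serve gzipped copies too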
My situation: I'm on a shared server where they don't want me to go above
certain CPU limits (CPU seconds per hour). I'm not able to install Squid,
APC or memcached.
They should have APC or another opcode cache installed themselves.
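If you want to check whether the host already provides one, a quick test
using standard PHP functions (APC-era names, since that's what you
mentioned):

<?php
// Quick check for an opcode cache on shared hosting.
var_dump( extension_loaded( 'apc' ) );  // APC extension present?
var_dump( ini_get( 'apc.enabled' ) );   // and actually enabled?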
Lately I've been having CPU usage problems due to traffic surges and
malicious bots. I don't want to spend more money on hosting if I don't
have to, but that option is open if the hosting company thinks I should
upgrade. I want to be a good client and not affect other users on the
server.
Here's a problem I see with MW's file caching system: it still processes
PHP files. For example, here are some actual lines from my wiki's HTML
when it loads a page from the file cache.
Yes. Also note that MediaWiki's "file caching" still runs a layer of PHP
before serving what was cached there.
But note that there are very different kinds of PHP hits: rendering a
moderately complex article is much more expensive than a load.php hit.
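To make that concrete, here is a minimal sketch of what a file-cache hit
still costs. This is not MediaWiki's actual code, just the shape of the
idea: even on a hit, PHP must start up, map the request to a cache file,
and stream it out.

<?php
// Sketch of the "thin PHP layer" a file-cache hit still pays for.
$title = $_GET['title'] ?? 'Main_Page';
$cacheFile = __DIR__ . '/cache/' . md5( $title ) . '.html';

if ( is_readable( $cacheFile ) ) {
    header( 'Content-Type: text/html; charset=UTF-8' );
    readfile( $cacheFile );  // cheap, but still one PHP process per request
    exit;
}
// ...otherwise fall through to the full parser/renderer.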
Wikipedia also loads these
PHP files, thus increasing CPU usage:
------------
<link rel="stylesheet"
  href="http://mywikisite.com/w/load.php?debug=false&lang=en&a… />
<link rel="stylesheet"
  href="http://mywikisite.com/w/load.php?debug=false&lang=en&a… />
<script
  src="http://mywikisite.com/w/load.php?debug=false&lang=en&modules=skins.vector&only=scripts&skin=vector&"></script>
<script
  src="http://mywikisite.com/w/load.php?debug=false&lang=en&modules=site&only=scripts&skin=vector&"></script>
------------
These were introduced by ResourceLoader; MediaWiki linked to each file
individually before. ResourceLoader serves all the files in one hit,
instead of forcing the client to perform several requests. It should be
very lightweight, though.
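The batching is visible in the URL itself: module names are joined with
'|' into a single load.php request. A sketch of how such a URL is
assembled (the module names here are just examples):

<?php
// One load.php request fetches many modules; names are pipe-separated.
$modules = [ 'skins.vector', 'site', 'mediawiki.util' ];
$url = 'http://mywikisite.com/w/load.php?modules='
     . urlencode( implode( '|', $modules ) )
     . '&only=scripts&skin=vector';
echo $url, "\n";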
To confirm this, I have looked at the static HTML file generated by the
cache, and these lines are present in the HTML that the browser (or an
HTML editor) sees. So load.php is being run at least 4 times during each
page load. It might be 3 times for my site if Flagged Revisions wasn't
installed, but again, Wikipedia has similar lines of code which make
multiple calls to load.php. Yesterday I had a huge traffic spike, and the
server process list confirmed that load.php was running many times. If
three pages are loaded at about the same time, that means 12 calls to
load.php.
I've also compared situations where I wasn't using any cache and where I
was using the File cache, and I didn't see any noticeable difference in the
CPU usage.
So I think MW's file caching system should be improved so that no PHP
processing is required at all for non-logged-in users. After all, the
exact same copy of the page is going to be served to every non-logged-in
user, so it makes sense for 100% of that content to be static, requiring
no PHP processing at all. The only time PHP should run is when content
changes; that should refresh the cache and regenerate the static content.
PHP would still need to run if you asked it for a non-cached page.
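A minimal sketch of that model, assuming a hypothetical render_page()
standing in for whatever produces the final HTML; the web server, not
PHP, would then be configured to serve the static file directly for
anonymous requests:

<?php
// Sketch of write-through static caching: PHP runs only when content
// changes. render_page() is a hypothetical stand-in for the parser.
function refresh_static_cache( string $title, string $cacheDir ): void {
    $html = render_page( $title );  // the only time PHP does real work
    $path = $cacheDir . '/' . md5( $title ) . '.html';
    $tmp  = $path . '.tmp';
    file_put_contents( $tmp, $html );  // write atomically so readers
    rename( $tmp, $path );             // never see a partial file
}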
I know this isn't a problem for Wikipedia, because they have a lot of
servers and additional great caching systems (Squid, memcached, etc.), so
everything is fast.
It has a lot of servers, but also a gazillion users, so Wikipedia is a
customer really interested in efficient caching. It's true that it can
also add advanced caching layers.
But I'm thinking that if those calls to load.php were cut down, it would
allow Wikipedia to use fewer servers and would also make everyone else's
sites run faster.
Have you measured that load.php is slow?
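A crude way to start: time a single hit from outside. This measures
latency rather than CPU seconds, and reuses the example URL from above,
but repeated runs will at least show whether it is heavy at all:

<?php
// Rough wall-clock timing of one load.php request.
$url = 'http://mywikisite.com/w/load.php?modules=site&only=scripts&skin=vector';
$start = microtime( true );
file_get_contents( $url );
printf( "load.php answered in %.3f s\n", microtime( true ) - $start );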
In any case, PHP processing should be used minimally, only when
necessary. If the page had no calls to PHP files, it would use less CPU;
and again, if the same page is being served to all non-logged-in users,
ideally there should be little or no PHP processing.
That should be the case.
Also note that your artificial load from bots will be very different from
the normal usage of a user. They most likely aren't calling load.php at
all (and they may perform costly requests that a well-behaved client
would rarely make). What is their access pattern?
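One quick way to answer that is to tally the access log. The log path and
combined log format below are assumptions; adjust them for your host:

<?php
// Count hits per request path to see what the bots actually ask for.
$counts = [];
foreach ( file( '/var/log/apache2/access.log' ) as $line ) {
    if ( preg_match( '~"(?:GET|POST) ([^ ?"]+)~', $line, $m ) ) {
        $counts[$m[1]] = ( $counts[$m[1]] ?? 0 ) + 1;
    }
}
arsort( $counts );
foreach ( array_slice( $counts, 0, 10, true ) as $path => $n ) {
    printf( "%6d  %s\n", $n, $path );  // top 10 requested paths
}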