When I enable the file cache and request a page from MediaWiki, the browser waits for the response forever. On the server, the page has been created in the cache; it's compressed and the content looks fine (checked with zcat). An Apache process keeps running at almost 100% CPU.
Without the filecache it works fine.
I noticed some discussion from early July about the status of the file cache. Is it supposed to work now, in beta6?
I have enabled the file cache by adding this to LocalSettings.php:
$wgShowIPinHeader = false;
$wgUseFileCache = true;
$wgFileCacheDirectory = "/home/rene/projects/carriere/cache";
This directory exists and is writable by Apache.
No other tweaks; it's a clean install (well, the database was upgraded from beta5).
I'm using:
MediaWiki: 1.3.0beta6
PHP: 4.1.2 (Apache)
MySQL: 3.23.49-log
all on Debian woody.
Any ideas?
Rene Pijlman wrote:
> When I enable the file cache and request a page from Mediawiki, the browser is waiting for the request forever. On the server the page has been created in the cache. It's compressed and the content looks fine (zcat'ed). There's an Apache process continuously running at almost 100% CPU.
Everything seems to work fine on my main test machine (Mac OS X 10.3.4, PHP 4.3.2), but I can confirm this phenomenon on Debian Woody.
(Side note: the file cache doesn't interact well with output-buffered gzipping. Comment out the line that sets that near the top of LocalSettings.php; unfortunately that doesn't solve this problem.)
The output is being written out to the cache file *and sent to the client* but the connection hangs there. I'm not sure why yet...
-- brion vibber (brion @ pobox.com)
Brion Vibber:
> Rene Pijlman:
>> When I enable the file cache and request a page from Mediawiki, the browser is waiting for the request forever. On the server the page has been created in the cache. It's compressed and the content looks fine (zcat'ed). There's an Apache process continuously running at almost 100% CPU.
> Everything seems to work fine on my main test machine (Mac OS X 10.3.4, PHP 4.3.2), but I can confirm this phenomenon on Debian Woody.
> (Side note: the file cache doesn't interact well with output-buffered gzipping. Comment out the line that sets that near the top of LocalSettings.php; unfortunately that doesn't solve this problem.)
> The output is being written out to the cache file *and sent to the client* but the connection hangs there. I'm not sure why yet...
I noticed that the 100% CPU occurs after index.php has finished, and after return from the output callback.
My guess is that the call to header() in the output callback saveToFileCache() is not safe: it writes to the buffer that the output callback is processing.
if( $this->useGzip() ) {
    if( wfClientAcceptsGzip() ) {
        header( 'Content-Encoding: gzip' );
Perhaps this confuses PHP. Also, I suspect this header never actually makes it into the response headers, so perhaps the gzipped data is confusing something further down the line.
Rene Pijlman:
> My guess is the call to header() in the output callback saveToFileCache() is not safe. This is writing to the buffer that the output callback is processing.
> if( $this->useGzip() ) {
>     if( wfClientAcceptsGzip() ) {
>         header( 'Content-Encoding: gzip' );
BTW, this looks conceptually flawed at this point. The encoding was already decided when the data was written to the buffer; I can't think of a reason to re-decide it and send this header only now.
Also, I wonder whether it's wise to use the output callback to write the file to the cache. I'd think that could be done earlier as well.
Brion Vibber wrote:
> (Side note: the file cache doesn't interact well with output-buffered
> gzipping. Comment out the line that sets that near the top of
> LocalSettings.php; unfortunately that doesn't solve this problem.)
>
> The output is being written out to the cache file *and sent to the
> client* but the connection hangs there. I'm not sure why yet...
Found the problem. It seems that the buffer is being passed by reference on PHP 4.1; the variable is modified by the function and all goes to hell. Making a copy to operate on gets things working.
Diff attached; fix just added to CVS head and 1.3 branch.
-- brion vibber (brion @ pobox.com)
Index: includes/CacheManager.php
===================================================================
RCS file: /cvsroot/wikipedia/phase3/includes/CacheManager.php,v
retrieving revision 1.5.2.1
diff -u -r1.5.2.1 CacheManager.php
--- includes/CacheManager.php	13 Jun 2004 01:15:09 -0000	1.5.2.1
+++ includes/CacheManager.php	8 Aug 2004 10:15:48 -0000
@@ -110,7 +110,8 @@
 		if(!file_exists($mydir2)) { mkdir($mydir2,0775); }
 	}
 
-	function saveToFileCache( $text ) {
+	function saveToFileCache( $origtext ) {
+		$text = $origtext;
 		if(strcmp($text,'') == 0) return '';
 
 		wfDebug(" saveToFileCache()\n", false);
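A minimal sketch of the pattern the patch uses, written for a current PHP rather than PHP 4.1: an ob_start() output callback that first copies its argument, stores a gzip-compressed copy on disk, and returns the page unchanged. The function name demoSaveToFileCache and the DEMO_CACHE_FILE path are hypothetical stand-ins, not MediaWiki code. On current PHP, strings reach the callback by value anyway, so the copy only mirrors the fix for the PHP 4.1 behaviour described above.

```php
<?php
// Sketch only: a hypothetical output callback modelled on the patched
// saveToFileCache(). Not MediaWiki code; the path is a stand-in for
// $wgFileCacheDirectory.
define('DEMO_CACHE_FILE', sys_get_temp_dir() . '/demo-filecache-page.html.gz');

function demoSaveToFileCache($origtext)
{
    // Work on a copy, as the CacheManager.php patch does, so the buffer
    // PHP handed us is never modified in place.
    $text = $origtext;
    if ($text === '') {
        return '';
    }
    // Store a compressed copy on disk for later requests.
    file_put_contents(DEMO_CACHE_FILE, gzencode($text));
    // Return the page unchanged: this is what PHP sends to the client.
    return $text;
}

ob_start('demoSaveToFileCache');
echo "<html><body>Hello from the file cache demo</body></html>\n";
ob_end_flush(); // runs the callback, writes the cache file, prints the page
```

The callback never calls header() and never rewrites the buffer it was given; it only reads it, which keeps it clear of both problems raised earlier in the thread.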
Brion Vibber:
> Found the problem. It seems that the buffer is being passed by reference on PHP 4.1; the variable is modified by the function and all goes to hell. Making a copy to operate on gets things working.
> Diff attached; fix just added to CVS head and 1.3 branch.
This solves the runaway, but it's not working correctly yet.
When I request a normal article anonymously with the cache enabled, my browser shows gibberish (Firefox) or a download dialog (IE). I've looked at the headers with wget -S, and there's no:
Content-Encoding: gzip
Rene Pijlman:
> Brion Vibber:
>> Found the problem. It seems that the buffer is being passed by reference on PHP 4.1; the variable is modified by the function and all goes to hell. Making a copy to operate on gets things working.
> This solves the runaway, but it's not working correctly yet.
BTW it works fine now with
$wgUseGzip = false;
in LocalSettings.php. The generic caching problem is fixed by Brion's patch; a problem with compressed caching remains.
> I've looked at the headers with wget -S, and there's no:
> Content-Encoding: gzip
Correcting myself...
I forgot that wget by default sends a different Accept-Encoding header. When I run it with:
wget -S --header='Accept-Encoding: gzip, deflate' url
the headers look OK:
HTTP/1.1 200 OK
Date: Sun, 08 Aug 2004 14:39:31 GMT
Server: Apache/1.3.26 (Unix) Debian GNU/Linux mod_python/2.7.8 Python/2.1.3 PHP/4.1.2
X-Powered-By: PHP/4.1.2
Vary: Accept-Encoding
Expires: -1
Cache-Control: private, must-revalidate, max-age=0
Last-modified: Sat, 7 Aug 2004 22:51:25 GMT
Content-Encoding: gzip
Content-Length: 1803
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
Content-Language: nl
... and wget stores the gzipped data in a file (that's correct I guess). With zcat it looks fine.
The question remains: why don't Firefox and IE uncompress the data before rendering...
Rene Pijlman wrote:
> This solves the runaway, but it's not working correctly yet.
> When I request a normal article anonymously with the cache enabled, my browser shows gibberish (Firefox) or a download dialog (IE). I've looked at the headers with wget -S, and there's no:
> Content-Encoding: gzip
Did you disable the generic gzipping at the top of LocalSettings.php like I said was necessary in my previous mail?
-- brion vibber (brion @ pobox.com)
Brion Vibber:
> Rene Pijlman:
>> When I request a normal article anonymously with the cache enabled, my browser shows gibberish (Firefox) or a download dialog (IE)
> Did you disable the generic gzipping at the top of LocalSettings.php like I said was necessary in my previous mail?
Oh, wait a minute, you're referring to this, I guess:
if( !ini_get( 'zlib.output_compression' ) ) ob_start( 'ob_gzhandler' );
I completely overlooked that. Indeed, when I remove this line and configure file caching like this:
$wgUseFileCache = true;
$wgFileCacheDirectory = "/home/rene/projects/carriere/cache";
$wgShowIPinHeader = false;
$wgUseGzip = true;
... it works fine. Files in the cache are compressed and both Firefox and IE render pages correctly.
Thanks again for your help, Brion.
If I'm not mistaken, the cause of the problem was that both ob_gzhandler and the file cache were compressing the output, so it was compressed twice, which would explain the problem I saw.
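Assuming that diagnosis is right, the effect can be reproduced in a few lines. This sketch uses gzencode() as a stand-in for both the file cache and ob_gzhandler (an assumption; the real code paths differ) and counts how many gzip layers the response carries. A browser undoes exactly one layer for Content-Encoding: gzip, so with two layers it is left holding gzip data instead of HTML, which would match the gibberish in Firefox and the download dialog in IE. The demo* helper names are hypothetical.

```php
<?php
// Sketch: why double compression breaks the page. gzencode() stands in
// for both the file cache and ob_gzhandler; helper names are hypothetical.

function demoIsGzip($bytes)
{
    // A gzip stream starts with the magic bytes 0x1f 0x8b (RFC 1952).
    return substr($bytes, 0, 2) === "\x1f\x8b";
}

function demoCountGzipLayers($bytes)
{
    $layers = 0;
    while (demoIsGzip($bytes)) {
        $decoded = @gzdecode($bytes); // @: suppress the warning on corrupt data
        if ($decoded === false) {
            break;
        }
        $bytes = $decoded;
        $layers++;
    }
    return $layers;
}

$html         = "<html><body>Article text</body></html>";
$fromCache    = gzencode($html);      // layer 1: compressed file cache
$onTheWire    = gzencode($fromCache); // layer 2: ob_gzhandler on top
$afterBrowser = gzdecode($onTheWire); // the browser removes one layer only

echo demoCountGzipLayers($onTheWire), "\n";                 // prints 2
echo demoIsGzip($afterBrowser) ? "still gzip\n" : "html\n"; // prints "still gzip"
```

With only one compressor active, demoCountGzipLayers() returns 1 and a single decode yields the HTML.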
May I suggest fixing this in the code, to make configuration easier?
I'd say that when PHP provides the mechanism for output compression over the wire, there's no reason to duplicate this mechanism in the file cache. You might as well completely remove compression from the file cache and cache uncompressed on disk. At the cost of some extra disk space this will improve performance of the file cache (no CPU cycles for (un)compression) and simplify the code. This will neatly separate caching from compression.
Would it help if I implemented and tested this and submitted a patch?
Rene Pijlman wrote:
> I'd say that when PHP provides the mechanism for output compression over the wire, there's no reason to duplicate this mechanism in the file cache. You might as well completely remove compression from the file cache and cache uncompressed on disk. At the cost of some extra disk space this will improve performance of the file cache (no CPU cycles for (un)compression) and simplify the code. This will neatly separate caching from compression.
The point of compressing the file cache is of course to save disk space. Since this is a legitimate thing to do, I don't think there's a need to remove that option.
-- brion vibber (brion @ pobox.com)
Brion Vibber:
> The point of compressing the file cache is of course to save disk space. Since this is a legitimate thing to do, I don't think there's a need to remove that option.
Hmm, I assumed this was intended to compress the data that is sent to the client. Well, OK; in that case I suggest making the file cache work correctly in all cases, e.g. by sending uncompressed data to the output buffer when PHP's zlib support takes care of compression.
The wfClientAcceptsGzip() branches of the code seem to duplicate the functionality of ob_gzhandler, and can be removed without loss of functionality.
An optimization would be to solve it the other way around: decide per request to not compress with ob_gzhandler when the file is available in compressed form in the file cache.
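A sketch of that per-request decision, written as a pure function so it can be tested outside Apache. It assumes the caller already knows whether the cached copy is gzipped and whether the client accepts gzip; the name demoChooseCachedBody and its return convention are made up for illustration. When it returns 'gzip', the caller would send the bytes with a Content-Encoding: gzip header and must make sure no output handler compresses them again.

```php
<?php
// Sketch of a per-request choice for a cached page. Hypothetical helper,
// not MediaWiki code. Returns array(body, contentEncoding or null).

function demoChooseCachedBody($stored, $storedIsGzip, $clientAcceptsGzip)
{
    if ($storedIsGzip && $clientAcceptsGzip) {
        // Pass the stored bytes through untouched: no gunzip, no re-gzip.
        // The caller must send "Content-Encoding: gzip" and make sure no
        // output handler compresses these bytes a second time.
        return array($stored, 'gzip');
    }
    if ($storedIsGzip) {
        // Client cannot take gzip: decompress once and send plain HTML.
        return array(gzdecode($stored), null);
    }
    // Stored uncompressed: send as-is and let ob_gzhandler or
    // zlib.output_compression compress it if the client allows.
    return array($stored, null);
}

$page = "<p>cached</p>";
list($body, $encoding) = demoChooseCachedBody(gzencode($page), true, true);
```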
Rene Pijlman wrote:
> Brion Vibber:
>> The point of compressing the file cache is of course to save disk space. Since this is a legitimate thing to do, I don't think there's a need to remove that option.
> Hmm, I assumed this was intended to compress the data that is sent to the client.
That's just a nifty side effect. :)
> An optimization would be to solve it the other way around: decide per request to not compress with ob_gzhandler when the file is available in compressed form in the file cache.
ob_end_clean() would probably disable the handler correctly... you might try slipping a call in and see if that does it. The comments in the manual indicate that it may still _call_ the ob_gzhandler function (so will modify the headers) but won't output any data.
The main potential problem with this would be that ob_gzhandler might have a different idea of what accepts gzip than we do.
-- brion vibber (brion @ pobox.com)
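Brion's last point is that two pieces of code can disagree about what "accepts gzip" means. As an illustration only, here is one way to read an Accept-Encoding value that honours q-values ("gzip;q=0" means "not acceptable"). It is not the implementation of wfClientAcceptsGzip() or of ob_gzhandler; demoAcceptsGzip is a hypothetical helper.

```php
<?php
// Illustration only: one reading of Accept-Encoding with q-values.
// demoAcceptsGzip() is hypothetical; it is not wfClientAcceptsGzip()
// and not the rule ob_gzhandler applies.

function demoAcceptsGzip($acceptEncoding)
{
    $gzipQ = null; // q-value given for gzip / x-gzip, if listed
    $starQ = null; // q-value given for "*", if listed
    foreach (explode(',', strtolower($acceptEncoding)) as $item) {
        $parts  = array_map('trim', explode(';', $item));
        $coding = $parts[0];
        $q      = 1.0; // default quality when no q= parameter is present
        foreach (array_slice($parts, 1) as $param) {
            if (strncmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($coding === 'gzip' || $coding === 'x-gzip') {
            $gzipQ = $q;
        } elseif ($coding === '*') {
            $starQ = $q;
        }
    }
    if ($gzipQ !== null) {
        return $gzipQ > 0; // "gzip;q=0" explicitly refuses gzip
    }
    // "*" covers any coding not named explicitly.
    return $starQ !== null && $starQ > 0;
}

var_dump(demoAcceptsGzip('gzip, deflate'));     // bool(true)
var_dump(demoAcceptsGzip('gzip;q=0, deflate')); // bool(false)
```

The first call uses the same header value as the wget test earlier in the thread; an empty or missing header makes this helper return false.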
Brion Vibber:
> Rene Pijlman:
>> An optimization would be to solve it the other way around: decide per request to not compress with ob_gzhandler when the file is available in compressed form in the file cache.
> ob_end_clean() would probably disable the handler correctly... you might try slipping a call in and see if that does it. The comments in the manual indicate that it may still _call_ the ob_gzhandler function (so will modify the headers) but won't output any data.
> The main potential problem with this would be that ob_gzhandler might have a different idea of what accepts gzip than we do.
Another problem is zlib.output_compression, which is set in php.ini and cannot be turned off at the scripting level (according to a comment in the docs). And I guess most users will want to enable it to compress all requests, including requests that cannot be served from the file cache. So I don't think an implementation that serves compressed cached files without the overhead of uncompressing and recompressing is worth the effort.
I have attached a patch which fixes the compressed file cache, by always sending uncompressed data to the output buffer, leaving compression up to zlib.output_compression or ob_gzhandler.
The effect of the patch is that the file cache can be enabled with $wgUseGzip set to true or false. $wgUseGzip decides if the files stored in the cache are compressed or not.
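The idea behind the patch can be sketched as a loader that always returns uncompressed HTML, whatever $wgUseGzip was when the file was written, so that only zlib.output_compression or ob_gzhandler compresses the response. This is not the patch itself; demoLoadCachedPage and the file names are hypothetical, and detecting a gzipped file by its magic bytes (0x1f 0x8b) is an assumption about how the cache could tell the two formats apart.

```php
<?php
// Sketch of "always send uncompressed data to the output buffer".
// demoLoadCachedPage() and the file names are hypothetical.

function demoLoadCachedPage($path)
{
    if (!is_readable($path)) {
        return false; // cache miss
    }
    $stored = file_get_contents($path);
    // Assumption: a gzipped cache file is recognised by the gzip magic
    // bytes 0x1f 0x8b, which plain HTML text does not start with.
    if (substr($stored, 0, 2) === "\x1f\x8b") {
        return gzdecode($stored);
    }
    return $stored;
}

// Cache the same page once compressed ($wgUseGzip = true) and once
// plain ($wgUseGzip = false); both load back as identical HTML, and
// the caller just echoes it, leaving compression to PHP.
$html = "<html><body>Cached article</body></html>";
$dir  = sys_get_temp_dir();
file_put_contents("$dir/demo-page-gz.html", gzencode($html));
file_put_contents("$dir/demo-page-plain.html", $html);

echo demoLoadCachedPage("$dir/demo-page-gz.html");
```

The price of this design is one decompression per cache hit when the cache is stored compressed, which is the CPU-versus-disk trade-off discussed next.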
But I think the documentation should note that setting $wgUseGzip to true makes little sense. The idea of the cache is to spend some disk space to save CPU cycles, so why would you then want to spend CPU cycles to save that disk space?
I also suggest changing this in DefaultSettings.php:
# We can serve pages compressed in order to save bandwidth,
# but this will increase CPU usage.
# Requires zlib support enabled in PHP.
$wgUseGzip = function_exists( 'gzencode' );
to:
# Should the file cache be compressed, in order to save disk
# space? This will increase CPU usage.
# Requires zlib support enabled in PHP. To enable, change
# this line to:
# $wgUseGzip = function_exists('gzencode');
$wgUseGzip = false;
With this patch applied, wfClientAcceptsGzip() will no longer be used.
mediawiki-l@lists.wikimedia.org