My test box at home is running Apache 2.0.48 on FreeBSD 5.2rc2, with PHP 4.3.4 installed as an apache filter. For the most part it works fine, but I've noticed an oddity on this system that I haven't seen on the production boxes (always running Apache 1.3.x): with file cache and gzip compression on, the Content-Encoding header is missing on the first send of a newly cached page, so you see raw binary gibberish.
As a workaround I had dropped an extra header() call into Article::tryFileCache() but I've disabled this on pliny (which is not affected by the original bug) due to a secondary bug it causes with very long page titles (#870290). This could probably be worked around again, but I'd rather solve the initial issue. There may be other related problems with missing headers.
Is anyone else testing with Apache 2 who can confirm this problem (particularly on Linux)? Enable $wgUseGzip and $wgUseFileCache, disable $wgShowIPinHeader, and set $wgFileCacheDirectory to an apache-writable directory, and comment out the header() call in Article::tryFileCache().
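For anyone wanting to try this, the settings listed above might look roughly like the following in LocalSettings.php. This is only a sketch: the variable names come from the message above, but the cache directory path is a made-up placeholder, and the boolean values are an assumption about how "enable" and "disable" map onto each setting.

```php
<?php
# LocalSettings.php fragment (sketch) for reproducing the problem.
$wgUseGzip = true;
$wgUseFileCache = true;
$wgShowIPinHeader = false;

# Hypothetical path; any directory the Apache user can write to should do.
$wgFileCacheDirectory = "/var/tmp/wikicache";
```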
We ought to make sure it works, and also see if the dreaded Ampersand-in-Path-Rewritten-to-Query-String issue can be dealt with reasonably (i.e., without patching Apache as we do for 1.3; over time I've received several requests for the patch from people who had the same problem and stumbled on my newsgroup posting about it).
-- brion vibber (brion @ pobox.com)
On Jan 4, 2004, at 04:03, Brion Vibber wrote:
Is anyone else testing with Apache 2 who can confirm this problem (particularly on Linux)?
I've duplicated the problem on a machine running Fedora Core 1, which has Apache 2.0.47 installed.
In addition to the content-encoding problem, I've noticed that the content-type header lists the wrong charset encoding when (otherwise correctly) showing cached pages, compressed or not. Redhat/Fedora uses UTF-8 as the default locale charset, and apparently inserts this into the default content-type header; in at least some browsers this overrides the meta tag in the HTML which says it's ISO 8859-1. Viewing non-cached pages (logged in, or diff views etc) the correct encoding comes through.
This looks like a related problem to the gzip headers; something involved in the cache system stops headers from getting sent through after a certain point.
On both the FreeBSD and Linux systems the apache installations were stock (from ports and from OS-provided rpms), PHP from source configured so:
./configure --enable-shmop --with-zlib --with-mysql --with-iconv --with-apxs2filter=/usr/sbin/apxs --with-readline --enable-sockets
(on FreeBSD /usr/local/sbin/apxs and also --with-tsrm-pth)
Dropped into /usr/local/lib/php.ini: register_globals = On
Dropped on the end of httpd.conf:
    AddType application/x-httpd-php .php .phtml
    AddType application/x-httpd-php-source .phps
    DirectoryIndex index.html index.php
-- brion vibber (brion @ pobox.com)
On Sun, Jan 04, 2004 at 05:36:23AM -0800, Brion Vibber wrote:
On Jan 4, 2004, at 04:03, Brion Vibber wrote:
Is anyone else testing with Apache 2 who can confirm this problem (particularly on Linux)?
I've duplicated the problem on a machine running Fedora Core 1, which has Apache 2.0.47 installed.
In addition to the content-encoding problem, I've noticed that the content-type header lists the wrong charset encoding when (otherwise correctly) showing cached pages, compressed or not. Redhat/Fedora uses UTF-8 as the default locale charset, and apparently inserts this into the default content-type header; in at least some browsers this overrides the meta tag in the HTML which says it's ISO 8859-1. Viewing
Apache 2 has some uber-paranoid security "features"; this is one of them. For example, you have to manually enable public_html for user home pages because the alternative would allow an attacker to find if a given user account exists or not, which is considered a security vulnerability. Apparently, pages without charsets have some Cross Site Scripting issues, so rather than trusting authors to use meta tags, apache decided that all pages should be ISO-8859-1, knowing perfectly well that in both MSIE and Gecko it overrides the meta tag. So if you want the encoding to be UTF-8 apache expects you to give all your pages a ".utf8" extension! No really.
More info at http://lists.w3.org/Archives/Public/www-tag/2003Sep/0176.html
Fedora made it worse by changing ISO-8859-1 to UTF-8, because "all pages will eventually be Unicode encoded", so if yours aren't yet you have to be thankful for being forced to fix them I guess.
Commenting out the line "AddDefaultCharset UTF-8" in httpd.conf should fix this.
Arvind
Another idea: something about these settings in php.ini:
default_mimetype = "text/html"
;default_charset = "iso-8859-1"
Gabriel Wicke
On Sun, 04 Jan 2004 04:03:43 -0800, Brion Vibber wrote:
My test box at home is running Apache 2.0.48 on FreeBSD 5.2rc2, with PHP 4.3.4 installed as an apache filter. For the most part it works fine, but I've noticed an oddity on this system that I haven't seen on the production boxes (always running Apache 1.3.x): with file cache and gzip compression on, the Content-Encoding header is missing on the first send of a newly cached page, so you see raw binary gibberish.
I have no experience with Apache2, so I'm just guessing here. My understanding is that both the file cache and gzip are done in PHP. Are the original headers stored along with the cached page so that they are used on a cache hit? I have a PHP cache system in use in the CMS Ariadne; there we use separate dirs for compressed content, uncompressed content, and headers. If the headers are cached, my only guess is that some whitespace (e.g. a trailing newline) is being output before the cached headers are sent.
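To make the idea of caching headers alongside the page concrete, here is a rough sketch. It is not Ariadne's or MediaWiki's actual code: the function names and the ".body"/".headers" file layout are invented for illustration, and it assumes the cache directory already exists.

```php
<?php
// Sketch only: store the headers next to the cached body so that a cache
// hit can replay them. File layout and function names are hypothetical.

function cacheStore( $dir, $key, $body, array $headers ) {
    file_put_contents( "$dir/$key.body", $body );
    // One header per line, e.g. "Content-Encoding: gzip".
    file_put_contents( "$dir/$key.headers", implode( "\n", $headers ) );
}

function cacheLoad( $dir, $key ) {
    $bodyFile = "$dir/$key.body";
    $headerFile = "$dir/$key.headers";
    if ( !is_file( $bodyFile ) || !is_file( $headerFile ) ) {
        return null; // cache miss
    }
    $raw = file_get_contents( $headerFile );
    $headers = ( $raw === "" ) ? array() : explode( "\n", $raw );
    return array(
        "headers" => $headers,
        "body" => file_get_contents( $bodyFile ),
    );
}

function cacheSend( array $entry ) {
    // Headers must go out before any body byte, including stray whitespace.
    foreach ( $entry["headers"] as $line ) {
        header( $line );
    }
    print $entry["body"];
}
```

On a hit, something like cacheSend( cacheLoad( $dir, $key ) ) would replay the stored headers before printing the body; any whitespace emitted before that call would prevent the headers from being set.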
Gabriel Wicke
On Jan 4, 2004, at 05:57, Gabriel Wicke wrote:
I have no experience with Apache2, so I'm just guessing here. My understanding is that both the file cache and gzip are done in PHP. Are the original headers stored along with the cached page so that they are used on a cache hit?
The script sends out both the headers and the content. The content is taken from a cached file if possible.
I should stress again: the headers are *fine* when the already cached file is sent out or when an uncached page is rendered and sent out. The headers are *wrong* *with Apache 2 only* when we first *make* the cached file and send it out.
Changing the default charset or encoding is irrelevant, since we override them and that override is what should be visible. Getting UTF-8 when you expected ISO 8859-1 (or vice versa!) or uncompressed when you expected gzip is a symptom, not the disease.
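One way to observe the difference between the first send and later sends is to fetch the same page twice from a script and compare the response headers. The following is a sketch under assumptions: the helper names are invented, and the URL you pass in is whatever page you have just purged from the file cache.

```php
<?php
// Sketch for comparing response headers between the first (cache-creating)
// request and a later (cache-hit) request. Helper names are hypothetical.

function findHeader( array $headers, $name ) {
    // Return the value of the first header matching $name, or null.
    foreach ( $headers as $line ) {
        $parts = explode( ":", $line, 2 );
        if ( count( $parts ) === 2 && strcasecmp( trim( $parts[0] ), $name ) === 0 ) {
            return trim( $parts[1] );
        }
    }
    return null;
}

function fetchHeaders( $url ) {
    // Ask for gzip so the server has a reason to send Content-Encoding.
    $context = stream_context_create( array(
        "http" => array( "header" => "Accept-Encoding: gzip\r\n" ),
    ) );
    $fp = fopen( $url, "r", false, $context );
    if ( $fp === false ) {
        return array();
    }
    // For http:// streams, wrapper_data holds the raw response header lines.
    $meta = stream_get_meta_data( $fp );
    fclose( $fp );
    return isset( $meta["wrapper_data"] ) ? $meta["wrapper_data"] : array();
}
```

Calling fetchHeaders() twice on a freshly uncached page and passing each result to findHeader( $headers, "Content-Encoding" ) and findHeader( $headers, "Content-Type" ) should show whether the first response is missing the overrides that the second one carries.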
-- brion vibber (brion @ pobox.com)
Okay, I've created a minimal test case for the problem:
<?php
function showCompressed( $phantom = "" ) {
    $compressed = "\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xf3\xc8\x54\x54\xf0\x54" .
        "\xcf\x55\x48\x54\x48\xce\xcf\x2d\x28\x4a\x2d\x2e\x4e\x4d\x51\x28" .
        "\x2e\x29\xca\xcc\x4b\x07\x00\x46\xa1\x81\x5a\x1b\x00\x00\x00";
    header( "Content-encoding: gzip" );
    return $compressed;
}

ob_start( "showCompressed" );
print "This is throwaway text.";
flush();
?>
On the Apache 1.3.x setups I've tested this on, it works fine: a browser recognizes the encoding and prints out "Hi! I'm a compressed string". On my Apache 2.0.x setups, the content-encoding header is not sent and the browser prints out binary gibberish.
If I remove the flush() call or change it to ob_flush(), then the headers are sent out correctly and the browser decodes the string.
The purpose of the flush() (which is at the end of OutputPage::output()) is to ensure that actual page output is completed before the 'deferred updates' run, so the user doesn't have to wait for them to complete before they can start reading. This isn't really vital and could probably be accomplished by using ob_start()/ob_flush() anyway.
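As a rough illustration of the ob_start()-based alternative, the sketch below closes the output buffer explicitly and only then runs the slow follow-up work. Several things here are assumptions rather than anything from the MediaWiki code: the function names are invented, it uses ob_end_flush() instead of ob_flush() so that the callback runs exactly once, and it has not been tried against the Apache 2 setups described above. Whether the web server forwards the page to the browser straight away also depends on its own buffering.

```php
<?php
// Sketch: finish the page through the output-buffer callback instead of
// calling flush(), then run the deferred work. Names other than the PHP
// built-ins are hypothetical.

function compressPage( $buffer ) {
    // Runs when the buffer is closed; headers can still be set at this
    // point as long as nothing has been sent yet.
    if ( !headers_sent() ) {
        header( "Content-Encoding: gzip" );
    }
    return gzencode( $buffer );
}

function renderThenDefer( $html, $deferred ) {
    ob_start( "compressPage" );
    print $html;
    // Closes the buffer and invokes compressPage() exactly once.
    ob_end_flush();
    // Deferred updates run only after the page body has been handed over.
    call_user_func( $deferred );
}
```

A call might look like renderThenDefer( $pageHtml, "runDeferredUpdates" ), where both names are placeholders for whatever produces the page and performs the updates.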
-- brion vibber (brion @ pobox.com)
On Jan 4, 2004, at 23:44, Brion Vibber wrote:
If I remove the flush() call or change it to ob_flush(), then the headers are sent out correctly and the browser decodes the string.
Ah ho: http://bugs.php.net/bug.php?id=25701
After some further testing: if PHP is configured --with-apxs2 instead of --with-apxs2filter, it is slightly better at reporting the condition of being unable to send more headers, but that doesn't really help any.
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org