I am forwarding your request to wikitech-l, in the hope that there are more people on there who can comment on this issue.
For those who did not follow the entire thread: the user does not send an Accept-Encoding: gzip header, but nevertheless gets a gzipped response.
On Thu, Nov 25, 2010 at 8:19 PM, Anand Ramanathan rcanand@gmail.com wrote:
Bryan: No, I didn't set the Accept-Encoding header explicitly. I found the following related issue on bugzilla: 7098
Andrew: Yes, thanks. I see that curl can support this, and so can open-uri.
I wanted to clarify whether I should be handling this in the client.

As per HTTP 1.1 (section 14.3), for non-browser user agents, if no Accept-Encoding is explicitly set, the response should be the document itself (identity) if the server supports returning it. If the server is unable to return the document itself, it is preferable to return gzip or compressed content. I think this issue happens whenever I hit a cache node that has the gzip version cached but not the identity version. From a server standpoint, that seems like the right behavior.

So it is up to the client, which needs to do one of the following:

a) Set Accept-Encoding to make gzip not acceptable and identity acceptable. In this case, a cache node containing only the gzip-encoded document will miss, and eventually a node that contains the identity version will return it. (This is a leap of faith, as I cannot target such a cache node explicitly. If a node has both gzip and identity content and responds with gzip to a request with no explicit Accept-Encoding, then it violates the spec and is a bug. Can anyone comment on this?)

b) Set Accept-Encoding to accept gzip or identity (or leave it unset), and on the client, if Content-Encoding is gzip, unzip it explicitly.

I am fine with either of these approaches. Is this an accurate assessment of the issue and options?

Thanks
Anand
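For anyone reading the archive, the two client-side options above can be sketched roughly as follows in Python. This is only an illustration under assumptions, not code from the thread: the helper names `build_identity_request` and `decode_body` are made up, the URL in the comment is a placeholder, and the `identity, gzip;q=0` header value is one possible way to express option (a).

```python
import gzip
import urllib.request
from typing import Optional


def build_identity_request(url: str) -> urllib.request.Request:
    """Option (a): ask for identity and mark gzip as not acceptable.

    The header value is an assumption; a cache may still handle it
    differently from what the client expects.
    """
    return urllib.request.Request(
        url, headers={"Accept-Encoding": "identity, gzip;q=0"}
    )


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Option (b): accept whatever arrives and gunzip it when the
    response says Content-Encoding: gzip."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    # No Content-Encoding, or identity: the body is already usable.
    return body


# Hypothetical usage (performs a network request, so left as a comment):
# with urllib.request.urlopen(build_identity_request("http://example.org/w/api.php")) as resp:
#     text = decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Applying `decode_body` even when option (a) is used keeps the client working if a cache returns gzip anyway.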
On Thu, Nov 25, 2010 at 4:23 AM, Andrew Dunbar hippytrail@gmail.com wrote:
On 25 November 2010 19:41, Anand Ramanathan rcanand@gmail.com wrote:
Yes, confirmed that they are. It is gzip - what is the best way to deal with this? Is this a bug that is tracked, or is this something worth handling in client code (checking if gzip and manually unzipping)? Thanks Anand
Curl can definitely handle gzipped responses. Here's something about it from a very quick Google search: http://curl.haxx.se/mail/curlphp-2004-01/0043.html
Andrew Dunbar (hippietrail)
On Thu, Nov 25, 2010 at 12:12 AM, Bryan Tong Minh bryan.tongminh@gmail.com wrote:
On Thu, Nov 25, 2010 at 9:02 AM, Anand Ramanathan rcanand@gmail.com wrote:
OK, I got it again. Here is my curl output (headers plus the first few characters) for the garbled India Wikipedia page, with the proper China Wikipedia page below it for comparison:
Can you verify that the first two characters are 0x1f and 0x8b respectively? Looks like gzip.
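The check suggested here can also be done programmatically. A minimal sketch in Python, assuming the raw response body is available as bytes; the function name `looks_like_gzip` is hypothetical:

```python
import gzip

# A gzip stream starts with the two magic bytes 0x1f 0x8b.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(body: bytes) -> bool:
    """Return True if the body begins with the gzip magic bytes."""
    return body[:2] == GZIP_MAGIC


# Self-contained demonstration with locally generated data.
sample = gzip.compress(b"<html>India</html>")
is_gzip_sample = looks_like_gzip(sample)                 # compressed data
is_gzip_plain = looks_like_gzip(b"<html>China</html>")   # plain markup
```

This only inspects the first two bytes; it does not validate the rest of the stream.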
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api