I am forwarding your request to wikitech-l, in the hope that there are
more people on there who can comment on this issue.
For those who did not follow the entire thread: the user does not send
an Accept-Encoding: gzip header, but nevertheless gets a gzipped
response.
On Thu, Nov 25, 2010 at 8:19 PM, Anand Ramanathan <rcanand(a)gmail.com> wrote:
Bryan: No, I didn't set the Accept-Encoding header explicitly. I found the
following related issue on bugzilla: 7098
Andrew: Yes, thanks. I see that curl can support this, and so can open-uri.
I wanted to clarify if I should be handling this in the client:
As per HTTP/1.1 (section 14.3), for non-browser user agents, if no
Accept-Encoding is explicitly set, the response should be the document
itself if the server supports returning the document itself (identity).
However, if the server is unable to return the document itself, it is
preferable to return gzip or compressed content.
I think this issue is happening whenever I hit a cache node that has the
gzip, but not the identity cached. From a server standpoint, it seems like
the right behavior. So, it is up to the client, which needs to do one of the
following:
a) Set Accept-Encoding so that gzip is not acceptable and identity is
acceptable. In this case, a cache node containing only the gzip-encoded document
will miss, and eventually a node that contains the identity will return it.
(This is a leap of faith, as I cannot target such a cache node explicitly.
If a node has both gzip and identity content, and is responding with gzip
for a request with no explicit Accept-Encoding set, then it violates the
spec and is a bug. Can anyone comment on this?)
b) Set Accept-Encoding to accept gzip or identity (or leave it unset), and
on the client, if Content-Encoding is gzip, unzip it explicitly.
I am fine with either of these approaches. Is this an accurate assessment of
the issue and options?
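
To make options (a) and (b) above concrete, here is a minimal Python sketch.
It is an illustration under stated assumptions, not anything taken from the
Wikimedia setup: the names IDENTITY_ONLY_HEADERS and decode_body are
hypothetical, and the header value for option (a) is just one syntactically
valid way to mark gzip as not acceptable. Whether the caches honor it is
exactly the open question.

```python
import gzip
from typing import Optional

# Option (a): hypothetical request headers that accept identity and
# mark gzip as not acceptable (q=0). Assumption: the server honors this.
IDENTITY_ONLY_HEADERS = {"Accept-Encoding": "identity, gzip;q=0"}


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Option (b): return the document bytes, unzipping when the
    response's Content-Encoding header says the body is gzip."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    # No Content-Encoding, or identity: the body is already the document.
    return body
```

With option (b), the client calls decode_body(raw_bytes, value_of_Content_Encoding)
on every response, so it does not matter which kind of cache node answered.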
Thanks
Anand
On Thu, Nov 25, 2010 at 4:23 AM, Andrew Dunbar <hippytrail(a)gmail.com> wrote:
On 25 November 2010 19:41, Anand Ramanathan <rcanand(a)gmail.com> wrote:
Yes, confirmed that they are. It is gzip - what is the best way to deal with
this? Is this a bug that is tracked, or is this something worth handling in
client code (checking if gzip and manually unzipping)?
Thanks
Anand
Curl can definitely handle gzipped responses. Here's something about
it from a very quick Google search:
http://curl.haxx.se/mail/curlphp-2004-01/0043.html
Andrew Dunbar (hippietrail)
On Thu, Nov 25, 2010 at 12:12 AM, Bryan Tong Minh <bryan.tongminh(a)gmail.com> wrote:
On Thu, Nov 25, 2010 at 9:02 AM, Anand Ramanathan <rcanand(a)gmail.com> wrote:
> OK, I got it again: Here is my curl output (headers + first few characters)
> for the garbled India wikipedia page (and the proper China wikipedia page
> for comparison below that):
Can you verify that the first two characters are 0x1f and 0x8b
respectively? Looks like gzip.
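
The check described above can be done programmatically. A minimal Python
sketch follows, assuming the response body is already available as bytes;
the helper name looks_like_gzip is hypothetical. It only inspects the two
gzip magic bytes and does not validate the rest of the stream.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(body: bytes) -> bool:
    """Return True if the body starts with the gzip magic bytes 0x1f 0x8b."""
    return body[:2] == GZIP_MAGIC
```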
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api