OK, I got it again. Here is my curl output (headers plus the first few
characters) for the garbled India Wikipedia page, followed by the proper
China Wikipedia page for comparison:
$ curl -i http://en.wikipedia.org/wiki/India
HTTP/1.0 200 OK
Date: Thu, 25 Nov 2010 07:40:49 GMT
Server: Apache
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Content-Language: en
Vary: Accept-Encoding,Cookie
Last-Modified: Wed, 24 Nov 2010 21:34:20 GMT
Content-Encoding: gzip
Content-Length: 115091
Content-Type: text/html; charset=UTF-8
X-Cache: HIT from sq66.wikimedia.org
X-Cache-Lookup: HIT from sq66.wikimedia.org:3128
Age: 1125
X-Cache: HIT from sq66.wikimedia.org
X-Cache-Lookup: HIT from sq66.wikimedia.org:80
Connection: close
[binary response body, unprintable bytes as shown by the terminal; truncated]
$ curl -i http://en.wikipedia.org/wiki/China
HTTP/1.0 200 OK
Date: Thu, 25 Nov 2010 07:40:59 GMT
Server: Apache
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Content-Language: en
Vary: Accept-Encoding,Cookie
Last-Modified: Tue, 23 Nov 2010 05:09:28 GMT
Content-Length: 254063
Content-Type: text/html; charset=UTF-8
Age: 1230
X-Cache: HIT from sq71.wikimedia.org
X-Cache-Lookup: HIT from sq71.wikimedia.org:3128
X-Cache: MISS from sq40.wikimedia.org
X-Cache-Lookup: MISS from sq40.wikimedia.org:80
Connection: close
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" dir="ltr">
<head>
<title>China - Wikipedia, the free encyclopedia</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="MediaWiki 1.16wmf4" />
<link rel="apple-touch-icon" href="http://en.wikipedia.org/apple-touch-icon.png" />
<link rel="shortcut icon" href="/favicon.ico" />
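
Comparing the two header sets, the India response carries
Content-Encoding: gzip and the China response does not, so the "garbled"
bytes are probably just a gzip-compressed body that the client did not
inflate. Below is a minimal Ruby sketch of one way to cope with that,
assuming the response body is already available as a string. It looks for
the two gzip magic bytes (0x1f 0x8b) at the start of the body and inflates
only when they are present. decode_body is a hypothetical helper name used
for illustration, not part of open-uri or any other library.

```ruby
require 'zlib'
require 'stringio'

# The first two bytes of every gzip stream.
GZIP_MAGIC_BYTES = [0x1f, 0x8b].freeze

# Hypothetical helper (an assumption for illustration, not a library API):
# returns the body unchanged unless it starts with the gzip magic bytes,
# in which case the inflated content is returned instead.
def decode_body(body)
  return body unless body.byteslice(0, 2).bytes == GZIP_MAGIC_BYTES

  # GzipReader needs an IO-like object, so wrap the string in a StringIO.
  reader = Zlib::GzipReader.new(StringIO.new(body))
  begin
    reader.read
  ensure
    reader.close
  end
end
```

With curl, passing --compressed should have a similar effect, since it
asks curl to request a compressed response and decode it before printing.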
On Tue, Nov 23, 2010 at 12:27 AM, Anand C Ramanathan <rcanand(a)gmail.com> wrote:
Ok, will do. Thanks!
On Nov 23, 2010, at 12:14 AM, Andrew Dunbar <hippytrail(a)gmail.com> wrote:
On 23 November 2010 10:08, Anand Ramanathan <rcanand(a)gmail.com> wrote:
> forgot to mention: versions : ruby 1.8.7, rails 3.0 - but within the same
> environment, it is intermittent... currently have been getting 'clean'
> content for last 12 hours or so, but had bad content before...
Try getting it with the full HTTP headers too. Keep a copy of them
from when it works to compare against when it doesn't work.
Andrew Dunbar (hippietrail)
> On Tue, Nov 23, 2010 at 12:06 AM, Anand Ramanathan <rcanand(a)gmail.com>
> wrote:
>>
>> I tried curl from terminal on Mac, also tried open-uri from a ruby on
>> rails app.
>>
>> On Mon, Nov 22, 2010 at 11:16 PM, Andrew Dunbar <hippytrail(a)gmail.com>
>> wrote:
>>>
>>> On 23 November 2010 03:07, Anand Ramanathan <rcanand(a)gmail.com> wrote:
>>>> It is intermittent - sometimes I see English characters, as expected.
>>>> Sometimes, I see some junk characters - right now getting normal
>>>> English characters, so can't show a sample of the junk. But this is
>>>> definitely happening with this page - over the last 3 days off and on...
>>>> Will send the garbled example if and when it happens again.
>>>
>>> You might be getting gzipped content sometimes. I had this issue with
>>> certain Perl libraries. Normally you should get the content unzipped.
>>> What programming language/framework etc have you been using?
>>>
>>> Andrew Dunbar (hippietrail)
>>>
>>>> Thanks
>>>> Anand
>>>>
>>>> On Mon, Nov 22, 2010 at 3:26 PM, Betacommand <Betacommand(a)gmail.com> wrote:
>>>>>
>>>>> Can you define "garbled"?
>>>>>
>>>>> On Mon, Nov 22, 2010 at 5:17 PM, Anand Ramanathan <rcanand(a)gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>> I am not sure this is the right place for this issue, so please
>>>>>> redirect
>>>>>> me if there is a better forum.
>>>>>> I have been loading the Wikipedia page for India programmatically as
>>>>>> part of testing my project. It was working fine until recently. Now,
>>>>>> most of the time (but not always), loading it gives me garbled content.
>>>>>> Other pages on Wikipedia work fine both programmatically and from the
>>>>>> browser.
>>>>>> This particular page also loads fine from my browser (Safari on Mac),
>>>>>> but when I load it programmatically (using curl or open-uri from
>>>>>> Mac OS X), the content is garbled.
>>>>>> I am using a user agent, but don't know what changed.
>>>>>> I tried closing the bar on top which appeared recently (Jimmy Wales
>>>>>> message), but even without that bar, it seems to be garbled.
>>>>>> Any suggestions?
>>>>>> Thanks
>>>>>> Anand
>>>> _______________________________________________
>>>> Mediawiki-api mailing list
>>>> Mediawiki-api(a)lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>>>>