Hi again,
I attempted to submit a sample Wikipedia page to the W3C validator and was amazed to see that it carried no character encoding whatsoever. Here is the full report from the validator:
I was not able to extract a character encoding labeling from any of the valid sources for such information. Without encoding information it is impossible to validate the document. The sources I tried are:
The HTTP Content-Type field.
The XML Declaration.
The HTML "META" element.
And I even tried to autodetect it using the algorithm defined in Appendix F of the XML 1.0 Recommendation.
Since none of these sources yielded any usable information, I will not be able to validate this document. Sorry. Please make sure you specify the character encoding in use. IANA maintains the list of official names for character sets.
---- <end quote>
Surely Wikipedia ought to use an encoding such as UTF-8?
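The lookup order in the validator's report can be sketched roughly as below. This is only an illustration in Python, not the validator's actual code: the function name `find_declared_encoding` and the regular expressions are assumptions, and the Appendix F byte-pattern autodetection is left out.

```python
import re

# Hypothetical patterns for illustration; real parsers are more careful.
_HEADER = re.compile(r'charset\s*=\s*"?([A-Za-z0-9._-]+)"?', re.IGNORECASE)
_XML_DECL = re.compile(rb'^\s*<\?xml[^>]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')
_META = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE)


def find_declared_encoding(content_type, body):
    """Return (source, encoding) for the first label found, or None.

    content_type: the HTTP Content-Type value, or None if there is no header.
    body: the raw document bytes.
    """
    # 1. The HTTP Content-Type field.
    if content_type:
        m = _HEADER.search(content_type)
        if m:
            return ("http", m.group(1).lower())
    # 2. The XML declaration at the start of the document.
    m = _XML_DECL.search(body)
    if m:
        return ("xml", m.group(1).decode("ascii").lower())
    # 3. The HTML META element.
    m = _META.search(body)
    if m:
        return ("meta", m.group(1).decode("ascii").lower())
    return None
```

A page uploaded as a bare file arrives with no header (`content_type=None`), so if the markup itself carries no label the function returns `None`, which corresponds to the error quoted above.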
On Mon, 2002-12-30 at 11:06, Richard Grevers wrote:
Hi again,
I attempted to submit a sample Wikipedia page to the W3C validator and was amazed to see that it carried no character encoding whatsoever.
Eh?
GET /wiki/foobar HTTP/1.0
Host: www.wikipedia.org

HTTP/1.1 200 OK
Date: Mon, 30 Dec 2002 19:38:38 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.3
X-Powered-By: PHP/4.2.3
Set-Cookie: PHPSESSID=19b987cd1b0790c6b079be08069f63f0; path=/
Expires: 0
Cache-Control: no-cache
Last-Modified: Mon, 30 Dec 2002 19:38:44 GMT
Pragma: no-cache
Content-language: en
Connection: close
Content-Type: text/html; charset=iso-8859-1
Right there, plain as day.
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.wikipedia.org%2Fwiki%2Ffo... makes no complaints, either.
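For anyone who wants to check a header value like the one above programmatically, here is a minimal Python sketch. It leans on the standard library's `email.message.Message`, which parses MIME-style parameters; the helper name `charset_from_content_type` is made up for this example.

```python
from email.message import Message


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns the charset lowercased, or None if absent.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=iso-8859-1"))
```

A value without a charset parameter, such as plain `text/html`, yields `None`.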
-- brion vibber (brion @ pobox.com)
On 30 Dec 2002 11:47:28 -0800, Brion Vibber <brion@pobox.com> wrote:
On Mon, 2002-12-30 at 11:06, Richard Grevers wrote:
Hi again,
I attempted to submit a sample Wikipedia page to the W3C validator and was amazed to see that it carried no character encoding whatsoever.
Eh?
GET /wiki/foobar HTTP/1.0
Host: www.wikipedia.org

HTTP/1.1 200 OK
Date: Mon, 30 Dec 2002 19:38:38 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.3
X-Powered-By: PHP/4.2.3
Set-Cookie: PHPSESSID=19b987cd1b0790c6b079be08069f63f0; path=/
Expires: 0
Cache-Control: no-cache
Last-Modified: Mon, 30 Dec 2002 19:38:44 GMT
Pragma: no-cache
Content-language: en
Connection: close
Content-Type: text/html; charset=iso-8859-1
Right there, plain as day.
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.wikipedia.org%2Fwiki%2Ffo...
makes no complaints, either.
Aha! One of the rare sites which uses the HTTP header and only the HTTP header to declare encoding. This tricked Opera's built-in validation submission, which actually uploads its cached copy of the page, hence no header. Sorry for the false alarm.
On Mon, 2002-12-30 at 12:20, Richard Grevers wrote:
Aha! One of the rare sites which uses the HTTP header and only the HTTP header to declare encoding. This tricked Opera's built-in validation submission, which actually uploads its cached copy of the page, hence no header. Sorry for the false alarm.
Silly me for trying to follow standards. ;)
It would be safer to add in a meta tag too I suppose, I'll do that...
-- brion vibber (brion @ pobox.com)
On 30 Dec 2002 12:31:35 -0800, Brion Vibber wrote:
On Mon, 2002-12-30 at 12:20, Richard Grevers wrote:
Aha! One of the rare sites which uses the HTTP header and only the HTTP header to declare encoding. This tricked Opera's built-in validation submission, which actually uploads its cached copy of the page, hence no header. Sorry for the false alarm.
Silly me for trying to follow standards. ;)
It would be safer to add in a meta tag too I suppose, I'll do that...
While you're about it, is there a consensus on the best encoding to use on general pages? 8859-15 ensures no problems should people use the likes of a Euro symbol in an article. Most definitions I've seen describe 8859-15 as "intended to replace 8859-1".
-- A lottery is just a tax on people who are bad at math
On Tue, Dec 31, 2002 at 10:11:29AM +1300, Richard Grevers wrote:
It would be safer to add in a meta tag too I suppose, I'll do that...
While you're about it, is there a consensus on the best encoding to use on general pages? 8859-15 ensures no problems should people use the likes of a Euro symbol in an article. Most definitions I've seen describe 8859-15 as "intended to replace 8859-1".
I assumed that the encoding in the meta tag would be exactly the same as the encoding in the http headers. Any objection to that? I think switching from 8859-1 to 8859-15 is a separate issue, worth discussing. For instance, how much software supports 8859-15?
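Keeping the two declarations identical is easiest if the meta tag is generated from the same string that goes into the HTTP header. A rough Python sketch of that idea follows; the function name `meta_for_content_type` is hypothetical, and Wikipedia's own software is PHP, so this only illustrates the principle.

```python
from html import escape


def meta_for_content_type(content_type):
    """Build a <meta http-equiv> element that repeats the HTTP Content-Type value."""
    return '<meta http-equiv="Content-Type" content="{}">'.format(
        escape(content_type, quote=True)
    )


# The same string would be sent as the Content-Type header and embedded in <head>.
header_value = "text/html; charset=iso-8859-1"
print(meta_for_content_type(header_value))
```

Deriving both from one value means a later change of encoding cannot leave the header and the markup disagreeing.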
Jonathan
On Mon, 30 Dec 2002 13:22:48 -0800, Jonathan Walther wrote:
On Tue, Dec 31, 2002 at 10:11:29AM +1300, Richard Grevers wrote:
It would be safer to add in a meta tag too I suppose, I'll do that...
While you're about it, is there a consensus on the best encoding to use on general pages? 8859-15 ensures no problems should people use the likes of a Euro symbol in an article. Most definitions I've seen describe 8859-15 as "intended to replace 8859-1".
I assumed that the encoding in the meta tag would be exactly the same as the encoding in the http headers. Any objection to that? I think switching from 8859-1 to 8859-15 is a separate issue, worth discussing. For instance, how much software supports 8859-15?
Or to rephrase, is there any that doesn't? I've used it on all my pages for a year now, and I can't recall any browser that fails to render because of the encoding. I must try the shopping basket I've just written (option for Euros) in Lynx.
-- A lottery is just a tax on people who are bad at math
On Mon, 2002-12-30 at 13:11, Richard Grevers wrote:
While you're about it, is there a consensus on the best encoding to use on general pages? 8859-15 ensures no problems should people use the likes of a Euro symbol in an article. Most definitions I've seen describe 8859-15 as "intended to replace 8859-1".
ISO 8859-15 does not cover a number of characters that ISO 8859-1 does, so I would strongly recommend against it.
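The characters in question can be listed by decoding each high byte with both codecs and comparing the results. A small Python sketch, assuming the standard `iso8859-1` and `iso8859-15` codecs that ship with Python; the function name `charset_differences` is made up for this example.

```python
def charset_differences(a="iso8859-1", b="iso8859-15"):
    """List (byte, char_in_a, char_in_b) for every byte the two codecs decode differently."""
    diffs = []
    for value in range(0xA0, 0x100):
        raw = bytes([value])
        ca, cb = raw.decode(a), raw.decode(b)
        if ca != cb:
            diffs.append((value, ca, cb))
    return diffs


for value, latin1, latin9 in charset_differences():
    print(f"0x{value:02X}: {latin1!r} in 8859-1, {latin9!r} in 8859-15")
```

Each line printed is a position where 8859-15 swaps out a Latin-1 character (the currency sign, the vulgar fractions and a few spacing accents) for something else such as the Euro sign, so existing Latin-1 text that uses those characters would change meaning if relabeled.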
Once conversion support for older, Unicode-broken browsers (wouldn't trust them to do ISO 8859-15 either) is set up, the remaining Latin-1 sections of Wikipedia will move to UTF-8 (as Meta-Wikipedia, Wiktionary, and many of the language sections of Wikipedia are already) so that links and text are all in a compatible format.
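The message does not say how the conversion for older browsers would be implemented. One plausible approach, shown here purely as an assumption, is to serve those browsers Latin-1 bytes and replace anything outside Latin-1 with numeric character references. In Python that is a single call using the built-in `xmlcharrefreplace` error handler; the function name is hypothetical.

```python
def to_latin1_with_references(text):
    """Encode text as ISO 8859-1, replacing characters outside it with &#NNNN; references."""
    return text.encode("iso8859-1", errors="xmlcharrefreplace")


# U+00E9 fits in Latin-1 and stays a single byte; U+20AC (Euro) becomes a reference.
print(to_latin1_with_references("caf\u00e9 \u20ac5"))
```

The output is plain Latin-1 that an older browser can display, with the Euro sign carried as `&#8364;` for the browser to resolve if it can.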
-- brion vibber (brion @ pobox.com)
wikipedia-l@lists.wikimedia.org