Why not listen to W3C ? Invalid HTML on www.wikipedia.org !


Pieter Suurmond

21 Jan 2003

1:22 a.m.

Wikipages generated by the server do not follow W3C recommendations. Now that we finally have an open standard for HTML, why not use it?

One of the reasons why http://www.wikipedia.org does not validate is that the character set is not specified; you should include a line like this:

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">

You can easily see what other errors are made when you enter a URI at http://validator.w3.org/. The page "http://www.wikipedia.org/", for example, is not Valid HTML 4.01 Transitional! Below are the results of attempting to parse this document with an SGML parser.

1. Line 7, column 7: required attribute "TYPE" not specified.
    <SCRIPT>
           ^

2. Line 87, column 28: start tag for "TR" omitted, but its declaration does not permit this.
    <th colspan="2" align=center><big> Selected Articles </big></th></tr>
                              ^

Kind regards, Pieter Suurmond


Brion Vibber

21 Jan

2:31 a.m.

New subject: Why not listen to W3C ? Invalid HTML on www.wikipedia.org !

On Mon, 2003-01-20 at 16:22, Pieter Suurmond wrote:

...

Wikipages generated by the server do not follow W3C recommendations. Now that we finally have an open standard for HTML, why not use it?

Forgetfulness?

...

One of the reasons why http://www.wikipedia.org does not validate is that the character set is not specified; you should include a line like this:

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">

This is set in the HTTP headers. I assume you were validating a copy that lacks the header?
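HTML 4.01 gives a charset parameter in the HTTP Content-Type header priority over a META declaration, which is why the header alone is enough for a live page. A minimal Python sketch of reading that parameter with the standard library's `email.message.Message`; the header value simply mirrors the META line quoted above, and the exact header the server sent is not given in the thread:

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None if it is absent
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(charset_from_content_type("text/html"))  # None
```

A saved copy of the page has no HTTP header at all, so a validator fed the file directly has nothing to read unless the META line is present.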

...

You can easily see what other errors are made when you enter a URI at http://validator.w3.org/. The page "http://www.wikipedia.org/", for example, is not Valid HTML 4.01 Transitional! Below are the results of attempting to parse this document with an SGML parser.

1.Line 7, column 7: required attribute "TYPE" not specified.
    <SCRIPT>
           ^

Grr... I blame Magnus. Fixed.
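The first validator error is easy to catch mechanically. A small sketch using Python's `html.parser.HTMLParser` that reports the (line, column) position of every `<script>` start tag lacking a `type` attribute; the class and function names are illustrative, not part of the Wikipedia code:

```python
from html.parser import HTMLParser


class ScriptTypeChecker(HTMLParser):
    """Collect positions of <script> start tags that have no type attribute."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "script" and not any(name == "type" for name, _ in attrs):
            self.problems.append(self.getpos())  # (1-based line, 0-based column)


def missing_script_type(html_text):
    checker = ScriptTypeChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems


print(missing_script_type("<p>\n<SCRIPT>\n</SCRIPT>"))
```

The fix on the page side is to write the attribute explicitly, e.g. `<script type="text/javascript">`.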

...

2.Line 87, column 28: start tag for "TR" omitted, but its declaration does not permit this.
    <th colspan="2" align=center><big> Selected Articles </big></th></tr>
                              ^

That's an error in the wiki page; our parser can correct some errors, but isn't smart enough to fix all of them. Fixed.

The front page now validates.

-- brion vibber (brion @ pobox.com)

Tomasz Wegrzanowski

3:33 a.m.

On Mon, Jan 20, 2003 at 05:31:02PM -0800, Brion Vibber wrote:

...

That's an error in the wiki page; our parser can correct some errors, but isn't smart enough to fix all of them. Fixed.

The front page now validates.

-- brion vibber (brion @ pobox.com)

Is it possible to run some script to check all wiki pages? Most of the problems will be silly things like a missing <tr>, and it may help to fix them.

It will also help with switching to XHTML (+ MathML) some day. XHTML parsers are much less forgiving than HTML parsers.
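A batch check of the kind Tomasz asks about could look roughly like this Python sketch. It only looks for one class of error, the table cell with no enclosing `<tr>` that broke the front page, and the `pages` dictionary is a hypothetical stand-in for reading article text out of the database:

```python
import re

# Any start or end tag; group 1 is "/" for end tags, group 2 the tag name.
TAG_RE = re.compile(r"<(/?)([a-zA-Z]+)[^>]*>")


def check_tables(text):
    """Return (line, message) pairs for <td>/<th> cells not inside a <tr> row."""
    problems = []
    row_open = []  # one flag per currently open <table>
    for m in TAG_RE.finditer(text):
        closing, name = m.group(1) == "/", m.group(2).lower()
        line = text.count("\n", 0, m.start()) + 1
        if name == "table":
            if not closing:
                row_open.append(False)
            elif row_open:
                row_open.pop()
        elif name == "tr" and row_open:
            row_open[-1] = not closing
        elif name in ("td", "th") and not closing and row_open and not row_open[-1]:
            problems.append((line, f"<{name}> without an enclosing <tr>"))
    return problems


# Hypothetical page dump: title -> wiki source.
pages = {
    "Main Page": '<table>\n<th colspan="2" align=center><big> Selected Articles </big></th></tr>\n</table>',
    "Good Table": "<table>\n<tr><td>a</td></tr>\n</table>",
}
for title, text in pages.items():
    for line, message in check_tables(text):
        print(f"{title}:{line}: {message}")
```

Nested tables get their own flag on the stack, so a table inside a cell does not hide a missing row in the outer table.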

Pieter Suurmond

4:08 a.m.

New subject: (Re: Tomasz) Running script interactively instead of 'batch-repairing'?

Or build in a wiki/html-validator in the 'interactive' user interface for authors: show authors html-errors and wiki-code-errors when they press 'preview page'. This way, writers of articles get 'W3C-educated' automatically. One could even consider rejecting non-W3C-valid articles... but that is maybe too strict. Thanks for promoting W3C, Tomasz :-), kind regards, Pieter Suurmond

_______________________________________________
Wikitech-l mailing list
Wikitech-l@wikipedia.org
http://www.wikipedia.org/mailman/listinfo/wikitech-l

tarquin

noon

New subject: (Re: Tomasz) Running script interactively instead of 'batch-repairing'?

Pieter Suurmond wrote:

...

Or build in a wiki/html-validator in the 'interactive' user interface for authors: show authors html-errors

one reason why we should phase out HTML in wiki text.

...

and wiki-code-errors when they press 'preview page'. This way, writers of articles get 'W3C-educated' automatically.

Wiki-markup should not be able to produce invalid HTML -- or rather, there's no "wrong" wiki syntax -- just something the writer did not intend.

...

Tomasz Wegrzanowski

2:19 p.m.

New subject: (Re: Tomasz) Running script interactively instead of 'batch-repairing'?

On Tue, Jan 21, 2003 at 11:00:28AM +0000, tarquin wrote:

...

Pieter Suurmond wrote:

...
Or build in a wiki/html-validator in the 'interactive' user interface for authors: show authors html-errors

one reason why we should phase out HTML in wiki text.

This is wrong reasoning. We should allow as much HTML as we do now, but make the parser understand more of it, so it can fix some mistakes.

tarquin

6:04 p.m.

New subject: Goodbye to HTML

Tomasz Wegrzanowski wrote:

...

On Tue, Jan 21, 2003 at 11:00:28AM +0000, tarquin wrote:

...
Pieter Suurmond wrote:

...
Or build in a wiki/html-validator in the 'interactive' user interface for authors: show authors html-errors

one reason why we should phase out HTML in wiki text.

This is wrong reasoning. We should allow as much HTML as we do now, but make the parser understand more of it, so it can fix some mistakes.

What do we use HTML for? From what I've seen:

* tables -- well, most people agree that HTML tables are ugly. We just haven't yet come up with a good enough alternative.
* erroneous use of <BR> -- if you need a new paragraph, make one. If you feel you need a "half paragraph-break", then rethink your writing style.
* LI, UL, OL -- people who haven't read the documentation. I suppose we can keep <LI> as a synonym for *
* aligning images right or left with complex DIV or TABLE constructs -- as above -- we need better syntax.

any other uses I've missed?

...

Tomasz Wegrzanowski

6:33 p.m.

New subject: Goodbye to HTML

On Tue, Jan 21, 2003 at 05:04:54PM +0000, tarquin wrote:

...

...
This is wrong reasoning. We should allow as much HTML as we do now, but make the parser understand more of it, so it can fix some mistakes.

What do we use HTML for? From what I've seen:

* tables -- well, most people agree that HTML tables are ugly. We just haven't yet come up with a good enough alternative.
* erroneous use of <BR> -- if you need a new paragraph, make one. If you feel you need a "half paragraph-break", then rethink your writing style.
* LI, UL, OL -- people who haven't read the documentation. I suppose we can keep <LI> as a synonym for *
* aligning images right or left with complex DIV or TABLE constructs -- as above -- we need better syntax.

any other uses I've missed?

* <b> and <i> where the parser breaks (happens quite often)
* <center>
* <tt>
* <li><ul><ol> for mixed-type nested lists
* <br> for breaking lines (<p> is not the same as <br>)
* <font> for size/color changes
* probably a lot more

Andre Engels

22 Jan

1:19 p.m.

New subject: Goodbye to HTML

On Tue, 21 Jan 2003, tarquin wrote:

...

What do we use HTML for? From what I've seen:

* tables -- well, most people agree that HTML tables are ugly. We just haven't yet come up with a good enough alternative.
* erroneous use of <BR> -- if you need a new paragraph, make one. If you feel you need a "half paragraph-break", then rethink your writing style.
* LI, UL, OL -- people who haven't read the documentation. I suppose we can keep <LI> as a synonym for *
* aligning images right or left with complex DIV or TABLE constructs -- as above -- we need better syntax.

any other uses I've missed?

Yes

* Special characters, such as é, ä, &scaron; etcetera. - I would be much in favor of keeping them.
* <sup> and <sub>. There isn't wiki markup for them yet, but it could be defined.
* Special typefaces like <type>
* font sizes
* People using HTML for things like boldface and headers. Unnecessary, I'd say, but it happens a lot.
* I myself have used <blockquote> at least once. (see [[Gall-Peters projection]])
* ... and undoubtedly there's still a number I forgot.
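Why entity syntax such as &scaron; matters here: with the ISO-8859-1 charset from the META line earlier in the thread, é and ä exist in the page encoding but š (U+0161) does not, so it can only be sent as an entity or numeric reference. A Python sketch that decides, per entity, whether the literal character can be emitted; the function name and the default charset are assumptions:

```python
import html


def entity_or_char(entity, charset="iso-8859-1"):
    """Return the literal character if the page charset can encode it, else the entity."""
    char = html.unescape(entity)  # e.g. "&eacute;" becomes "é"
    try:
        char.encode(charset)
    except UnicodeEncodeError:
        return entity  # not representable in this charset: keep the reference
    return char


for ent in ("&eacute;", "&auml;", "&scaron;"):
    print(ent, "->", entity_or_char(ent))
```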

Andre Engels

Pieter Suurmond

1:39 p.m.

New subject: Goodbye to HTML

Dear Wiki-developers/maintainers,

Wouldn't it be best to ban HTML altogether? It has so many disadvantages. Instead, let's complete the Wiki-syntax for all our formatting needs...

Another thing: what is wrong with a "half paragraph-break"?? It's just a way to subdivide a paragraph. It has been an accepted style in many Dutch texts for decades, so why not allow it? I'm not saying it _has_ to be done by "<BR>"; on the contrary: we should invent a proper Wiki-markup for a single line-break!

Kind regards / sorry for bothering you (I've strong opinions about this), Pieter Suurmond


Pieter Suurmond

1:50 p.m.

New subject: Reasoning

You want wiki(pedia) to generate valid HTML pages ?

Then there is only one answer: forbid authors to write HTML in their articles. Or at least don't call it 'html', it confuses everybody (even people on this list), instead, incorporate those tags into the Wiki-syntax and define them as Wiki-markup-codes: document them and ban all other HTML-markup-codes.

Pieter Suurmond

tarquin

2:32 p.m.

New subject: Reasoning

Pieter Suurmond wrote:

...

Or at least don't call it 'html', it confuses everybody (even people on this list), instead, incorporate those tags into the Wiki-syntax and define them as Wiki-markup-codes: document them and ban all other HTML-markup-codes.

Yup -- stop treating them as HTML to be passed "as is" to the browser -- treat them as wiki tags that must be parsed. <td>s must be closed, <br> is deprecated, etc.
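"Treat them as wiki tags that must be parsed" can be read as: recognise a fixed set of tags, re-emit them in a normalised form, and escape everything else so it shows up as text instead of reaching the browser. A Python sketch of that idea; the whitelist is hypothetical, tags with attributes are simply escaped, and it does not check nesting or balance:

```python
import html
import re

# Hypothetical whitelist of tags treated as wiki markup.
ALLOWED = {"b", "i", "tt", "sup", "sub", "center", "blockquote", "br"}
# A bare tag such as <b>, </B> or <br/>; group 1 is "/" for end tags.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\s*/?>")


def parse_wiki_tags(text):
    """Re-emit whitelisted tags in lowercase; escape all other text and tags."""
    out, pos = [], 0
    for m in TAG_RE.finditer(text):
        out.append(html.escape(text[pos:m.start()], quote=False))
        pos = m.end()
        closing, name = m.group(1), m.group(2).lower()
        if name in ALLOWED:
            # <br> becomes an XHTML-style empty element; others keep their slash.
            out.append("<br />" if name == "br" else f"<{closing}{name}>")
        else:
            out.append(html.escape(m.group(0), quote=False))
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)


print(parse_wiki_tags("<B>bold</B> <script>x</script>"))
```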

...

Pieter Suurmond

3:09 p.m.

New subject: <br>

tarquin wrote:

...

Pieter Suurmond wrote:

...
Or at least don't call it 'html', it confuses everybody (even people on this list), instead, incorporate those tags into the Wiki-syntax and define them as Wiki-markup-codes: document them and ban all other HTML-markup-codes.

Yup -- stop treating them as HTML to be passed "as is" to the browser -- treat them as wiki tags that must be parsed. <td>s must be closed, <br> is deprecated, etc.

...


Agree! :-) However... <td>s don't necessarily have to be closed, it just depends on how one defines them. They should, indeed, be regarded as 'wiki-tags', not as 'passable html-tags'. If <br> is deprecated, what would be the way to generate single line-breaks? I like them: they provide a way to sub-structure a paragraph, and they cost almost nothing: no special character, no special tag... Why not allow a single line-break?

Thanks, Pieter

Tomasz Wegrzanowski

5:36 p.m.

New subject: <br>

On Wed, Jan 22, 2003 at 03:09:57PM +0100, Pieter Suurmond wrote:

...

Agree! :-) However... <td>s don't necessarily have to be closed, it just depends on how one defines them.

I know how to fix table markup issue and it is exactly the opposite way.

We should completely forbid </td>, </th> and </tr>. They are just noise and the table is completely unambiguous without them.

This is also completely correct in HTML, but we may automatically generate closing tags if we want to generate XHTML.

This reduces the weight of table markup by almost half.

(Most tables on Polish Wikipedia are already written this way)
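Tomasz's proposal keeps </td>, </th> and </tr> out of the wiki source and leaves it to the server to emit them when strict output such as XHTML is wanted. A Python sketch of that output step; it also inserts a missing <tr> before a stray cell. The function name, and the choice to close tables still open at the end of the input, are assumptions of this sketch rather than anything in the wiki code:

```python
import re

# Start and end tags of the four table elements; group 3 keeps any attributes.
TAG_RE = re.compile(r"<(/?)(table|tr|td|th)\b([^>]*)>", re.IGNORECASE)


def close_table_tags(src):
    """Return src with every table cell and row explicitly closed (XHTML-style)."""
    out, pos = [], 0
    tables = []  # one state dict per open <table>: {"row": bool, "cell": None | "td" | "th"}

    def close_cell(t):
        if t["cell"]:
            out.append(f"</{t['cell']}>")
            t["cell"] = None

    def close_row(t):
        close_cell(t)
        if t["row"]:
            out.append("</tr>")
            t["row"] = False

    for m in TAG_RE.finditer(src):
        out.append(src[pos:m.start()])  # text between tags is copied unchanged
        pos = m.end()
        closing, name, attrs = m.group(1) == "/", m.group(2).lower(), m.group(3)
        if name == "table":
            if closing:
                if tables:
                    close_row(tables.pop())
                out.append("</table>")
            else:
                tables.append({"row": False, "cell": None})
                out.append(f"<table{attrs}>")
        elif not tables:
            out.append(m.group(0))  # row/cell tag outside any table: left as-is
        elif name == "tr":
            close_row(tables[-1])  # a new <tr> (or an explicit </tr>) ends the row
            if not closing:
                out.append(f"<tr{attrs}>")
                tables[-1]["row"] = True
        elif closing:  # explicit </td> or </th>
            close_cell(tables[-1])
        else:  # <td> or <th>
            t = tables[-1]
            close_cell(t)
            if not t["row"]:  # repair a missing <tr> start tag
                out.append("<tr>")
                t["row"] = True
            out.append(f"<{name}{attrs}>")
            t["cell"] = name
    out.append(src[pos:])
    for t in reversed(tables):  # close tables still open at end of input
        close_row(t)
        out.append("</table>")
    return "".join(out)


print(close_table_tags("<table><tr><td>a<td>b<tr><th>c</table>"))
```

Explicit end tags in the source are accepted and normalised, so running the function on its own output leaves it unchanged.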

Pieter Suurmond

6:18 p.m.

New subject: generate closing tags, also in regular HTML

Tomasz Wegrzanowski wrote:

...

On Wed, Jan 22, 2003 at 03:09:57PM +0100, Pieter Suurmond wrote:

...
Agree! :-) However... <td>s don't necessarily have to be closed, it just depends on how one defines them.

I know how to fix table markup issue and it is exactly the opposite way.

We should completely forbid </td>, </th> and </tr>. They are just noise and the table is completely unambiguous without them.

Yes, it is maybe possible to forbid </td>, </th> and </tr> in Wiki-source. But please DO generate them when you generate HTML out of it: although HTML 4.01 does not require end-tags, many browsers still can't do without them. As it says on http://www.wikipedia.org/wiki/Wikipedia%3AHow_does_one_edit_a_page :

If your table doesn't look right, make sure that all <tr> and <td> tags are closed with corresponding </tr> and </td> tags. Do not indent lines, and do not include empty lines within a table. Otherwise, you will get spurious space above the table or even a browser crash.

Wiki-server should be able to take care of this, I presume.... (I am an experienced CGI/C/C++ programmer but am not too familiar with Python, Perl, PHP, SQL querying, etc... I don't know what you system administrators all use to keep the wiki-servers up and running.....)

Kind regards (and anyhow, thanks for Wikipedia, I really love it!), Pieter Suurmond


Erik Moeller

6:31 p.m.

New subject: generate closing tags, also in regular HTML

On Wed, 2003-01-22 at 18:18, Pieter Suurmond wrote:

...

Wiki-server should be able to take care of this, I presume.... (I am an experienced CGI/C/C++ programmer but am not too familiar with Python, Perl, PHP, SQL querying, etc... I don't know what you system administrators all use to keep the wiki-servers up and running.....)

Pieter,

if you know C/C++, PHP should be a breeze to get into. We could really use experienced developers. Why not give it a try? You can find some useful info in the growing (still underdeveloped) http://meta.wikipedia.org/wiki/How_to_become_a_Wikipedia_hacker on meta.

BTW, do we want to build a public tasklist on meta where anyone can pick projects to work on? The SourceForge facilities are seriously underused, and it seems to me like a wiki is best for that kind of stuff.

Regards,

Erik - always pimpin' for new developers

-- FOKUS - Fraunhofer Institute for Open Communication Systems Project BerliOS - http://www.berlios.de

Pieter Suurmond

6:48 p.m.

New subject: (wiki got me hooked now :-)


Thanks for your reply, Erik. Sounds stimulating, exciting... :-) I am going to have a look at ''How_to_become_a_Wikipedia_hacker''. As a matter of fact, I think *specifying* wiki-syntax is more important than *implementing* it. I no longer regard myself as a ''hacker'' :-)... and I would like to rename that page to ''How_to_become_a_Wikipedia_developer''.

I also dislike the METAwiki-concept a bit. When a system fails to describe itself in its own terms (a recursive definition thus) it is not good: all METAwiki-people should come down to regular wiki in my opinion. I am afraid some people will soon start a metameta..... (Or am I just a bit 'paranoid'?) Let's stop forking, let's unite!

I've got other weird ideas: Write a complete standalone Wiki-server in pure C (not depending on any MySQL, PHP, Apache-webserver, etc.). Once a proper definition of Wiki-syntax and semantics crystallizes, this could be done (I could do it). I already started to describe wikisyntax with use of (metasyntax) ISO-EBNF on one of the Dutch pages... Well, I'll keep in touch, going to read that 'hackers-page' now... Thanks, Pieter Suurmond

Pieter Suurmond

7:11 p.m.

New subject: antimeta

Wouldn't it be nice if the PHP-scripts that http://www.wikipedia.org/wiki/Wikipedia:PHP_script is talking about became editable via the normal, regular Wikipedia-interface itself!? (Writable by 'staff' only, of course, but at least readable to any visitor.)

Is this Wikipedia itself running on some kind of CVS-like system? (Then many people could use regular CVS-software to easily edit pages, and also have their copies ('backups at home') synchronised automatically.)

Can I download a (huge) weekly or monthly tarball of the complete Wikipedia contents somewhere? How much would it actually be? 500MB or 40GB?

Sorry for bothering, Pieter Suurmond

Andre Engels

7:25 p.m.

New subject: antimeta

On Wed, 22 Jan 2003, Pieter Suurmond wrote:

...

Wouldn't it be nice if the PHP-scripts that http://www.wikipedia.org/wiki/Wikipedia:PHP_script is talking about became editable via the normal, regular Wikipedia-interface itself!? (Writable by 'staff' only, of course, but at least readable to any visitor.)

See http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/wikipedia/phpwiki/newcodebase...

Andre Engels

Tomasz Wegrzanowski

11:56 p.m.

New subject: (wiki got me hooked now :-)

On Wed, Jan 22, 2003 at 06:48:03PM +0100, Pieter Suurmond wrote:

...

I also dislike the METAwiki-concept a bit. When a system fails to describe itself in its own terms (a recursive definition thus) it is not good: all METAwiki-people should come down to regular wiki in my opinion. I am afraid some people will soon start a metameta..... (Or am I just a bit 'paranoid'?) Let's stop forking, let's unite!

I've got other weird ideas: Write a complete standalone Wiki-server in pure C (not depending on any MySQL, PHP, Apache-webserver, etc.). Once a proper definition of Wiki-syntax and semantics crystallizes, this could be done (I could do it). I already started to describe wikisyntax with use of (metasyntax) ISO-EBNF on one of the Dutch pages... Well, I'll keep in touch, going to read that 'hackers-page' now...

All in pure C without even using SQL and HTTP server ? You want to make it slow, not portable, insecure, leak memory, be hard to maintain and extend and to segfault at random or what ?

It doesn't seem like a good idea to me.

Pieter Suurmond

23 Jan

12:31 a.m.

New subject: Integrating WikiWare, PHP, mySQL and Apache == too weird

Tomasz Wegrzanowski wrote:

...

On Wed, Jan 22, 2003 at 06:48:03PM +0100, Pieter Suurmond wrote:

...
I also dislike the METAwiki-concept a bit. When a system fails to describe itself in its own terms (a recursive definition thus) it is not good: all METAwiki-people should come down to regular wiki in my opinion. I am afraid some people will soon start a metameta..... (Or am I just a bit 'paranoid'?) Let's stop forking, let's unite!

I've got other weird ideas: Write a complete standalone Wiki-server in pure C (not depending on any MySQL, PHP, Apache-webserver, etc.). Once a proper definition of Wiki-syntax and semantics crystallizes, this could be done (I could do it). I already started to describe wikisyntax with use of (metasyntax) ISO-EBNF on one of the Dutch pages... Well, I'll keep in touch, going to read that 'hackers-page' now...

All in pure C without even using SQL and HTTP server ?

Yes, I like Wiki that much, I'm considering the effort.

...

You want to make it slow, not portable, insecure, leak memory,

No: secure, leak-free, indeed hard to maintain and extend (I dislike maintenance anyway :-) but with possibly more efficient execution. I think C and C++ are the most portable languages of all; they are standardised and documented very well, etc. I'm just imagining a mini wiki-machine with as few dependencies as possible... More the design-once/update-never approach, which is maybe not appropriate here, all right. Thanks for your opinion, Pieter. :-)

...

be hard to maintain and extend and to segfault at random or what ?

It doesn't seem like a good idea to me.

Richard Grevers

22 Jan

9:25 p.m.

New subject: <br>

On Wed, 22 Jan 2003 17:36:07 +0100, Tomasz Wegrzanowski wrote:

...

On Wed, Jan 22, 2003 at 03:09:57PM +0100, Pieter Suurmond wrote:

...
Agree! :-) However... <td>s don't necessarily have to be closed, it just depends on how one defines them.

I know how to fix table markup issue and it is exactly the opposite way.

We should completely forbid </td>, </th> and </tr>. They are just noise and the table is completely unambiguous without them.

Except Netscape 4 which will then refuse to display the page at all if there is any nesting of tables!

-- Richard Grevers

Pieter Suurmond

9:27 p.m.

New subject: <br>

Richard Grevers wrote:

...

On Wed, 22 Jan 2003 17:36:07 +0100, Tomasz Wegrzanowski wrote:

...
On Wed, Jan 22, 2003 at 03:09:57PM +0100, Pieter Suurmond wrote:

...
Agree! :-) However... <td>s don't necessarily have to be closed, it just depends on how one defines them.

I know how to fix table markup issue and it is exactly the opposite way.

We should completely forbid </td>, </th> and </tr>. They are just noise and the table is completely unambiguous without them.

Except Netscape 4 which will then refuse to display the page at all if there is any nesting of tables!

And therefore, the HTML output generated by the wiki server should include terminating </td>, </th>, and </tr> tags automatically. I think Tomasz Wegrzanowski was talking about wiki-source here and not about wiki-html-output. These are 2 different things (input and output). Kind regards, Pieter


Brion Vibber

9:45 p.m.

New subject: <br>

On Wed, 22 Jan 2003, Pieter Suurmond wrote:

...

Richard Grevers wrote:

...
On Wed, 22 Jan 2003 17:36:07 +0100, Tomasz Wegrzanowski

...
We should completely forbid </td>, </th> and </tr>. They are just noise and the table is completely unambiguous without them.

Except Netscape 4 which will then refuse to display the page at all if there is any nesting of tables!

And therefore, the HTML output generated by the wiki server should include terminating </td>, </th>, and </tr> tags automatically.

Our current parser *tries* to do this (insert end tags where they're missing), but it doesn't do a very good job of it.

Anyone who'd like to rewrite it is more than welcome to do so! Please take a look at removeHTMLtags() in OutputPage.html. Once you've finished wiping the tears from your eyes, write up something better. :) If you can integrate it with the rest of the markup parsing, so much the better.
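Inserting missing end tags is usually done with a stack of open elements. A minimal Python sketch of that idea, not a reconstruction of removeHTMLtags(): an end tag closes any elements opened after its partner, an end tag with no partner is dropped, and whatever is still open at the end is closed. The `PAIRED` set is a hypothetical choice, and block/inline nesting rules are not checked, so the result is balanced but not guaranteed to validate:

```python
import re

# Hypothetical set of tags that must always be paired; other tags pass through.
PAIRED = {"b", "i", "tt", "sup", "sub", "center", "font", "blockquote", "big", "small"}
TAG_RE = re.compile(r"<(/?)([a-zA-Z]+)([^>]*)>")


def balance_tags(src):
    """Return src with every PAIRED start tag matched by a correctly nested end tag."""
    out, stack, pos = [], [], 0
    for m in TAG_RE.finditer(src):
        out.append(src[pos:m.start()])
        pos = m.end()
        closing, name = m.group(1) == "/", m.group(2).lower()
        if name not in PAIRED:
            out.append(m.group(0))  # not our business: copy unchanged
        elif not closing:
            stack.append(name)
            out.append(f"<{name}{m.group(3)}>")
        elif name in stack:
            # Close everything opened after `name`, then `name` itself.
            while True:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
        # else: a stray end tag with no matching start tag is dropped
    out.append(src[pos:])
    out.extend(f"</{name}>" for name in reversed(stack))
    return "".join(out)


print(balance_tags("<b>a <i>b</b> c</i>"))
```

The misnested example shows the trade-off Brion describes: the output is well formed, but the italics the author probably wanted on " c" are lost.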

...

I think Tomasz Wegrzanowski was talking about wiki-source here and not about wiki-html-output. These are 2 different things (input and output).

As long as we allow use of the HTML tags*, we should take them both with and without end tags. We should ideally produce valid XHTML from *any* given input. If the input is broken, we may not produce what the author intended, but we *must* produce something that will pass a validating parser.

*(And if we some day decide not to allow them, we should provide equivalent functionality and, if possible, automatically convert articles to use the newer system at the time the change is made. If automatic conversion is not possible, we should make a conscious effort to fix anything that's broken.)
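For the kind of automatic conversion described in the footnote, the simplest cases are mechanical. A conservative Python sketch that rewrites plain, unnested <b>...</b> and <i>...</i> into the wiki's '''bold''' and ''italic'' markup, and leaves anything containing tags or apostrophes, or touching an apostrophe, as HTML for a human to review (runs like '''a'''''b'' would be ambiguous). The rules are illustrative, not an actual conversion script:

```python
import re

# Only tag pairs whose content has no tags or apostrophes, and which are not
# directly adjacent to an apostrophe, are converted; everything else is left alone.
SIMPLE_RULES = [
    (re.compile(r"(?<!')<b>([^<>']*)</b>(?!')", re.IGNORECASE), r"'''\1'''"),
    (re.compile(r"(?<!')<i>([^<>']*)</i>(?!')", re.IGNORECASE), r"''\1''"),
]


def convert_simple_html(text):
    """Rewrite simple <b>/<i> pairs as wiki bold/italic markup."""
    for pattern, replacement in SIMPLE_RULES:
        text = pattern.sub(replacement, text)
    return text


print(convert_simple_html("<B>bold</B> and <i>it</i>"))
```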

-- brion vibber (brion @ pobox.com)

Pieter Suurmond

10:08 p.m.

New subject: thanks

Thanks for pointing out where exactly to find it in the sources, Brion (and for pointing out http://cvs.sourceforge.net, Andre). Oh, and sorry for the excessive commas in my previous e-mail. Greetings, Pieter


Tomasz Wegrzanowski

11:50 p.m.

New subject: <br>

On Wed, Jan 22, 2003 at 12:45:57PM -0800, Brion Vibber wrote:

...

As long as we allow use of the HTML tags*, we should take them both with and without end tags. We should ideally produce valid XHTML from *any* given input. If the input is broken, we may not produce what the author intended, but we *must* produce something that will pass a validating parser.

*(And if we some day decide not to allow them, we should provide equivalent functionality and, if possible, automatically convert articles to use the newer system at the time the change is made. If automatic conversion is not possible, we should make a conscious effort to fix anything that's broken.)

XHTML is not completely compatible with HTML, so making XHTML default may not be very friendly to some browsers.

Does anybody have experience with XHTML websites ? Are all browsers happy about that ?

Brion Vibber

23 Jan

2:30 a.m.

New subject: XHTML (was Re: <br>)

On Wed, 2003-01-22 at 14:50, Tomasz Wegrzanowski wrote:

...

XHTML is not completely compatible with HTML, so making XHTML default may not be very friendly to some browsers.

Got any examples of trouble spots? Are they things we do?

...

Does anybody have experience with XHTML websites ? Are all browsers happy about that ?

My old website is in XHTML 1.0 transitional: http://moisty.org/ Try it in some browsers and let me know what breaks.

-- brion vibber (brion @ pobox.com)

Richard Grevers

3:43 a.m.

New subject: <br>

On Wed, 22 Jan 2003 23:50:08 +0100, Tomasz Wegrzanowski wrote:

...

XHTML is not completely compatible with HTML, so making XHTML default may not be very friendly to some browsers.

Does anybody have experience with XHTML websites ? Are all browsers happy about that ?

It all depends on the MIME type you serve it with. See http://www.tntluoma.com/opera/beyond30/2002/10/confessions_of_a_browser_snif... for one description of the trouble you can end up in. This was also discussed extensively on css-d.
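Richard's point is that browser behaviour hinges on the Content-Type the page is served with. One common approach, an assumption here rather than anything decided in the thread, is to send application/xhtml+xml only to clients whose Accept header asks for it at least as strongly as text/html, and text/html (following the XHTML 1.0 Appendix C compatibility guidelines) to everyone else. A simplified Python sketch that ignores wildcards:

```python
def parse_accept(accept_header):
    """Map each media type in an Accept header to its q-value (default 1.0)."""
    prefs = {}
    for item in accept_header.split(","):
        parts = item.strip().split(";")
        media = parts[0].strip().lower()
        if not media:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media] = q
    return prefs


def choose_mime_type(accept_header):
    """Serve XHTML as XML only to clients that explicitly prefer it."""
    prefs = parse_accept(accept_header)
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    if xhtml_q > 0.0 and xhtml_q >= prefs.get("text/html", 0.0):
        return "application/xhtml+xml"
    return "text/html"


print(choose_mime_type("text/html, */*"))
```

Wildcards such as */* are deliberately not treated as a request for XHTML, since a client that sends only */* may well be one of the browsers that mishandles it.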

-- Richard Grevers

Pieter Suurmond

22 Jan

9:56 p.m.

New subject: table suggestion

Hi, here is my table suggestion (in ISO-EBNF): Wiki-text is compressed as follows (terminating tags are not rejected but they just disappear from wiki-source).

At request, the server expands it again as follows:

output-rules:
"<td>", blabla = "<td>", blabla, "</td>";
"<th>", blabla = "<th>", blabla, "</th>";
"<tr>", blabla = "<tr>", blabla, "</tr>";
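The compression direction described in the first sentence of this message, with terminating tags silently dropped from the stored source, is a single substitution. A Python sketch; the function name is illustrative and only the three end tags named in the thread are removed:

```python
import re

# </td>, </th> and </tr> in any letter case, tolerating stray spaces.
OPTIONAL_END_TAGS = re.compile(r"</\s*(?:td|th|tr)\s*>", re.IGNORECASE)


def strip_optional_end_tags(wiki_source):
    """Drop the end tags that the stored wiki source no longer needs."""
    return OPTIONAL_END_TAGS.sub("", wiki_source)


print(strip_optional_end_tags("<tr><td>a</td><TD>b</TD></tr>"))  # <tr><td>a<TD>b
```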

But please let's use symbols that are different from these html-lookalikes. I'm quite new here so I don't know which symbols can be used to replace "<td>", "<th>", "<tr>", and also "<table>" (and </table> ???).

Just thinking out loud.., kind regards, Pieter Suurmond

Toby Bartels

24 Jan

3:52 a.m.

New subject: table suggestion

Pieter Suurmond wrote in part:

...

But please let's use symbols that are different from these html-lookalikes. I'm quite new here so I don't know which symbols can be used to replace "<td>", "<th>", "<tr>", and also "<table>" (and </table> ???).

I don't see any problems with <table>, which is self-explanatory -- quite unlike <td>, <th> and <tr>! For several possibilities for new symbols (some of which make <table> unnecessary too, some of which don't), see:

http://meta.wikipedia.org/wiki/Wiki_markup_tables

(Not much happening there lately, but you could start it up again. I for one have it on my watchlist.)

-- Toby

Erik Moeller

22 Jan

3:29 p.m.

New subject: Goodbye to HTML

On Tue, 2003-01-21 at 18:04, tarquin wrote:

...

erroneous use of <BR> -- if you need a new paragraph, make one. If you feel you need a "half paragraph-break", then rethink your writing style.

That's incorrect - there are valid uses for <BR>: text in tables, image captions, certain link layouts, lists where the <ul> formatting would be distracting .. even the Main Page uses <BR> repeatedly. I agree that it is often misunderstood, but there are plenty of legitimate uses.

Regards,

Erik


Pieter Suurmond

3:32 p.m.

New subject: <brrrrrr>

Let's not call them "<BR>" any longer; it is too reminiscent of HTML. But we will certainly still need single line breaks here and there. Kind regards, Pieter


Wikitech-l mailing list Wikitech-l@wikipedia.org http://www.wikipedia.org/mailman/listinfo/wikitech-l

Magnus Manske

3:44 p.m.

New subject: Goodbye to HTML

Just thinking:

Sometimes, there's an image which is not very wide. You don't want it to occupy a whole line, because that would create a huge white space. So, you'll align the image left or right from the article. (I prefer right, so the text of the article will align left).

Now that image has a caption. The caption would be wider than the image, which looks ugly, so you want to wrap it. Also, the caption should be centered with the image. What I do is

    <table align=right>
    <tr>
    <td align=center width=1>
    [[image:xyz.jpg]]<br>
    Some longer caption, which is wider than the image itself
    </td>
    </tr>
    </table>

Can someone explain to me how to do that in wiki syntax, with the *same* result, without the code looking uglier? (You are free to invent a wiki table syntax for that purpose!)

Magnus

Andre Engels

3:47 p.m.

New subject: Goodbye to HTML

On Wed, 22 Jan 2003, Magnus Manske wrote:

...

Just thinking:

Sometimes, there's an image which is not very wide. You don't want it to occupy a whole line, because that would create a huge white space. So, you'll align the image left or right from the article. (I prefer right, so the text of the article will align left).

Now that image has a caption. The caption would be wider than the image, which looks ugly, so you want to wrap it. Also, the caption should be centered with the image. What I do is

<table align=right> <tr> <td align=center width=1> [[image:xyz.jpg]]<br> Some longer caption, which is wider than the image itself </td> </tr> </table>

Someone explain to me how to do that in wiki syntax, with the *same* result, without the code looking more ugly? (you are free to invent a wiki table syntax for that puropse!)

Although also using HTML in this case, I still use a slightly different syntax. What I do is:

    <table align=right>
    <tr><td align="center">[[image:xyz.jpg]]
    <tr><td align="center">
    ''Some longer caption, which is wider<br>than the image itself''
    </table>

And yes, I know it is bad HTML to not close the tr's and td's, but I guess I'm lazy...

Andre Engels

Pieter Suurmond

3:52 p.m.

New subject: Goodbye to HTML (not closing == correct HTML !)

No, you are not lazy at all, Andre (I can see how much work you put in daily maintaining the Dutch Wikipedia :-). Furthermore, not closing tr, th and td tags is VALID HTML 4! Indeed, in older HTML versions this was not correct, but in HTML 4.01 you do NOT need to close these tags any longer. Read: http://www.w3.org/TR/html401/struct/tables.html#h-11.2.5

Kind regards, Pieter


Magnus Manske

4:03 p.m.

New subject: Goodbye to HTML

Andre Engels wrote:

...

Although also using HTML in this case, I still use a slightly different syntax. What I do is:

<table align=right> <tr><td align="center">[[image:xyz.jpg]] <tr><td align="center"> ''Some longer caption, which is wider<br>than the image itself'' </table>

OK, here's an idea: let's keep the <table>s valid, but expand the wiki syntax for some often recurring cases!

I'm going to use HTML-style here, but we can do | or \ or {} or whatever:

    <layout image right>
    [[image:xyz.jpg]]

    ''Some longer caption, which is wider<br>than the image itself''
    </layout>

Which would translate

    <layout image right>  =>  <table align=right width=1><tr><td align=center>
    </layout>             =>  </table> (or </td></tr></table>)

We could also define that within <layout>, all blank lines are converted into <br>s.
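To make the proposed translation concrete, here is a rough Python sketch. It assumes the <layout image right> ... </layout> form above, treats "image left" as an assumed analogue, and turns blank lines inside the block into <br>; expand_layout is a hypothetical helper, and <layout box center> is not covered because its expansion isn't spelled out.

```python
import re

# Hypothetical sketch of the proposed <layout image ...> shorthand.
# "image right" is from the proposal; "image left" is an assumed analogue.
LAYOUT_RE = re.compile(
    r"<layout image (left|right)>(.*?)</layout>", re.IGNORECASE | re.DOTALL
)
# A newline followed by one or more blank (whitespace-only) lines.
BLANK_LINES_RE = re.compile(r"\n(?:[ \t]*\n)+")


def _layout_to_table(m: re.Match) -> str:
    align = m.group(1).lower()
    body = m.group(2).strip()
    body = BLANK_LINES_RE.sub("<br>", body)  # blank lines become <br>
    return (
        f"<table align={align} width=1><tr><td align=center>"
        f"{body}</td></tr></table>"
    )


def expand_layout(text: str) -> str:
    """Replace every <layout image left|right> block with a one-cell table."""
    return LAYOUT_RE.sub(_layout_to_table, text)


src = "<layout image right> [[image:xyz.jpg]]\n\n''Some caption'' </layout>"
print(expand_layout(src))
```

Text outside the layout blocks passes through untouched, so existing <table> markup would keep working alongside the shorthand.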

Similarly for the "current events" on the Main Page: <layout box center> ...stuff... </layout>

All this would _not_ eliminate <table> tags from Wikipedia, but for everyday stuff it would do. Things like the periodic table (PSE) or the nucleotide decay table would stay HTML.

Comments?

Magnus

Erik Moeller

4:10 p.m.

New subject: Goodbye to HTML

On Wed, 2003-01-22 at 16:03, Magnus Manske wrote:

...

<layout image right> => <table align=right width=1><tr><td align=center> </layout> => </table> (or </td></tr></table>)

A table containing an image should be as wide as the image itself. That way, you do not need any arbitrary breaks in the text. That's one reason I think this kind of layout should be supported by our [[Image]] syntax.

[[Image:Foo.jpg width=400]] -> generate 400 pixel wide version and show that one, image links to original size version

[[Image:Foo.jpg showtext]] -> show the text of the image as caption

[[Image:Foo.jpg right]] -> embed in right-aligned table which uses the image width as width
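A hedged sketch of how the three proposed options could be parsed, in Python. The option names come from the list above; the dataclass, the parse_image_link name, accepting "left" alongside "right", and the assumption that file names contain no spaces are all illustrative choices. Generating the scaled image and the surrounding table is left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

# [[Image:Name option option ...]]; assumes the file name has no spaces.
IMAGE_RE = re.compile(
    r"\[\[Image:([^\]\s]+)((?:\s+[^\]\s]+)*)\s*\]\]", re.IGNORECASE
)


@dataclass
class ImageOptions:
    name: str
    width: Optional[int] = None  # width=N: show an N-pixel-wide version
    show_text: bool = False      # showtext: image page text as caption
    align: Optional[str] = None  # right ("left" assumed): aligned table


def parse_image_link(link: str) -> ImageOptions:
    m = IMAGE_RE.fullmatch(link.strip())
    if m is None:
        raise ValueError(f"not an image link: {link!r}")
    opts = ImageOptions(name=m.group(1))
    for word in m.group(2).split():
        if word.startswith("width="):
            opts.width = int(word[len("width="):])
        elif word == "showtext":
            opts.show_text = True
        elif word in ("left", "right"):
            opts.align = word
        else:
            raise ValueError(f"unknown image option: {word!r}")
    return opts


print(parse_image_link("[[Image:Foo.jpg width=400 right showtext]]"))
```

Rejecting unknown options, rather than silently ignoring them, would make typos in articles visible early; whether that is desirable for a live wiki is a judgment call.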

For other tables, we should develop a table syntax, as is already being discussed on Meta. I do support the eventual goal of getting away from HTML, but only if we can do everything we're doing now. Note that we'll have to do *lots* of replacing and rewriting if we ever disable HTML.

Regards,

Erik


Pieter Suurmond

4:24 p.m.

New subject: Byebye HTML

Erik Moeller wrote:

...

For other tables, we should develop a table syntax, as is already being discussed on Meta. I do support the eventual goal of getting away from HTML, but only if we can do everything we're doing now. Note that we'll have to do *lots* of replacing and rewriting if we ever disable HTML.

Regards, Erik

So the sooner the better (ban HTML). Pieter

Erik Moeller

4:39 p.m.

New subject: Byebye HTML

On Wed, 2003-01-22 at 16:24, Pieter Suurmond wrote:

...

So the sooner the better (ban HTML).

First we have to replace it. Care to contribute some code? The image page handling desperately needs rewriting.

Regards,

Erik


Tomasz Wegrzanowski

5:42 p.m.

New subject: Markup must stay compatible [was: Byebye HTML]

On Wed, Jan 22, 2003 at 04:39:01PM +0100, Erik Moeller wrote:

...

On Wed, 2003-01-22 at 16:24, Pieter Suurmond wrote:

...
So the sooner the better (ban HTML).

First we have to replace it. Care to contribute some code? The image page handling desperately needs rewriting.

Breaking existing markup is extremely counter-productive. Please don't do that. It would be very wrong if articles broke just because someone doesn't like HTML.

tarquin

4:42 p.m.

New subject: Goodbye to HTML

Erik Moeller wrote:

...

[[Image:Foo.jpg showtext]] -> show the text of the image as caption

It would be a good idea to store the caption of the image on the image page (or maybe say that the first line of the image page is magically the caption).

A system like that would also allow us to apply a CSS class to the caption text automatically, so all image captions would have the same font & style.

...

Erik Moeller

4:48 p.m.

New subject: Goodbye to HTML

On Wed, 2003-01-22 at 16:42, tarquin wrote:

...

Erik Moeller wrote:

...
[[Image:Foo.jpg showtext]] -> show the text of the image as caption

It would be a good idea to store the caption of the image on the image page. (or maybe say the first line of the image page is magically the caption)

Yes, that was my original idea, but you have to make this optional, otherwise many layouts would be broken in the transition. Note that many of our image pages contain comments like "found this somewhere", because people treat the image description line on the upload page like the summary line on the edit page.

Regards,

Erik


tarquin

5:01 p.m.

New subject: Goodbye to HTML

Erik Moeller wrote:

...

Yes, that was my original idea, but you have to make this optional, otherwise many layouts would be broken in the transition. Note that many of our image pages contain comments like "found this somewhere", because people treat the image description line on the upload page like the summary line on the edit page.

We can fix those :-) We could also say that if the first line of the image page is "No caption", then no caption is to be displayed, even if a [[Image: ... showtext]] requests it

...

Magnus Manske

5:10 p.m.

New subject: Goodbye to HTML

tarquin wrote:

...

We could also say that if the first line of the image page is "No caption", then no caption is to be displayed, even if a [[Image: ... showtext]] requests it

I'd like to have a longer description on some figures, especially some of my self-drawn diagrams, which are far from self-explanatory... :-)

So, how about defining the caption as "everything until the first horizontal line (----)"? We could have multiple lines on the description page, then a horizontal line, then the source and other information that doesn't really belong in the caption. "----" as the first line of the description would inhibit a caption automatically.
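The "----" rule fits in a few lines of Python. This sketch assumes a horizontal line is four or more dashes alone on a line, and that a description page with no such line is all caption; extract_caption is a hypothetical name.

```python
import re
from typing import Optional

# A wiki horizontal line: four or more dashes alone on a line.
RULE_RE = re.compile(r"-{4,}")


def extract_caption(description: str) -> Optional[str]:
    """Return the caption part of an image description page, or None.

    The caption is everything before the first horizontal line; a page
    that starts with the line yields no caption. (Assumption: a page
    without any horizontal line is taken to be all caption.)
    """
    caption_lines = []
    for line in description.splitlines():
        if RULE_RE.fullmatch(line.strip()):
            break
        caption_lines.append(line)
    caption = "\n".join(caption_lines).strip()
    return caption or None


print(extract_caption("Diagram of X.\n----\nSource: own work"))
```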

Magnus

Erik Moeller

5:20 p.m.

New subject: Goodbye to HTML

On Wed, 2003-01-22 at 17:10, Magnus Manske wrote:

...

tarquin wrote:

...
We could also say that if the first line of the image page is "No caption", then no caption is to be displayed, even if a [[Image: ... showtext]] requests it

I'd like to have a longer description on some figures, especially some of my self-drawn diagrams, which are far from self-explaining... :-)

So, how about defining the caption as "everything until the first horizontal line (----)"? We could have multiple lines on the description page, horizontal line, then the source and other information that doesn't really belong into the caption. "----" as the first line of the description would inhibit a caption automatically.

The "----" separation makes sense (a bit hackish), but I don't see the need for any forced inhibition if we make the captions optional (showtext) anyway.

I'd also like to repeat that an image page should ''always'' show the image it refers to, in its 1:1 dimensions.

Regards,

Erik


Brion Vibber

23 Jan

1:26 a.m.

New subject: Goodbye to HTML

On Wed, 2003-01-22 at 07:48, Erik Moeller wrote:

...

Note that many of our image pages contain comments like "found this somewhere", because people treat the image description line on the upload page like the summary line on the edit page.

As indeed they should, that being the intent of its creation. The copying of the upload note into the image description page on creation was done just to get _something_ in them.

A single-line input is unfriendly for long descriptions, and captions are a whole nother matter. Perhaps the upload form needs to be made a little more flexible?

-- brion vibber (brion @ pobox.com)

Andre Engels

22 Jan

6:42 p.m.

New subject: Goodbye to HTML

...

It would be a good idea to store the caption of the image on the image page. (or maybe say the first line of the image page is magically the caption)

But what if the same image is used on several pages? It is rare, but it does happen, and one might want different captions in those cases.

Andre Engels

Jocelyn Giraud

23 Jan

1:32 p.m.

New subject: Goodbye to HTML

Andre Engels wrote:

...

Although also using HTML in this case, I still use a slightly different syntax. What I do is:

<table align=right> <tr><td align="center">[[image:xyz.jpg]] <tr><td align="center"> ''Some longer caption, which is wider<br>than the image itself'' </table>

About banning HTML and using wiki-like syntax: there is an interesting feature in [ http://phpwiki.sf.net/ phpwiki ] that allows wiki-like tables.

The feature looks like this:

| | My column | Other column
| first line | value | value
| other line | value | ''value''

Every line starting with a | is part of a table (I don't think starting a line with a pipe is used much for anything else...).
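A minimal Python sketch of the basic rule: consecutive lines starting with | become table rows, with cells split on |. The name pipe_table_to_html is hypothetical, this is not phpwiki's own code, and the span markers described below are deliberately not interpreted.

```python
def pipe_table_to_html(text: str) -> str:
    """Turn each run of lines that start with '|' into an HTML table.

    Every such line is one row; the cells are separated by '|'.
    Span markers such as |v or |> are not interpreted in this sketch.
    """
    out = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("|"):
            if not in_table:
                out.append("<table>")
                in_table = True
            cells = [cell.strip() for cell in line[1:].split("|")]
            out.append("<tr>" + "".join(f"<td>{c}</td>" for c in cells) + "</tr>")
        else:
            if in_table:  # a line without a leading pipe ends the table
                out.append("</table>")
                in_table = False
            out.append(line)
    if in_table:
        out.append("</table>")
    return "\n".join(out)


print(pipe_table_to_html("| | My column | Other column\n| first line | value | value"))
```

Cell contents are left as wiki text, so links and italics inside cells would still go through the normal parser afterwards.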

If you write:

| | My column | Other column
| first line |v big value | value
| other line | ''value''

you'll have "big value" in a <td> with colspan=2 (v means that you want to extend the cell to the bottom).

|vv means <td colspan='3'>, etc.

|> means rowspan=2, |>> means rowspan=3, etc.

I really like this idea (I use phpwiki at home and at work, and its table feature is very useful...).

The only thing that phpwiki's tables don't have is "align='right'" on the whole table (<table align='right'>), but I'm sure it can be done with a good idea (like "|[table: align=right]|" before the first line, or any other idea...).

The problem is that it can be slow to render and may need a lot of RAM.

_________________________________________________________________ "This e-mail is intended for the exclusive use of the individual or entity named above and may constitute information that is priviledged or confidential or otherwise protected from disclosure . Dissemination , distribution , forwarding or copying of this e-mail by anyone other than the intended recipient is prohibited . If you have received this e-mail in error , please notify us and completly delete or destroy any and all electronic or other copies of the original message" _________________________________________________________________

Pieter Suurmond

8:29 p.m.

New subject: | | My column | Other column

Jocelyn Giraud wrote:

...

Andre Engels wrote:

...
Although also using HTML in this case, I still use a slightly different syntax. What I do is:

<table align=right> <tr><td align="center">[[image:xyz.jpg]] <tr><td align="center"> ''Some longer caption, which is wider<br>than the image itself'' </table>

About banning HTML and using wiki-like syntax, there is an interesting feature in [ http://phpwiki.sf.net/ phpwiki ] that allow wiki-like tables.

the feature look like :

| | My column | Other column
| first line | value | value
| other line | value | ''value''

YES I LOVE THIS !!! Elegant, and much more readable than that dirty html. I have no objections. Greetings, Pieter Suurmond.

...

every line starting with a | is part of a table (I don't thing starting a line with a pipe is something very used for anything that is not that kind of stuff...)

if you write :

| | My column | Other column
| first line |v big value | value
| other line | ''value''

You'll have "big value" in a <td> with a colspan=2 ( v mean that you want to extend the cell to the bottom)

|vv mean <td colspan='3'> etc.

|> mean rowspan=2 |>> mean rowspan=3 etc.

I do really like this idea (I use phpwiki at home and at work, and that of tables is very usefull...)

The only thing that phpwiki's table don't have is "align='right'" on the whole table (<table align='right'>), but I'm sure it can be done with a good idea (like "|[table: align=right]|" before the first line, or any other idea...

The probleme is that it can be slow to render and need a lot of RAM (may be).


Pieter Suurmond

24 Jan

8:28 p.m.

New subject: Wookee syntax (Re Jocelyn Giraud)

About wiki-tables,

I was reading http://meta.wikipedia.org/wiki/Wiki_markup_tables and it seems that what Jocelyn Giraud was mentioning is called "Wookee"-syntax.

I mentioned your name, Jocelyn, on that meta.wikipedia page; please remove it (or let me know) if you don't like that.

    1: List-style syntax
    2: Extended PikiePikie syntax
    3: MoinMoin syntax
    4: Wookee syntax   <---
    5: PikiePikie syntax

Seems like Tarquin and Jan Hidders already did a lot of work on this. I personally feel attracted to Wookee (I just loved it at first sight, not because of any reasoning or real thinking). Any other opinions?

...

Jocelyn Giraud wrote:

...
The probleme is that it can be slow to render and need a lot of RAM (may be).

Can someone explain this last remark? I don't see why it (what) needs a lot of CPU and/or memory.

Thanks / Kind regards, Pieter Suurmond

tarquin

9:39 p.m.

New subject: Wookee syntax (Re Jocelyn Giraud)

Pieter Suurmond wrote:

...

About wiki-tables,

I was reading http://meta.wikipedia.org/wiki/Wiki_markup_tables and it seems that what Jocelyn Giraud was mentioning is called "Wookee"-syntax.

The "wookee syntax" is part of Wookee, a Wiki markup parser module written in Perl by Mych, over at Unreal Wiki. It uses OOP, and it's very easily extensible.

Someone mentioned recently that our markup parser needs an overhaul -- would our developers be interested in taking a look at this?

...

Magnus Manske

9:49 p.m.

New subject: Wookee syntax (Re Jocelyn Giraud)

I couldn't resist and added a new proposal to http://meta.wikipedia.org/wiki/Wiki_markup_tables

Called it "Half-HTML proposal" (working title;-)

Syntax is quite simple for simple tables, but allows full HTML attributes if needed, as well as nested tables.

Have a look!

Magnus

Toby Bartels

23 Jan

7:04 a.m.

New subject: Goodbye to HTML

tarquin wrote:

...

What do we use HTML for?

I doubt that anybody can come up with a complete list from memory; I'm often surprised to see a new use of HTML when I come across it.

We should go through each HTML tag that we currently pass through, one by one, and decide how to make it wiki markup. (Possibly wiki markup that looks just like an HTML tag -- I still think that this is perfectly reasonable for rarely used markup like <br> and <i> -- but it still needs to be parsed as wiki markup, not just passed through unparsed.) Then we need to search for all appearances in the text (a case-insensitive search for "<%s>", "<%s ", and "<%s\n" in C notation) and translate every instance into the new syntax (if necessary) -- or agree that we will not support the effect being done (quite possible for some uses of <font>, for example).

Actually, we should probably search for uses in the first step, so that we can make all necessary decisions right up front; but we'll still need to search again just before changing the code. To avoid unnecessarily extended discussions over and over again, we make all of the decisions on, say, [[meta:HTML_to_wiki]] (there may well already exist a meta page along these lines), and direct people there if they ask about it on <wikitech-l>.
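As a rough illustration of the survey step, here is a Python sketch of the case-insensitive search for "<tag>", "<tag " and "<tag\n", run over a hypothetical dictionary of page titles to wikitext (fetching the actual pages is not shown; find_tag_uses is an illustrative name).

```python
import re
from typing import Dict, List


def tag_pattern(tag: str) -> re.Pattern:
    # Matches "<tag>", "<tag " and "<tag\n", case-insensitively.
    return re.compile("<" + re.escape(tag) + r"(?=[> \n])", re.IGNORECASE)


def find_tag_uses(pages: Dict[str, str], tags: List[str]) -> Dict[str, List[str]]:
    """Map each tag to the titles of the pages whose text contains it."""
    uses: Dict[str, List[str]] = {}
    for tag in tags:
        pattern = tag_pattern(tag)
        uses[tag] = [title for title, text in pages.items() if pattern.search(text)]
    return uses


# Hypothetical sample data standing in for the real article texts.
pages = {
    "Main Page": "Welcome<BR>to Wikipedia<br \n",
    "Foo": "<font color=red>warning</font>",
    "Bar": "<brick> is not a tag we pass through",
}
print(find_tag_uses(pages, ["br", "font"]))
```

The lookahead keeps "<br" from matching "<brick>", which a plain substring search for "<br" would not.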

-- Toby

Pieter Suurmond

21 Jan

3:53 a.m.

New subject: Thanks for W3C-valid HTML 4.01 so quickly!

That's great, Brion Vibber! Now http://www.wikipedia.org/ is indeed 100% valid. The Dutch site still contains some errors, but that's mainly due to things like "Ð¾Ñ?Ñ?Ð¸Ñ?">Russkiy</a>" on http://nl.wikipedia.org. I'll try to repair those manually... I have another wish: could a script on the wiki server do automatic character-to-entity conversion, like à --> &agrave; and ë --> &euml;? Well, I'm only suggesting... Anyhow, thanks for fixing the English front page so very quickly!

Pieter Suurmond [Let's promote [the use of] Open Standards like W3C]

Brion Vibber wrote:

...

On Mon, 2003-01-20 at 16:22, Pieter Suurmond wrote:

...
Wikipages generated by the server do not follow W3C recommmendations. Now that we eventually have an open standard for HTML, why not use it?

Forgetfulness?

...
One of the reasons why http://www.wikipedia.org does not validate is that the character-set is not specified, you should include a line like this:

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">

This is set in the http headers. I assume you were validating a copy that lacks the header?

...
You can easily see what other errors are made, when you type in an URI at http://validator.w3.org/. Page "http://www.wikipedia.org/" for example is not Valid HTML 4.01 Transitional! Below are the results of attempting to parse this document with an SGML parser.

1.Line 7, column 7: required attribute "TYPE" not specified.
    <SCRIPT>
           ^
Grr... I blame Magnus. Fixed.

...
2.Line 87, column 28: start tag for "TR" omitted, but its declaration does not permit this.
    <th colspan="2" align=center><big> Selected Articles </big></th></tr>
                              ^
That's an error in the wiki page; our parser can correct some errors, but isn't smart enough to fix all of them. Fixed.

The front page now validates.

-- brion vibber (brion @ pobox.com)

Brion Vibber

4:37 a.m.

New subject: Thanks for W3C-valid HTML 4.01 so quickly!

On Mon, 2003-01-20 at 18:53, Pieter Suurmond wrote:

...

That's great Brion Vibber! Now http://www.wikipedia.org/ is indeed 100% valid. The Dutch site still contains some errors but that's mainly due to things like "Ð¾Ñ?Ñ?Ð¸Ñ?">Russkiy</a>" on http://nl.wikipedia.org. I'll try to repair those manually...

Another problem is the inclusion of external links with ampersands in them. Strictly speaking, ampersands in links need to be '&amp;' instead of '&' because entity interpretation is done before the tag attributes are; all our generated links take this into account (I think!), but we don't presently provide such conversion for inline external links in a wiki page (in part because we don't know what the author intended...)

It would I think not be unreasonable for the wiki parser to automatically convert &s not followed by an entity body to &amp; ... should we let things that do look like entities go through intact, though? Or just escape all &s? (Upside: simple, consistent behavior. Downside: inconsistent with wikilinks, where entities are allowed.)
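The two options can be sketched in Python. The regular expression below treats a named, decimal or hexadecimal entity body followed by ";" as an entity and escapes every other &; it does not check entity names against the HTML list, so something like "&foo;" would pass through. Both function names are hypothetical.

```python
import re

# "&" NOT followed by a named, decimal or hex entity body and a ";".
BARE_AMP_RE = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Option 1: let entity-like sequences through, escape every other '&'."""
    return BARE_AMP_RE.sub("&amp;", text)


def escape_all_ampersands(text: str) -> str:
    """Option 2: escape every '&' (simple, but breaks entities the author typed)."""
    return text.replace("&", "&amp;")


url = "http://example.com/search?a=1&b=2"
print(escape_bare_ampersands(url))  # http://example.com/search?a=1&amp;b=2
```

The guessing problem remains: a URL query like "?x=1&copy;" would be read as the &copy; entity under option 1, which is exactly the case where the author's intent is unknown.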

...

I've another whish: could a script on the wikiserver do automatic character-to-entity-conversion like?: à --> &agrave; ë --> &euml;

I'd rather do the other way around, convert input entities to real characters and keep them that way; entities are bandwidth hogs and not really particularly helpful. Text on the Chinese and Japanese wikis for instance would take about 3 times the bandwidth they presently do using numeric entities instead of UTF-8.
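A small Python illustration of both directions, using the standard html module: html.unescape turns entities back into characters, and a hypothetical to_numeric_entities helper shows the size cost. For this all-CJK sample, the entity form is 48 bytes against 18 bytes of UTF-8, roughly 2.7 times as large.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric entity."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def normalize_input(text: str) -> str:
    """The preferred direction: turn entities into real characters."""
    return html.unescape(text)


sample = "中文维基百科"  # six CJK characters
as_utf8 = sample.encode("utf-8")
as_entities = to_numeric_entities(sample).encode("ascii")
print(len(as_utf8), len(as_entities))  # 18 48
```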

If you want to see the names of the entities in the textarea (so as to avoid the editors that damage non-ASCII text), they have to be escaped again with an &amp;, and any text that uses non-trivial amounts of non-ASCII characters becomes illegible. As an option and for known unfriendly user agents it may be helpful, but I'd avoid it if I could.

...

Well, I'm only suggesting... Anyhow, thanks for fixing the English front page so very quickly!

You're welcome!

-- brion vibber (brion @ pobox.com)


wikitech-l@lists.wikimedia.org

54 comments

11 participants


participants (11)

Andre Engels
Brion Vibber
Brion Vibber
Erik Moeller
Jocelyn Giraud
Magnus Manske
Pieter Suurmond
Richard Grevers
tarquin
Toby Bartels
Tomasz Wegrzanowski