------------ Original message ------------
From: Brion Vibber brion@wikimedia.org
Subject: Re: [Wikitech-l] Anchors haven't id attribute
Date: 26.12.2008 06:30:00
On 12/25/08 4:32 AM, Danny B. wrote:
I have reverted both revisions in r45021 and r45022 because it caused massive
invalidity of pages.
Given that we've been outputting these as "id" attributes for the last few years already (as output by Tidy), I have reverted your revert in r45044 pending further discussion.
-- brion
Well, the id was added _only_ to those tags where the name was transferable to id, that is, where it started with an ASCII letter. It was _never_ added to those that did not conform to this rule (the regexp mentioned in my previous post). This is easy to verify, either by running an older revision of MediaWiki or by testing in Tidy directly:
Take this code excerpt (wrapped in a minimal XHTML document) and run it through Tidy:
<a name="X"></a><h2> <span class="mw-headline"> X </span></h2>
<a name="1X"></a><h2> <span class="mw-headline"> 1X </span></h2>
<a name=".C3.81X"></a><h2> <span class="mw-headline"> ÁX </span></h2>
<a name="-X"></a><h2> <span class="mw-headline"> -X </span></h2>
The result will be:
<a name="X" id="X"></a><h2><span class="mw-headline">X</span></h2>
<a name="1X"></a><h2><span class="mw-headline">1X</span></h2>
<a name=".C3.81X"></a><h2><span class="mw-headline">ÁX</span></h2>
<a name="-X"></a><h2><span class="mw-headline">-X</span></h2>
Now, let me repeat how the "id" is defined:
1: XHTML is a reformulation of HTML 4 as an XML 1.0 application.
2: That means it takes every single definition from HTML 4 and keeps it unless it is overridden in XHTML.
3: Both id and name were defined in HTML 4 as /[A-Za-z][A-Za-z0-9:_.-]*/ [1] [2]
4: The name has been redefined to NMTOKEN [2] [3]
5: The id has never been redefined and thus keeps the definition given in point 3 above.
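To make the rule in point 3 concrete, here is a minimal Python sketch that checks the four anchor names from the Tidy test against that pattern. The function name is my own invention for illustration; this is not MediaWiki or Tidy code, only a restatement of the regexp quoted above.

```python
import re

# The ID/NAME token pattern as quoted in point 3 (HTML 4).
# Illustrative only: this is not taken from MediaWiki or Tidy source.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def is_valid_html4_id(value: str) -> bool:
    """Return True if the whole string matches the HTML 4 ID pattern."""
    return HTML4_ID.fullmatch(value) is not None


# The four anchor names used in the Tidy test above.
for name in ["X", "1X", ".C3.81X", "-X"]:
    verdict = "id allowed" if is_valid_html4_id(name) else "id not allowed"
    print(f"{name!r}: {verdict}")
```

Only "X" passes the check, which agrees with the Tidy output above, where only the first anchor received an id attribute.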
This is how id has been handled in XHTML ever since XHTML came out. I also think that something as important as id handling would have been fixed in the validator over so many years if it were incorrect.
So currently, all wikis in non-Latin scripts are totally invalid according to the W3C validator. Large parts of wikis that use non-ASCII characters are invalid as well. It is therefore very hard to find other validity errors in the code when every other page produces worthless error reports. :-(
One last thing: I think the current rendering with the controversial ids has brought more negatives (such as greatly reducing the ability to find the really invalid parts of the code) than positives. It was working correctly before, so what benefit has it actually brought? On the other hand, it has brought this controversy.
I take the point that I (along with the majority of people around the world, the validator, Tidy, and many other tools) _may_ be wrong in my interpretation of the definition of id. But unless the authoritative tools, such as the validator and Tidy, are fixed in this respect, so that it can be proven that we render the page correctly, we should not render it that way. As I mentioned above, it was working correctly before, so there is no urgency to force the new rendering, since it does not correct any mistake or malfunction.
[1] http://www.w3.org/TR/html401/types.html#type-name
[2] http://www.w3.org/TR/xhtml1/#C_8
[3] http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Nmtoken
Kind regards
Danny B.