Isn't it recommended to use id rather than name to create anchors? If so, it could be a good idea to fix this. We could add the id attribute with the same value as the name attribute.
In Linker.php, line 1521 ( http://svn.wikimedia.org/doc/Linker_8php-source.html#l01521) :
public function makeHeadline( $level, $attribs, $anchor, $text, $link ) {
	return "<a name=\"$anchor\"></a><h$level$attribs$link <span class=\"mw-headline\">$text</span></h$level>";
}
(Tell me if I'm wrong.)
— Sylvain Brunerie [[w:fr:User:Delhovlyn]]
But then the anchors would change if a section is placed above it, breaking links.
Soxred93
"Platonides" Platonides@gmail.com wrote in message news:gigju7$oii$1@ger.gmane.org...
Soxred93 wrote:
But then the anchors would change if a section is placed above it, breaking links.
Soxred93
ids would depend on the section name, just as the current name. It's not using a numeric counter for the id.
It is possible to have several identical headings on a page, but you are not allowed to have duplicate id attributes within a page, so it's not as simple as just using the section name as is currently the case. Note that duplicate values for the 'name' attribute are perfectly valid HTML, though a little useless as a link destination (the first occurrence on the page will always be used). Actually, come to think of it, how does the TOC handle this?
- Mark Clements (HappyDog)
Mark Clements (HappyDog) wrote:
It is possible to have several identical headings on a page, but you are not allowed to have duplicate id attributes within a page, so it's not as simple as just using the section name as is currently the case. Note that duplicate values for the 'name' attribute are perfectly valid HTML, though a little useless as a link destination (the first occurrence on the page will always be used). Actually, come to think of it, how does the TOC handle this?
Headings that would have a name attribute identical to an earlier heading will have a number appended to the name to make it unique. So no problem there.
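As a rough illustration of the de-duplication described above, here is a minimal Python sketch (not MediaWiki's actual PHP code). The helper name `unique_anchors` and the `_2`, `_3` suffix format are assumptions made for the example; the message only says that a number is appended.

```python
from collections import Counter


def unique_anchors(headings):
    """Return one anchor per heading, appending a counter to repeats.

    Hypothetical helper: the "_N" suffix format is an assumption.
    """
    seen = Counter()
    anchors = []
    for heading in headings:
        base = heading.strip().replace(" ", "_")
        seen[base] += 1
        # The first occurrence keeps the plain name; later ones get a suffix.
        anchors.append(base if seen[base] == 1 else f"{base}_{seen[base]}")
    return anchors
```

For example, `unique_anchors(["Notes", "History", "Notes"])` gives `["Notes", "History", "Notes_2"]`. Because the suffix depends on position, swapping the two "Notes" sections changes which one a link to `#Notes_2` reaches.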
"Ilmari Karonen" nospam@vyznev.net wrote in message news:494F7CDE.3060008@vyznev.net...
Headings that would have a name attribute identical to an earlier heading will have a number appended to the name to make it unique. So no problem there.
But then section links will be incorrect if the order of the duplicate headings changes. Mind you, I'm not sure if there's a way to avoid that, short of giving each section a permanent ID at the point it is first saved.
- Mark Clements (HappyDog)
On Mon, Dec 22, 2008 at 5:34 AM, Mark Clements (HappyDog) gmane@kennel17.co.uk wrote:
It is possible to have several identical headings on a page, but you are not allowed to have duplicate id attributes within a page, so it's not as simple as just using the section name as is currently the case. Note that duplicate values for the 'name' attribute are perfectly valid HTML
No, it's not.
"This attribute names the current anchor so that it may be the destination of another link. The value of this attribute must be a unique anchor name. The scope of this name is the current document. Note that this attribute shares the same name space as the id attribute." http://www.w3.org/TR/html4/struct/links.html#h-12.2
The name attribute not only must be unique, but shares its namespace with the id attribute, so any uniqueness problems with id's would have already existed with names.
On Mon, Dec 22, 2008 at 9:36 AM, Mark Clements (HappyDog) gmane@kennel17.co.uk wrote:
But then section links will be incorrect if the order of the duplicate headings changes.
This is and always has been true. Of course, the same problem occurs in the (far more common) case that an existing section is renamed or deleted outright. Sections in MediaWiki are not versioned and are accorded no permanence. This is reasonable enough, since their content is routinely moved around the page. A permanent link to a section isn't worth much if the page is refactored and half the content is in some other section.
"Aryeh Gregor" Simetrical+wikilist@gmail.com wrote in message news:7c2a12e20812220646m65db7035icda30fd0e104b7e1@mail.gmail.com...
The name attribute not only must be unique, but shares its namespace with the id attribute, so any uniqueness problems with id's would have already existed with names.
I wasn't aware of that. On form elements (input/select/etc.) duplicates for the name attribute are allowed (and are, in fact, required in some situations, such as radio-button arrays). It's somewhat confusing that the same attribute has different meanings in different contexts!
On Mon, Dec 22, 2008 at 9:36 AM, Mark Clements (HappyDog) gmane@kennel17.co.uk wrote:
But then section links will be incorrect if the order of the duplicate headings changes.
This is and always has been true. Of course, the same problem occurs in the (far more common) case that an existing section is renamed or deleted outright. Sections in MediaWiki are not versioned and are accorded no permanence. This is reasonable enough, since their content is routinely moved around the page. A permanent link to a section isn't worth much if the page is refactored and half the content is in some other section.
Agreed, and the proposed (now implemented?) fix handles this better than we currently do.
On a side note, I would be interested to know how the purple numbers extension (which was recently announced on this list) handles this. Inasmuch as I understand purple numbers, they give a permanent way of referencing an article, down to the paragraph, or maybe even sentence level. Are these numbers retained across edits? If so, perhaps this can be leveraged to solve some of the various 'sections referenced by heading text' issues that exist.
- Mark Clements (HappyDog)
2008/12/19 Soxred93 soxred93@gmail.com
But then the anchors would change if a section is placed above it, breaking links.
Soxred93
As Aryeh Gregor said, the id attribute would be exactly the same as the current name attribute. I don't see how this could be a problem.
If I'm not mistaken, the new code would be as follows :
public function makeHeadline( $level, $attribs, $anchor, $text, $link ) {
	return "<a name=\"$anchor\" id=\"$anchor\"></a><h$level$attribs$link <span class=\"mw-headline\">$text</span></h$level>";
}
— Sylvain Brunerie [[w:fr:User:Delhovlyn]]
On Fri, Dec 19, 2008 at 10:03 AM, Sylvain Brunerie sylvain.brunerie@gmail.com wrote:
Isn't it recommended to use id rather than name to create anchors? If so, it could be a good idea to fix this. We could add the id attribute with the same value as the name attribute.
Yes, that's recommended behavior. I don't see any reason not to do it. We might even consider scrapping the <a> altogether at this point -- which browsers don't support jumping to id anymore?
On Fri, Dec 19, 2008 at 11:25 AM, Soxred93 soxred93@gmail.com wrote:
But then the anchors would change if a section is placed above it, breaking links.
What do you mean by this? The id would be the same as the existing name, and (therefore necessarily) on the same element.
On 12/19/08 7:03 AM, Sylvain Brunerie wrote:
Isn't it recommended to use id rather than name to create anchors? If so, it could be a good idea to fix this. We could add the id attribute with the same value as the name attribute.
Looks like Tidy is already adding these for us:
<p><a name="Source_tag_syntax_highlighting" id="Source_tag_syntax_highlighting"></a></p>
So we might as well do them ourselves. :)
-- brion
(Strange. I'm pretty sure that at one point I saw anchors without an id attribute in Firebug.) But you're right, that's no reason not to fix this. :)
— Sylvain Brunerie [[w:fr:User:Delhovlyn]]
Brion Vibber wrote:
Looks like Tidy is already adding these for us:
<p><a name="Source_tag_syntax_highlighting" id="Source_tag_syntax_highlighting"></a></p>
So we might as well do them ourselves. :)
Done in r44896.
On 12/22/08 3:47 AM, Ilmari Karonen wrote:
Brion Vibber wrote:
Looks like Tidy is already adding these for us:
<p><a name="Source_tag_syntax_highlighting" id="Source_tag_syntax_highlighting"></a></p>
So we might as well do them ourselves. :)
Done in r44896.
Yay!
*cough* don't forget to update the parser test cases *cough* :)
21 previously passing test(s) now FAILING! :(
All introduced between 19-Dec-2008 01:44:11, 1.14alpha (r44790) and 22-Dec-2008 17:38:33, 1.14alpha (r44901):
* Bug 6563: Edit link generation for section shown by <includeonly>
* Bug 6563: Edit link generation for section suppressed by <includeonly>
* Basic section headings
* Section headings with TOC
* Handling of sections up to level 6 and beyond
* TOC regression (bug 9764)
* TOC with wgMaxTocLevel=3 (bug 6204)
* Resolving duplicate section names
* Resolving duplicate section names with differing case (bug 10721)
* Template with sections, __NOTOC__
* __NOEDITSECTION__ keyword
* Link inside a section heading
* TOC regression (bug 12077)
* Fuzz testing: Parser14
* Fuzz testing: Parser14-table
* Inclusion of !userCanEdit() content
* Out-of-order TOC heading levels
* -{}- tags within headlines (within html for parserConvert())
* Morwen/13: Unclosed link followed by heading
* HHP2.1: Heuristics for headings in preprocessor parenthetical structures
* HHP2.2: Heuristics for headings in preprocessor parenthetical structures
-- brion
On 12/22/08 9:41 AM, Brion Vibber wrote:
On 12/22/08 3:47 AM, Ilmari Karonen wrote:
Done in r44896.
Yay!
*cough* don't forget to update the parser test cases *cough* :)
21 previously passing test(s) now FAILING! :(
21 previously failing test(s) now PASSING! :)
whee :D
-- brion
------------ Original message ------------ From: Brion Vibber brion@wikimedia.org Subject: Re: [Wikitech-l] Anchors haven't id attribute Date: 23.12.2008 18:46:40
[snip]
I have reverted both revisions in r45021 and r45022 because they made a large number of pages invalid.
The id attribute in XHTML is defined as /[A-Za-z][A-Za-z0-9:_.-]*/ [1][2], and that is exactly why we use <a name="..."> in cases where the following <h#> tag has no id because of these restrictions. The name attribute is of type NMTOKEN [3] and can therefore begin with a non-alphabetic character.
[1] http://www.w3.org/TR/html401/types.html#type-name
[2] http://www.w3.org/TR/xhtml1/#C_8
[3] http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Nmtoken
Danny B.
2008/12/25 Danny B. Wikipedia.Danny.B@email.cz:
[snip]
As your own link [1] says, names in HTML4 are already only allowed to begin with a letter. See also your own link [2]: "the type of the name attribute has been changed to NMTOKEN. This attribute is constrained such that it can only have the same values as type ID". Anything that's valid in an a element's name attribute in XHTML 1 is valid as an XML ID.
Note that all section names are passed through Sanitizer::escapeId() right now. They should be passed with the Sanitizer::NONE option (that's poorly named, isn't it?) to prepend an "x" if they don't start with a letter, but currently they aren't, I guess for backward compatibility. That could probably be changed without much harm (add the flag to line 3618 in Parser.php).
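To make the proposed normalization concrete, here is a minimal Python sketch of the idea, not the real Sanitizer::escapeId(). It assumes the dot-hex encoding seen in section anchors elsewhere in this thread (for example `.C3.81X` for `ÁX`); the function name `escape_id` and the `initial` parameter are hypothetical stand-ins for the flag under discussion.

```python
import string

# Characters HTML 4 allows in an id/name after the first letter.
_ALLOWED = set(string.ascii_letters + string.digits + ":_.-")


def escape_id(text, initial=True):
    """Encode heading text into an anchor (illustrative, not MediaWiki's code)."""
    encoded = "".join(
        # Keep allowed ASCII characters; write every other byte as ".XX".
        chr(byte) if chr(byte) in _ALLOWED else ".%02X" % byte
        for byte in text.strip().replace(" ", "_").encode("utf-8")
    )
    # Assumed behaviour of the flag: prepend "x" so that the result
    # always starts with a letter.
    if initial and (not encoded or encoded[0] not in string.ascii_letters):
        encoded = "x" + encoded
    return encoded
```

With this sketch, `escape_id("1X")` gives `"x1X"`, while `escape_id("1X", initial=False)` leaves it as `"1X"`, which corresponds to the current behaviour described above.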
What page, specifically, exhibited a regression in standards compliance, and why do you think that it was actually a regression? Did new validator failures occur? I ran a test page through a validator:
http://validator.w3.org/check?uri=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FUser%3ASimetrical%2FAnchor_name_test&charset=(detect+automatically)&doctype=Inline&group=0
The validator also says (emphasis added) "id *AND NAME* attributes must begin with a letter, not a digit."
Okay, Danny and I have argued about this on IRC for the past 45 minutes and still don't agree. My position is as follows. XHTML1 is a reformulation of HTML4 and does not change it except to make it XML-conformant, or as otherwise noted. HTML4 clearly states that id and name attributes must meet exactly the same production: in particular, they must start with a letter.
Now, the XHTML DTD doesn't say this. According to the XHTML1 Transitional DTD[1], the name attribute is of type NMTOKEN. This production can, in particular, begin with non-letters[2]. On the other hand, according to the exact same DTD, the id attribute is of type ID, and this can also begin with non-letters[3]. So it seems to me that if you think the DTD takes precedence over the productions given in HTML4, both id *and* name attributes can begin with non-letters (although the exact class of characters they can begin with varies).
However, the DTD does not in any XML standard specify the full restrictions on the type of content allowed. Additional restrictions may be added in the prose of the specification. XML that conforms to the DTD is valid, but still might be non-conformant. In the particular case, it's not clear from the normative text whether the constraints from HTML4 are supposed to be maintained. In the informative appendix C.8 of XHTML1[4], it says:
"Further, since the set of legal values for attributes of type ID is much smaller than for those of type CDATA, the type of the name attribute has been changed to NMTOKEN. This attribute is constrained such that it can only have the same values as type ID, or as the Name production in XML 1.0 Section 2.3, production 5. Unfortunately, this constraint cannot be expressed in the XHTML 1.0 DTDs."
In other words, the NMTOKEN type was only selected because it was the closest possible match. In fact, the intent seems to have been to weaken the HTML4 restrictions on both name and id, and allow a wider selection of characters to be used (the NAME production from XML is considerably laxer than [a-zA-Z][a-zA-Z0-9_:.-]* from HTML4). However, it also implies that id and name adhere to exactly the same standards, and in particular, neither one may start with a digit, hyphen, etc. But this isn't stated anywhere normative that I can find.
On the other hand, it's recommended that XHTML1 documents that are intended to be backward-compatible restrict themselves to the HTML4 definitions anyway:[4]
"Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4. When defining fragment identifiers to be backward-compatible, only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used. See Section 6.2 of [HTML4] for more information."
In fact, this is what we currently try to do (see Sanitizer::escapeId()). I've always considered the fact that we allow section anchors that start with non-letters to be a bug. But I don't think we gain anything in conformance from not adding anchors to id's as well: the requirements for both are the same. Also, in section 4.10[5], the standard says:
"In order to ensure that XHTML 1.0 documents are well-structured XML documents, XHTML 1.0 documents MUST use the id attribute when defining fragment identifiers on the elements listed above."
According to which we're non-conformant for *not* having an id attribute on the <a> element. (This is in an informative section but uses the word "MUST", so I have no idea what that's supposed to mean.)
But above all, Tidy has been adding the id attribute to <a>'s on Wikipedia for years and still will with or without this change. If something is wrong with that, this reversion is not helping anything. The fix should be to change what we're outputting for anchors in the first place. For the time being, we should keep the change so that we're at least doing things consistently with or without Tidy. If it's wrong, fix it for everyone, not just everyone not using Tidy.
[1] http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Transitional
[2] http://www.w3.org/TR/REC-xml/#NT-Nmtoken
[3] http://www.w3.org/TR/REC-xml/#NT-Name, http://www.w3.org/TR/REC-xml/#id
[4] http://www.w3.org/TR/xhtml1/#C_8
[5] http://www.w3.org/TR/xhtml1/#h-4.10
On 12/25/08 4:32 AM, Danny B. wrote:
I have reverted both revisions in r45021 and r45022 because it caused massive invalidity of pages.
Given that we've been outputting these as "id" attributes for the last few years already (as output by Tidy), I have reverted your revert in r45044 pending further discussion.
-- brion
------------ Original message ------------ From: Brion Vibber brion@wikimedia.org Subject: Re: [Wikitech-l] Anchors haven't id attribute Date: 26.12.2008 06:30:00
[snip]
Well, the id was added _only_ to those tags whose name could be transferred to id, that is, those that started with an ASCII letter. It was _never_ added to those that did not conform to this rule (the regexp mentioned in my previous post). This is easy to verify, either by running an older revision of MediaWiki or by testing in Tidy directly:
Take this code excerpt (wrapped in a minimal XHTML document) and run it through Tidy:
<a name="X"></a><h2> <span class="mw-headline"> X </span></h2>
<a name="1X"></a><h2> <span class="mw-headline"> 1X </span></h2>
<a name=".C3.81X"></a><h2> <span class="mw-headline"> ÁX </span></h2>
<a name="-X"></a><h2> <span class="mw-headline"> -X </span></h2>
The result will be:
<a name="X" id="X"></a><h2><span class="mw-headline">X</span></h2>
<a name="1X"></a><h2><span class="mw-headline">1X</span></h2>
<a name=".C3.81X"></a><h2><span class="mw-headline">ÁX</span></h2>
<a name="-X"></a><h2><span class="mw-headline">-X</span></h2>
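The pattern in the result above can be modelled with a short Python sketch. It only reproduces the output reported here, under the assumption that the id is copied exactly when the name matches the HTML 4 pattern quoted in this thread; it is not Tidy's implementation, and `anchor_tag` is a hypothetical name.

```python
import re

# HTML 4 pattern for ID and NAME tokens, as quoted in this thread.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def anchor_tag(name):
    """Build the <a> tag, copying name to id only for a valid HTML 4 id."""
    if HTML4_ID.fullmatch(name):
        return f'<a name="{name}" id="{name}"></a>'
    return f'<a name="{name}"></a>'
```

Of the four names in the excerpt, only `X` matches the pattern, so only that anchor receives an id.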
Now, let me repeat how the "id" is defined:
1: XHTML is a reformulation of HTML 4 as an XML 1.0 application.
2: That means it takes every single definition from HTML 4 and keeps it unless it is overridden in XHTML.
3: The id and name were defined in HTML 4 as /[A-Za-z][A-Za-z0-9:_.-]*/ [1] [2]
4: The name has been redefined as NMTOKEN [2] [3]
5: The id has never been redefined and thus keeps the definition given in point 3 above.
This is how the id in XHTML has always been handled since XHTML came out. I also think that something as important as the handling of id would have been fixed in the validator over so many years if it were incorrect.
So currently, all non-Latin-script wikis are totally invalid according to the W3C validator. Major parts of wikis that use non-ASCII characters are invalid as well. It is therefore very hard to find other validity errors in the code when there are worthless positives on every other page. :-(
One last thing: I think that the current rendering with controversial ids has brought more negatives (such as greatly reducing the ability to find the really invalid parts of the code) than positives. It was working correctly before, so what benefit has it actually brought? On the other hand, it has brought this controversy.
I accept that I (and the majority of people around the world, the validator, Tidy and so many other tools) _may_ be wrong in the interpretation of the definition of id. But I think that until the authoritative tools, such as the validator or Tidy, are changed on this point, so that it can be proved that we render the page correctly, we should not render that way. As I mentioned above, it was working correctly before, so there is no urgency to force the new rendering, since it does not correct any mistake or malfunction.
[1] http://www.w3.org/TR/html401/types.html#type-name
[2] http://www.w3.org/TR/xhtml1/#C_8
[3] http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Nmtoken
Kind regards
Danny B.
[snip]
Maybe we should just fix the normalization function the way we'd already planned to, so that it'll work right the way we'd already planned to?
-- brion
On Sat, Dec 27, 2008 at 3:14 PM, Brion Vibber brion@wikimedia.org wrote:
[snip]
Maybe we should just fix the normalization function the way we'd already planned to, so that it'll work right the way we'd already planned to?
Done in r45109. I notice, by the way, that HTML5 allows any string not containing whitespace for id's . . . yet another case where it clearly wins the "don't gratuitously cause pain to developers" contest.
------------ Original message ------------ From: Aryeh Gregor Simetrical+wikilist@gmail.com Subject: Re: [Wikitech-l] Anchors haven't id attribute Date: 28.12.2008 01:07:08
[snip]
*sigh*
Why do we have to hunt for some other solution when we have a fully working, fully valid and fully intuitive one?
OK, let's summarize the three versions we have.
Terms used:
- old version - the version used for many years, until r44896
- mid version - the r44896 way
- new version - the r45109 way
The old version was used for many years. It was fully valid: ids were present only where they could be copied from name AND complied with the regexp mentioned in previous posts. This was done automatically by Tidy. It was also fully intuitive: you just wrote [[#Foo]] and it linked to the section named Foo, or you added #Foo to the URL in the address bar and you got to the proper section as well. And it worked properly.
The mid version brought the "feature" that all name attributes were duplicated to ids. That made a large number of pages invalid, especially on non-Latin and non-ASCII wikis. However, creating anchors remained intuitive.
The new version prepends x to all anchors to solve the problem introduced by the mid version: the massive invalidity of pages. So it solved one problem (which would not have needed solving had we kept the old version) but brought at least two other major ones. The first major problem is that this change breaks millions of existing links to sections: links used on pages on wikis, links used on external sites, links in people's bookmarks, in emails, forum threads etc. Well, OK, let's discount all the external stuff, since we don't have any influence on it, but we still have millions of links left on our own wikis which won't work anymore since r45109. The other major problem is that from this point on anchor links are no longer intuitive: we are now pushing people to constantly think about prepending x when creating anchor links. No more simple copy-pasting of the headline. As a side effect, we are now adding unnecessary work for people on non-Latin wikis by pushing them to always switch to a Latin keyboard, or to click on edittools or whatever, just to get the one "x" character into the edit box to create the anchor link.
So let me summarize in points:
* First, we did not have any problem at all.
* Second, we had one problem.
* Third, we "solved" the problem but created at least two new ones.
I am pretty scared of what's coming next... :-/
One question to end with: what is the benefit of either the mid or the new version over the old one? What new functionality or feature does it bring, or which existing bug does it fix?
Kind regards
Danny B.
2008/12/27 Danny B. Wikipedia.Danny.B@email.cz:
*sigh*
Why do we have to hunt for some other solution when we have a fully working, fully valid and fully intuitive one?
Because:
1) Our previous behavior arguably violated the XHTML 1 specification by allowing name attributes to begin with nonletters. Please don't ignore this argument because you think it's wrong. I think you're wrong on this issue too, but I don't just ignore your opinion when discussing what the software that we *both* develop should do. Note "arguably" in the first sentence here -- your opinion counts as much as mine.
2) It's not arguable at all that the XHTML 1 specification strongly recommends that <a> elements with a name attribute also have an id attribute. In fact, section 4.10 states: "In order to ensure that XHTML 1.0 documents are well-structured XML documents, XHTML 1.0 documents MUST use the id attribute when defining fragment identifiers on the elements listed above [including <a>]."
I'm not saying these reasons outweigh the reasons against, but those are the reasons it was done. In particular, I don't think I've seen an argument from you against (2).
The old version was used for many years. It was fully valid
Could you *please* stop pretending that a debate doesn't even exist here? It's obnoxious and uncivil, and you keep on doing it.
The first major problem is that this change breaks millions of existing links to sections: links used on pages on wikis, links used on external sites, links in people's bookmarks, in emails, forum threads etc. Well, OK, let's discount all the external stuff, since we don't have any influence on it, but we still have millions of links left on our own wikis which won't work anymore since r45109.
First of all, all auto-generated internal links (in TOCs) will automatically switch to the new format. Second of all, it should be one extra line of code to fix up all manually-created internal links as well, so that the x is automatically added as part of the encoding process. (I didn't find where this needed to be done at a quick glance.) So we're only talking about external links here.
This is a one-time cost and I don't think it's a big problem -- at worst, a few users will end up on the wrong part of the page. It should be pointed out that this will affect *all* section links on non-Latin wikis (since they get encoded to begin with dots and then need to start with a letter), but again, only as a one-time cost, and only external links (links from external sites or links using external link syntax), and it will still get viewers to almost the right place.
The other major problem is that from this point on anchor links are no longer intuitive: we are now pushing people to constantly think about prepending x when creating anchor links. No more simple copy-pasting of the headline. As a side effect, we are now adding unnecessary work for people on non-Latin wikis by pushing them to always switch to a Latin keyboard, or to click on edittools or whatever, just to get the one "x" character into the edit box to create the anchor link.
Again, not an issue if internal links are fixed to work correctly. I didn't think about that aspect, but it should be very simple to fix (I'd do it now except I'm going to bed).
It seems to me that there are only weak reasons in favor (following recommended best practice with no practical effect) and only weak reasons against (small one-time transition cost -- unless you're correct that there will be longer-term costs, in which case please clarify why you think this). Normally I would say that standards compliance by itself (as opposed to standards compliance that brings concrete benefit) is worth small one-time costs, although not large enough one-time costs and probably not even fairly small recurring costs. So as it stands, without further arguments, I'd still be weakly in favor of keeping the current state of trunk, of course with the fix for anchors on internal links.
On Sat, Dec 27, 2008 at 10:15 PM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
Again, not an issue if internal links are fixed to work correctly. I didn't think about that aspect, but it should be very simple to fix (I'd do it now except I'm going to bed).
Done in r45116. Internal links will all now work correctly. I also ensured that invalid manually-specified id's like <span id="0"> get the same treatment (a change that should possibly be kept even if we go back to no id's for section headers).
------------ Original message ------------ From: Aryeh Gregor Simetrical+wikilist@gmail.com Subject: Re: [Wikitech-l] Anchors haven't id attribute Date: 28.12.2008 04:33:24
On Sat, Dec 27, 2008 at 10:15 PM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
Again, not an issue if internal links are fixed to work correctly. I didn't think about that aspect, but it should be very simple to fix (I'd do it now except I'm going to bed).
Done in r45116. Internal links will all now work correctly. I also ensured that invalid manually-specified id's like <span id="0"> get the same treatment (a change that should possibly be kept even if we go back to no id's for section headers).
I'm really not comfortable that, instead of discussing, you continue to push your preferred solution. :-( There is apparently a pretty big difference in views on this issue, so nothing should be changed until the discussion has produced some result, which it has not so far. This is definitely neither a constructive nor a cooperative approach... :-(
Danny B.
http://www.w3.org/TR/xhtml1/ C.8. Fragment Identifiers "Finally, note that XHTML 1.0 has deprecated the name attribute of the a, applet, form, frame, iframe, img, and map elements, and it will be removed from XHTML in subsequent versions."
and
"Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4. When defining fragment identifiers to be backward-compatible, only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used. See Section 6.2 of [HTML4] for more information."
ouch!
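For reference, the backward-compatible pattern quoted from C.8 can be checked mechanically. This is a minimal Python sketch, not MediaWiki code; the function name `is_compat_fragment` is made up for illustration.

```python
import re

# Pattern quoted from XHTML 1.0, section C.8: a letter first, then
# letters, digits, colons, underscores, periods, or hyphens.
COMPAT_FRAGMENT = re.compile(r"[A-Za-z][A-Za-z0-9:_.-]*")


def is_compat_fragment(value: str) -> bool:
    """Return True if the whole value matches the quoted pattern."""
    return COMPAT_FRAGMENT.fullmatch(value) is not None
```

Under this check "History" passes, while "0" and a dot-encoded anchor such as ".C4.9B" do not, because neither starts with a letter.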
Hoi, If only these strings may be used, this technique is broken for a large part of our wikis. Thanks, GerardM
2008/12/29 Tei oscar.vives@gmail.com
http://www.w3.org/TR/xhtml1/ C.8. Fragment Identifiers "Finally, note that XHTML 1.0 has deprecated the name attribute of the a, applet, form, frame, iframe, img, and map elements, and it will be removed from XHTML in subsequent versions."
and
"Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4. When defining fragment identifiers to be backward-compatible, only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used. See Section 6.2 of [HTML4] for more information."
ouch!
On 12/29/08 1:17 AM, Tei wrote:
http://www.w3.org/TR/xhtml1/ C.8. Fragment Identifiers "Finally, note that XHTML 1.0 has deprecated the name attribute of the a, applet, form, frame, iframe, img, and map elements, and it will be removed from XHTML in subsequent versions."
Ah, but XHTML is going the way of the dodo -- XHTML 2.0 is a magnificent flop that nobody's touching with a 100-foot pole.
For future-proofing we should pay more attention to the HTML 5 working group stuff.
"Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4. When defining fragment identifiers to be backward-compatible, only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used. See Section 6.2 of [HTML4] for more information."
As noted in the other thread, this is not a normative requirement, and given the actual behavior of browsers doesn't appear to be required in practice either.
-- brion
On Mon, Dec 29, 2008 at 1:18 PM, Brion Vibber brion@wikimedia.org wrote:
Ah, but XHTML is going the way of the dodo -- XHTML 2.0 is a magnificent flop that nobody's touching with a 100-foot pole.
For future-proofing we should pay more attention to the HTML 5 working group stuff.
HTML5 has removed the name attribute of <a> elements, as far as I can see. It will probably end up requiring implementers to support it but prohibiting authors from using it -- that's their solution to backward-compatibility cruft (which is a lot better than the XHTML one of pretending no web pages from 1995 still exist and/or having a two-tier system where everyone ignores the Strict tier).
On Mon, Dec 29, 2008 at 10:20 PM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
On Mon, Dec 29, 2008 at 1:18 PM, Brion Vibber brion@wikimedia.org wrote:
Ah, but XHTML is going the way of the dodo -- XHTML 2.0 is a magnificent flop that nobody's touching with a 100-foot pole.
For future-proofing we should pay more attention to the HTML 5 working group stuff.
HTML5 has removed the name attribute of <a> elements, as far as I can see. It will probably end up requiring implementers to support it but prohibiting authors from using it -- that's their solution to backward-compatibility cruft (which is a lot better than the XHTML one of pretending no web pages from 1995 still exist and/or having a two-tier system where everyone ignores the Strict tier).
So... I must move my money from the XHTML bank to the HTML 5 bank? MediaWiki outputs XHTML right now, and that seemed like the very best option. What? will mediawiki evolve to HTML5?
Also, another HTML? why? the idea of "broken code must render anyway" is ripe. It must die a painful death, because it is the father and mother of tag soup, which is more vile than the Borg and Microsoft *combined*
On Mon, Dec 29, 2008 at 7:24 PM, Tei oscar.vives@gmail.com wrote:
So... I must move my money from the XHTML bank to the HTML 5 bank? MediaWiki outputs XHTML right now, and that seemed like the very best option.
We output XHTML1, which is effectively HTML4 with some syntactic differences. Except we don't always output valid XHTML1, of course, even according to the (sometimes too narrow) standards of the W3C validator.
What? will mediawiki evolve to HTML5?
I hope so. I've been meaning to write up an argument that we should begin explicitly moving in that direction in the immediate future.
Also, another HTML? why?
Because XHTML beyond version 1 added various features people didn't want, removed features they did want, and didn't add many new features that were actually useful, so nobody was or is interested in either using it or implementing it. Furthermore, it didn't (and doesn't) account for backward compatibility in the fashion that implementers demand it be accounted for.
HTML5 adds many incredibly cool and useful features, removes a bunch of baseless restrictions and technicalities (it's now possible for a "hello world" document to take less than ten lines!), is focused on the needs of modern websites, is standardizing a huge amount of stuff that was formerly undocumented, and has taken the explicit stance that nothing will make it into the final version of the spec unless it has multiple interoperable implementations.
the idea of "broken code must render anyway" is ripe. It must die a painful death, because it is the father and mother of tag soup, which is more vile than the Borg and Microsoft *combined*
Browser vendors are not willing to remove support for it, because it would break old websites. HTML5 says broken code is invalid, but aims to standardize in great detail how browsers should render it anyway, instead of demanding (impractically) that they throw up their arms and die like XHTML insists on. A "feature" of XHTML that practically everyone skips in practice by serving it as text/html.
Of course, HTML5 also changes things so that a lot of code that's unambiguous and that XHTML would declare invalid is actually valid. For instance, this is a completely valid HTML5 document:
<!DOCTYPE HTML> <p>Hello world!
There is absolutely nothing ambiguous or broken about that, and (leaving aside the doctype) it was always valid HTML before XHTML came along.
Aryeh Gregor wrote:
the idea of "broken code must render anyway" is ripe. It must die a painful death, because it is the father and mother of tag soup, which is more vile than the Borg and Microsoft *combined*
Browser vendors are not willing to remove support for it, because it would break old websites. HTML5 says broken code is invalid, but aims to standardize in great detail how browsers should render it anyway, instead of demanding (impractically) that they throw up their arms and die like XHTML insists on. A "feature" of XHTML that practically everyone skips in practice by serving it as text/html.
Even if you want to use it, "some browsers" don't support it, so you end up with ugly user-agent sniffing hacks.
Aryeh Gregor wrote:
First of all, all auto-generated internal links (in TOCs) will automatically switch to the new format. Second of all, it should be one extra line of code to fix up all manually-created internal links as well, so that the x is automatically added as part of the encoding process. (I didn't find where this needed to be done at a quick glance.) So we're only talking about external links here.
This is a one-time cost and I don't think it's a big problem -- at worst, a few users will end up on the wrong part of the page. It should be pointed out that this will affect *all* section links on non-Latin wikis (since they get encoded to begin with dots and then need to start with a letter), but again, only as a one-time cost, and only external links (links from external sites or links using external link syntax), and it will still get viewers to almost the right place.
A point I haven't seen anyone bring up yet is that, if we're messing with section anchors anyway, this would be a good time to try to make them less likely to collide with other IDs on the page.
For example, now that we've forced all section anchors to start with an ASCII letter, we could go one step further and force them to start with an _uppercase_ ASCII letter. Since most if not all of our other ID attributes start with a lowercase letter, this would conveniently place the two in separate namespaces.
Of course, other disambiguation methods would also be possible, but the uppercase trick strikes me as having minimal impact -- in particular, it just happens that most existing section headings on most of our Latin-alphabet projects (including by far the largest one, en.wikipedia) _already_ begin with a capital letter, and therefore will not be affected. Indeed, this seems to hold even for our biggest project using fully case-sensitive page names, Wiktionary.
On Sun, Dec 28, 2008 at 9:42 AM, Ilmari Karonen nospam@vyznev.net wrote:
A point I haven't seen anyone bring up yet is that, if we're messing with section anchors anyway, this would be a good time to try to make them less likely to collide with other IDs on the page.
For example, now that we've forced all section anchors to start with an ASCII letter, we could go one step further and force them to start with an _uppercase_ ASCII letter. Since most if not all of our other ID attributes start with a lowercase letter, this would conveniently place the two in separate namespaces.
See bug 10721. Id's should not differ only in case: this apparently causes problems with IE, which treats them case-insensitively, and is also allegedly required or recommended by XHTML (a comment in Parser.php on line 3619 says this, but I can't find it in the spec).
I would take two steps to ensure that conflicts are avoided:
1) Prohibit user-provided id's from beginning with "mw-". Since all new id's should have been using this for a long time now, it should cut out a lot of possibility for conflict. Also add in other prefixes that are used, like "ca-", "p-", "n-", "page-".
2) Start making a list of all the old, deprecated style of id's that we use, and manually check against all of them with in_array(). We can start with some of the really common ones like "content" and let people create a more comprehensive list if they can be bothered, or as specific conflicts arise.
(2) is kind of ugly, but there doesn't seem to be any better way to do it.
Now, how about figuring out how to get manually-specified id's like <span id="foo"> to be unique? :)
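Steps (1) and (2) above could look roughly like the following. This is a Python sketch rather than MediaWiki's PHP; the names `RESERVED_PREFIXES`, `LEGACY_IDS` and `is_reserved_id` are hypothetical, the legacy list holds only the one example named above, and the case-insensitive comparison is an assumption motivated by the IE behaviour described in bug 10721.

```python
# Prefixes named in the proposal; "page-" is left out here on the
# assumption that it is only used for classes, not ids.
RESERVED_PREFIXES = ("mw-", "ca-", "p-", "n-")

# Hypothetical starter list of old-style interface ids; a real list
# would have to be collected from the skins.
LEGACY_IDS = {"content"}


def is_reserved_id(candidate: str) -> bool:
    """Return True if a user-supplied id would clash with interface ids."""
    # Compare case-insensitively, since ids that differ only in case
    # are reported to cause trouble in IE.
    lowered = candidate.lower()
    return lowered.startswith(RESERVED_PREFIXES) or lowered in LEGACY_IDS
```

A sanitizer could call such a check on every user-provided id and drop or rename the ones for which it returns True.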
On Sun, Dec 28, 2008 at 10:46 AM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
cut out a lot of possibility for conflict. Also add in other prefixes that are used, like "ca-", "p-", "n-", "page-".
(Actually, I think "page-" is only used for classes, but you get the point.)
Aryeh Gregor wrote:
See bug 10721. Id's should not differ only in case: this apparently causes problems with IE, which treats them case-insensitively, and is also allegedly required or recommended by XHTML (a comment in Parser.php on line 3619 says this, but I can't find it in the spec).
Well, that rather puts a damper on that one, then. Too bad. :(
I would take two steps to ensure that conflicts are avoided:
1) Prohibit user-provided id's from beginning with "mw-". Since all new id's should have been using this for a long time now, it should cut out a lot of possibility for conflict. Also add in other prefixes that are used, like "ca-", "p-", "n-", "page-".
A simpler, though more heavy-handed, approach might be simply to disallow dashes in section anchors (presumably replacing them with ".2D"). This would leave any dashed IDs free for other uses.
2) Start making a list of all the old, deprecated style of id's that we use, and manually check against all of them with in_array(). We can start with some of the really common ones like "content" and let people create a more comprehensive list if they can be bothered, or as specific conflicts arise.
We might be able to adapt the existing code in the parser that prevents duplicate section anchors from occurring: just prepopulate the list of already seen anchors with any IDs that occur in skins.
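A rough sketch of that idea, in Python rather than MediaWiki's PHP. The class name `AnchorRegistry`, the `_N` suffix format and the sample skin ids are assumptions for illustration, not the parser's actual code; comparison is case-insensitive only because of the IE issue mentioned above.

```python
class AnchorRegistry:
    """Hands out unique anchors, treating reserved ids as already seen."""

    def __init__(self, reserved=()):
        # Prepopulate with ids used by the skin so that section
        # anchors can never collide with them.
        self._seen = {name.lower() for name in reserved}

    def unique(self, anchor: str) -> str:
        """Return anchor, or anchor_N for the first N that is unused."""
        candidate = anchor
        n = 1
        while candidate.lower() in self._seen:
            n += 1
            candidate = f"{anchor}_{n}"
        self._seen.add(candidate.lower())
        return candidate
```

With `reserved=["content", "toc"]`, a section headed "Content" would come out as "Content_2", exactly as a second "History" section on the same page would come out as "History_2".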
Now, how about figuring out how to get manually-specified id's like <span id="foo"> to be unique? :)
The big problem there is that there's no sensible way to disambiguate them. Probably the best we can do would be to silently drop any duplicates. Or maybe replace them with a really in-your-face "disambiguation" like OMG-DUPLICATE-ID-whatever-FIX-IT-YOU-IDIOT-1. :)
Anyway, this kind of sounds like something Tidy ought to be doing. Doesn't it?
Ilmari Karonen wrote:
Now, how about figuring out how to get manually-specified id's like <span id="foo"> to be unique? :)
The big problem there is that there's no sensible way to disambiguate them. Probably the best we can do would be to silently drop any duplicates. Or maybe replace them with a really in-your-face "disambiguation" like OMG-DUPLICATE-ID-whatever-FIX-IT-YOU-IDIOT-1. :)
Anyway, this kind of sounds like something Tidy ought to be doing. Doesn't it?
Might this be the time to propose a 'validation' mode / debug output again?
Somewhere to place that kind of warning:
* Skipping duplicate id 'foo'
* Adding </div> for you
* Nonexistent attribute foo for span removed
* Too many templates added. Stopping template expansion
etc.
It could be added to the html comment with preprocessor data, or perhaps as a new box in preview mode. It is in no way expected to appear in normal page views, nor am I saying that we should [[Execution by burning|burn]] (yet) people who don't write 100% conformant wikitext. It could also be collapsed by default so as not to discourage newcomers. The point is that better insight into how MediaWiki is "helping" would be most valuable when studying why some [[Template:Esoteric|esoterism]] isn't working.
On Sun, Dec 28, 2008 at 1:40 PM, Ilmari Karonen nospam@vyznev.net wrote:
A simpler, though more heavy-handed, approach might be simply to disallow dashes in section anchors (presumably replacing them with ".2D"). This would leave any dashed IDs free for other uses.
That's pretty heavy-handed indeed. It's awfully arbitrary to demand all our users use camel-case or underscores for their id's.
We might be able to adapt the existing code in the parser that prevents duplicate section anchors from occurring: just prepopulate the list of already seen anchors with any IDs that occur in skins.
That wouldn't help id's that are manually specified, though. We don't want those to conflict either, do we? Maybe we do, if the interface element doesn't appear on the particular page . . . you might legitimately want to fake some interface element's style in some cases, I guess. Doesn't seem like we should encourage that, though, it could lead to abuse. (Beansy question: are there any places where id's are used to do something in JavaScript, that people could exploit by adding wikitext with those id's?)
The big problem there is that there's no sensible way to disambiguate them. Probably the best we can do would be to silently drop any duplicates. Or maybe replace them with a really in-your-face "disambiguation" like OMG-DUPLICATE-ID-whatever-FIX-IT-YOU-IDIOT-1. :)
Anyway, this kind of sounds like something Tidy ought to be doing. Doesn't it?
IIRC, Tidy outputs some comment like "Serious XHTML error" when it encounters duplicate id's, exactly because in the general case it's scary to try fixing those. If we're really going to police this in the software rather than just declaring it PEBKAC (and I'm not sure we should), we'd need to implement logging first for a while and make sure the more egregious cases are fixed before messing everything up.
I think it's probably okay the way we have it now. You can mess things up if you manually specify id's, but that's your problem. I could see the argument that only sysops should be allowed to create invalid markup, though (which they inevitably can in JS and CSS, and in HTML via JS).
On Sun, Dec 28, 2008 at 2:50 PM, Platonides Platonides@gmail.com wrote:
Might this be the time to propose a 'validation' mode / debug output again?
Somewhere to place that kind of warning:
* Skipping duplicate id 'foo'
* Adding </div> for you
* Nonexistent attribute foo for span removed
* Too many templates added. Stopping template expansion
etc.
It could be added to the html comment with preprocessor data, or perhaps as a new box in preview mode. It is in no way expected to appear in normal page views, nor am I saying that we should [[Execution by burning|burn]] (yet) people who don't write 100% conformant wikitext. It could also be collapsed by default so as not to discourage newcomers. The point is that better insight into how MediaWiki is "helping" would be most valuable when studying why some [[Template:Esoteric|esoterism]] isn't working.
If we did this, we'd probably want multiple channels of output, debug and warning at least. Warnings should only be raised when what the user intended is clearly not happening: when duplicate id's or nonexistent attributes are removed, for instance, or template expansion is halted. Automatically closing tags isn't something that should be mentioned unless the user specifically asks for that level of nitpicking, if even then. Wikitext doesn't need to aspire to be XML.
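To illustrate the two-channel idea, here is a small Python sketch. The names `Level` and `ParserLog` are invented for this example and nothing like them is claimed to exist in MediaWiki; the sample messages are the ones from the proposal.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    DEBUG = 10    # nitpicks, e.g. automatically closed tags
    WARNING = 20  # the user's intent is clearly not happening


@dataclass
class ParserLog:
    """Collects parser messages; callers choose how much they see."""

    entries: list = field(default_factory=list)

    def add(self, level: Level, message: str) -> None:
        self.entries.append((level, message))

    def messages(self, minimum: Level = Level.WARNING) -> list:
        # By default only warnings are shown; debug output is opt-in.
        return [msg for lvl, msg in self.entries if lvl >= minimum]
```

A preview box would then call `messages()` for ordinary users and `messages(Level.DEBUG)` only for those who explicitly ask for the nitpicking level.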
Aryeh Gregor wrote:
It could be added to the html comment with preprocessor data, or perhaps as a new box in preview mode. It is in no way expected to appear in normal page views, nor am I saying that we should [[Execution by burning|burn]] (yet) people who don't write 100% conformant wikitext. It could also be collapsed by default so as not to discourage newcomers. The point is that better insight into how MediaWiki is "helping" would be most valuable when studying why some [[Template:Esoteric|esoterism]] isn't working.
If we did this, we'd probably want multiple channels of output, debug and warning at least. Warnings should only be raised when what the user intended is clearly not happening: when duplicate id's or nonexistent attributes are removed, for instance, or template expansion is halted. Automatically closing tags isn't something that should be mentioned unless the user specifically asks for that level of nitpicking, if even then. Wikitext doesn't need to aspire to be XML.
We could add a debug level to each warning. Wikitext doesn't need to be strict. But the wiki being intelligent about it is not an excuse for not writing it right the first time. Unclosed tags are the kind of thing that could completely break the page display if unhandled or mishandled by the tidier. The whole thing should be optional (with several debug levels / independent features to activate), but as a whole it provides a number of benefits:
* New users can learn better wikitexting by viewing the complaints.
* Experienced users can see their mistakes with it.
* Advanced users can use it to debug unexpected results.
* Developers can take advantage of it to deprecate some syntaxes
* ...with the same debugging facility to log how much proposed changes are being used.
* ...and it moves towards defining the official dialect.
And of course it must be easily ignorable so as not to discourage new contributors.
I've just seen bug 16038 (it has just been edited); that's the kind of thing that could be placed in the debug output.
Aryeh Gregor wrote:
On Sun, Dec 28, 2008 at 1:40 PM, Ilmari Karonen nospam@vyznev.net wrote:
A simpler, though more heavy-handed, approach might be simply to disallow dashes in section anchors (presumably replacing them with ".2D"). This would leave any dashed IDs free for other uses.
That's pretty heavy-handed indeed. It's awfully arbitrary to demand all our users use camel-case or underscores for their id's.
I was mostly thinking about enforcing this for section anchors only.
It seems to me there are (at least) three types of IDs that we would rather not see conflict:
1. IDs generated and used by MediaWiki skins (and other parts? extensions?).
2. IDs specified by users in wikitext (used either for site custom JS/CSS or, since MediaWiki doesn't allow <a name="..."> in wikitext, for manually specifying link anchors).
3. Section anchors, which, for whatever reason known only to the W3C, share a namespace with IDs.
Ideally, we'd like to keep all three apart, although the use of custom IDs as link anchors (e.g. to preserve old anchor names when section headings change) suggests that it should be at least possible to manually specify IDs that lie in the part of the namespace normally used for section anchors.
We already have code in the parser to prevent collisions _within_ type 3. Collisions within type 1 are bugs in MediaWiki, and should not occur. We currently do absolutely nothing to stop IDs of type 2 from colliding with either each other or with the other types.
it could lead to abuse. (Beansy question: are there any places where id's are used to do something in JavaScript, that people could exploit by adding wikitext with those id's?)
You can grep wikibits.js and other files for getElementById. I don't immediately see anything very serious. Yes, you can royally confuse some scripts by creating elements with IDs like "toc", "column-one", "p-cactions" or "mw-js-message", but those mostly just lead to some missing or misplaced interface elements. Of course, there are plenty more among user and site custom JS.
Hmmm... okay, assume someone is using a user script that adds interface elements using addPortletLink(), AND assume those interface elements do something potentially significant (like automatically editing a page) without prompting for confirmation, AND assume a malicious user manages to "hijack" the portlet ID, so that those interface elements get inserted within the page content rather than in their proper places, without the target user noticing, AND assume they further manage to hide said misplaced interface elements with CSS AND also manage to convince the target user to click on the hidden link.
We _could_ be looking at a potential clickjacking vulnerability here.
The obvious fix, while we still let duplicate IDs through, would be to modify addPortletLink() so that it refuses to insert links within the page content.
On 12/28/08, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
Now, how about figuring out how to get manually-specified id's like <span id="foo"> to be unique? :)
Could just number them sequentially like we do the section headings.
...or try to convince the creators of id="stub" templates that there's something wrong with giving them all the same id (good luck with that, they wouldn't listen to me).
—C.W.
On Tue, Dec 30, 2008 at 10:17 AM, Charlotte Webb charlottethewebb@gmail.com wrote:
Could just number them sequentially like we do the section headings.
Alternatively, we could strip them and add an HTML comment -- we need to have some id for the extra section headings, because we need to link to them automatically, but that seems to be of dubious benefit for user-supplied id's. If they're using them for anything, their use (e.g., CSS rule, getElementById()) will almost certainly fail if we modify the id in any way, so no point in keeping it at all.
...or try to convince the creators of id="stub" templates that there's something wrong with giving them all the same id (good luck with that, they wouldn't listen to me).
Well, if we wanted to, telling them "in one week it will stop working" should do it. Or just having a sysop there do it unilaterally with that as edit summary. I don't know if we want to, though. It would be disruptive for questionable benefit. As I said, if we do this we'd have to do a dry run for a while and only log conflicts, and deal with all the major problem-causers before enabling it for real.
------------ Original message ------------ From: Aryeh Gregor Simetrical+wikilist@gmail.com Subject: Re: [Wikitech-l] Anchors haven't id attribute Date: 28.12.2008 04:16:43
2008/12/27 Danny B. Wikipedia.Danny.B@email.cz:
*sigh*
Why do we have to hunt for some other solution when we have a fully working, fully valid and fully intuitive one?
Because:
1) Our previous behavior arguably violated the XHTML 1 specification by allowing name attributes to begin with nonletters. Please don't ignore this argument because you think it's wrong. I think you're wrong on this issue too, but I don't just ignore your opinion when discussing what the software that we *both* develop should do. Note "arguably" in the first sentence here -- your opinion counts as much as mine.
Not true. XHTML 1 was NOT violated. Name attributes in XHTML1 CAN begin with any allowed character (letters, digits and a certain set of punctuation). They DO NOT have to begin with a letter. Name attributes in XHTML1 are of type NMTOKEN, which is defined that way. You did not provide any evidence of a violation of XHTML. On the other hand, I've provided a link to the specification confirming what I said. The W3C validator results also confirm my words.
2) It's not arguable at all that the XHTML 1 specification strongly recommends that <a> elements with a name attribute also have an id attribute. In fact, section 4.10 states: "In order to ensure that XHTML 1.0 documents are well-structured XML documents, XHTML 1.0 documents MUST use the id attribute when defining fragment identifiers on the elements listed above [including <a>]."
I'm not saying these reasons outweigh the reasons against, but those are the reasons it was done. In particular, I don't think I've seen an argument from you against (2).
A strong recommendation does not mean it's a must. It is recommended, not required or enforced. The id is recommended for the sake of well-structuredness, which means keeping ids unique. Name attributes are not required to be unique and thus could give wrong results if two or more names are the same: the user agent wouldn't know which one to scroll to, so it would scroll to the first occurrence.
Old version was used for many years. It was fully valid
Could you *please* stop pretending that a debate doesn't even exist here? It's obnoxious and uncivil, and you keep on doing it.
Could you *please* stop making this discussion personal? Thank you. Discuss the topic, not the people in the discussion, please. It doesn't help anything and raises the temperature needlessly. Thank you.
I am not pretending anything. The statement above is fully true.
The first major problem is that this change breaks millions of existing links to sections: links used on pages on wikis, links used on external sites, links in people's bookmarks, in emails, forum threads, etc. Well, OK, let's discount all the external stuff, since we don't have any influence on it, but we still have millions of links left on our own wikis which no longer work as of r45109.
First of all, all auto-generated internal links (in TOCs) will automatically switch to the new format. Second of all, it should be one extra line of code to fix up all manually-created internal links as well, so that the x is automatically added as part of the encoding process. (I didn't find where this needed to be done at a quick glance.) So we're only talking about external links here.
I was not speaking about the TOC. It is obvious that since it's automatically generated, it will be correct. What do you mean by "automatically added as part of the encoding process"? Does that mean that if I write [[#foo]] it will automatically create the #xfoo anchor? If so, then you're again simply adding a load of work and thinking for users. From this point on they could not simply copy'n'paste the anchor from the address bar into wikitext, because it would prepend another x. You are pushing them to think about whether the anchor link should or should not start with "x". Wiki should be simple. In case you are thinking of having the linker automatically decide whether to prepend it depending on whether the [[#......]] text starts with x, let me remind you that arbitrary headlines can start with "x" themselves, so it would confuse the algorithm.
This is a one-time cost and I don't think it's a big problem -- at worst, a few users will end up on the wrong part of the page. It should be pointed out that this will affect *all* section links on non-Latin wikis (since they get encoded to begin with dots and then need to start with a letter), but again, only as a one-time cost, and only external links (links from external sites or links using external link syntax), and it will still get viewers to almost the right place.
Few users? Are Wikipedias used by few users?
You are again not stating the truth: section links, when in the "name" attribute, DO NOT have to start with a letter. You made pages invalid by adding the id attribute with the same value copied from the name attribute, even though that shouldn't have been done because it's against the specification, which requires a letter first.
If I write a [[Foo#ěščřž]] link on a wiki, it is converted to Foo#.C4.9B.C5.A1.C4.8D.C5.99.C5.BE and WORKS properly (takes me to that section) and it IS valid. The same goes for non-Latin wikis. (I'm speaking about the old version, before all the changes discussed here.)
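The conversion in that example can be reproduced with a few lines of Python. This is only an approximation of the encoding as described in this thread (UTF-8 percent-encoding with "%" swapped for "."), not MediaWiki's actual PHP; the function names are hypothetical, and `letter_first` only illustrates the "prepend x" rule under discussion without claiming to match r45109 exactly.

```python
import re
from urllib.parse import quote


def dot_encode(heading: str) -> str:
    """Approximate the old anchor encoding: percent-encode, then % -> '.'."""
    # Spaces become underscores; leaving ':' readable is an assumption.
    encoded = quote(heading.replace(" ", "_"), safe=":")
    return encoded.replace("%", ".")


def letter_first(anchor: str) -> str:
    """Prepend 'x' when the anchor does not start with an ASCII letter."""
    return anchor if re.match(r"[A-Za-z]", anchor) else "x" + anchor
```

Here `dot_encode("ěščřž")` gives the anchor shown above, and `letter_first` applied to that result is what turns it into an anchor beginning with "x", which is the change that affects every dot-encoded section link.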
The other major problem is that from this point on, anchor links are no longer intuitive: we are now pushing people to constantly think about prepending x when creating anchor links. No more simple copy-pasting of the headline.
As a side effect we are now adding unnecessary work for people on non-Latin wikis by forcing them to always switch to a Latin keyboard, or to click on edittools or whatever, just to get the one "x" character into the edit box to create the anchor link.
Again, not an issue if internal links are fixed to work correctly. I didn't think about that aspect, but it should be very simple to fix (I'd do it now except I'm going to bed).
Re the internal links, see above. Wiki should be simple, intuitive and copy-pastable. You'll push people to think more than they need to. You will make the anchor links differ from the real section names without any good reason. Thus you'll reduce intuitiveness and create work for people who will have to correct things they would normally have written correctly if it were intuitive. You'll cause headaches for those who are not so familiar with wiki and just blindly do what they've been taught, and suddenly it won't work. You are going to break millions of links. Is it worth it when it does not repair any bug, malfunction, invalidity or the like? What _concrete_, indisputable and - first of all - necessary benefits does it bring over the old version? I haven't heard any yet. But I know about a bunch of negatives.
It seems to me that there are only weak reasons in favor (following recommended best practice with no practical effect) and only weak reasons against (small one-time transition cost -- unless you're correct that there will be longer-term costs, in which case please clarify why you think this). Normally I would say that standards compliance by itself (as opposed to standards compliance that brings concrete benefit) is worth small one-time costs, although not large enough one-time costs and probably not even fairly small recurring costs. So as it stands, without further arguments, I'd still be weakly in favor of keeping the current state of trunk, of course with the fix for anchors on internal links.
Of course the arguments of one's opponents are always weak ;-) However, even though you have brought much less evidence than I have, I am not saying the same about yours.
Re standards compliance - we do comply with the standards. (I should actually say we did, because you broke that by adding invalid ids.)
I don't think that re-teaching hundreds of thousands of wiki users how anchor links are going to be treated is a "small one-time cost". Besides, I also strongly doubt that the transition would be as smooth as you present it here. There are always unexpected problems. Do we need them? Don't we have _real_ bugs and malfunctions to fix instead of pointlessly tinkering in a place where it is not necessary because it works correctly? Do we need to cause unnecessary additional work on the software, which wouldn't be needed if we kept the old, fully working version?
Anyway, I'd suggest that you present a full transition plan that could be discussed, rather than making changes in the software that either cause massive invalidity of pages or break links or do some other evil. That would help this discussion a lot, thank you.
By the way, I still think that _if_ the truth about the attributes were on your side, Validator and Tidy would have been fixed accordingly ages ago. But they still confirm my words.
I am really very, very disappointed that problems are being artificially created in places where there were none. Everything was working properly and intuitively. Somebody decided to change it, and more and more problems have popped up since then. I'd like to recall two useful principles - KISS and "if it works, don't touch it". It worked, now it does not. It used to be simple, now we have to think about dozens of consequences. Also, I'm a bit confused about the approach to backward compatibility - on one hand we keep ancient constructions and structures in the code just in case somebody has some tool using them, on the other hand we are going to break millions of links. One of the major web principles says Cool URIs don't change. And what about tools? There are indeed tools working with anchor links. So are we going to break them all now?
Kind regards
Danny B.
On Sun, Dec 28, 2008 at 9:32 PM, Danny B. Wikipedia.Danny.B@email.cz wrote:
Not true. XHTML 1 was NOT violated.
I am not going to discuss this any further with you as long as you refuse to accept the fact that there is disagreement on this and that your interpretation is no more important than mine vis-a-vis MediaWiki. If you can't even agree to disagree, there's no point in my talking to you.
What do you mean by "automatically added as part of the encoding process"? Does that mean that if I write [[#foo]] it will automatically create the #xfoo anchor?
No. If you write [[#foo]] it will create a #foo anchor. If you write #0 it will create a #x0 anchor. Try it out on trunk before you invent baseless objections.
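The rule described in this reply can be sketched as follows. This is an illustrative Python version based only on the two examples given in the message, not the code on trunk; the name `make_valid_id` is hypothetical, and the assumption is that the prefix is added only when the encoded anchor does not already begin with an ASCII letter.

```python
import string


def make_valid_id(encoded_anchor: str) -> str:
    """Return an id that starts with a letter (hypothetical helper)."""
    # HTML 4 / XHTML 1 ID tokens must begin with an ASCII letter.
    if encoded_anchor and encoded_anchor[0] in string.ascii_letters:
        return encoded_anchor  # already valid, left untouched
    # Otherwise prepend "x" so the token starts with a letter.
    return "x" + encoded_anchor


print(make_valid_id("foo"))  # foo
print(make_valid_id("0"))    # x0
```

Under this assumption a dot-encoded non-Latin anchor such as `.C4.9B` would become `x.C4.9B`, while a heading that already begins with a letter, including one that happens to begin with "x", is left unchanged.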
2008/12/28 Danny B. Wikipedia.Danny.B@email.cz:
I really don't feel comfortable that, instead of discussing, you continue to push the solution you would like to see. :-(
That commit was not pushing anything. Brion set the status quo of names being copied to id's, and given that status quo, I only fixed it so that it didn't produce invalid XHTML, at Brion's suggestion. This is something that I had planned to do for a long time, independent of the current issue (i.e., even if anchors used only the name attribute).
On 28/12/2008, Danny B. Wikipedia.Danny.B@email.cz wrote:
The other major problem is that from this point on the anchor links are no longer intuitive - we are now pushing people to constantly think about prepending x when creating anchor links. No more simple copy-pasting of the headline. As a side effect we are now adding unnecessary work for people from non-Latin wikis by pushing them to always switch to a Latin keyboard, or to click on edittools or whatever, just to get the one "x" character into the edit box to create the anchor link.
The anchors of non-latin headers are already (latin) gibberish: #.D0.A4.D0.B8.D0.BB.D1.8C.D0.BC.D0.BE.D0.B3.D1.80.D0.B0.D1.84.D0.B8.D1.8F
It doesn't seem reasonable to think that people could create anchors in their head from text, except in special cases.
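To see what such an encoded anchor stands for, the encoding can be reversed. The following Python sketch is deliberately naive and rests on an assumption: it treats every dot as an escape marker, so a heading that contains a literal dot would not round-trip correctly. The name `decode_anchor` is hypothetical.

```python
from urllib.parse import unquote


def decode_anchor(anchor: str) -> str:
    """Naively turn a dot-encoded anchor back into text (hypothetical helper)."""
    # Drop a leading '#', then treat each '.' as if it were a '%' escape marker.
    percent_encoded = anchor.lstrip("#").replace(".", "%")
    # Decode the percent-escaped UTF-8 bytes back into characters.
    return unquote(percent_encoded)


print(decode_anchor("#.D0.A4.D0.B8.D0.BB.D1.8C.D0.BC.D0.BE.D0.B3.D1.80.D0.B0.D1.84.D0.B8.D1.8F"))
# Фильмография
```

The readable heading is a single word, while its anchor is 72 characters of hex, which illustrates why nobody would be expected to type the encoded form by hand.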
On Sun, Dec 28, 2008 at 4:32 AM, Niklas Laxström niklas.laxstrom@gmail.com wrote:
The anchors of non-latin headers are already (latin) gibberish: #.D0.A4.D0.B8.D0.BB.D1.8C.D0.BC.D0.BE.D0.B3.D1.80.D0.B0.D1.84.D0.B8.D1.8F
It doesn't seem reasonable to think that people could create anchors in their head from text, except in special cases.
Similar remarks could be made about anchors with markup in them. The markup stripping is not so mindless that I'd expect anyone to be able to do it reliably in their head.
------------ Original message ------------ From: Niklas Laxström niklas.laxstrom@gmail.com Subject: Re: [Wikitech-l] Anchors haven't id attribute Date: 28.12.2008 10:34:16
On 28/12/2008, Danny B. Wikipedia.Danny.B@email.cz wrote:
The other major problem is that from this point on the anchor links are no longer intuitive - we are now pushing people to constantly think about prepending x when creating anchor links. No more simple copy-pasting of the headline.
As a side effect we are now adding unnecessary work for people from non-Latin wikis by pushing them to always switch to a Latin keyboard, or to click on edittools or whatever, just to get the one "x" character into the edit box to create the anchor link.
The anchors of non-latin headers are already (latin) gibberish: #.D0.A4.D0.B8.D0.BB.D1.8C.D0.BC.D0.BE.D0.B3.D1.80.D0.B0.D1.84.D0.B8.D1.8F
It doesn't seem reasonable to think that people could create anchors in their head from text, except in special cases.
-- Niklas Laxström
I'm sorry, but you are not right. Of course people can create anchors in their head (or usually by simply copy-pasting the headline). The "gibberish" you speak about is created/converted automatically by the software, and the user does not have to think about it.
If I write a [[Foo#ěščřž]] link on a wiki, it is converted to http://.../Foo#.C4.9B.C5.A1.C4.8D.C5.99.C5.BE and WORKS properly - it takes me to that section. The same holds on non-Latin wikis. (I'm speaking about the old version, before all the changes discussed here.)
I do not have to write those "gibberish" anchor names when I want to link to an anchor, because anchor links are automatically converted by the software from human-readable text to that ASCII stuff.
In the old version I could simply copy the headline and paste it after # within the [[...]] brackets, and the link was done. Now I can't simply copy-paste, but have to think about prepending "x" - which, as I mentioned before, is a hell of a lot of unnecessary work for non-Latin users who have to switch the keyboard etc. (see above)
Kind regards
Danny B.