Okay, Danny and I have argued about this on IRC for the past 45 minutes and still don't agree. My position is as follows. XHTML1 is a reformulation of HTML4 and does not change it except to make it XML-conformant, or as otherwise noted. HTML4 clearly states that id and name attributes must meet exactly the same production: in particular, they must start with a letter.
Now, the XHTML DTD doesn't say this. According to the XHTML1 Transitional DTD[1], the name attribute is of type NMTOKEN. This production can, in particular, begin with non-letters[2]. On the other hand, according to the exact same DTD, the id attribute is of type ID, and this can also begin with non-letters[3]. So it seems to me that if you think the DTD takes precedence over the productions given in HTML4, both id *and* name attributes can begin with non-letters (although the exact class of characters they can begin with varies).
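To make the difference concrete, here is a rough Python sketch of the two productions. It is restricted to ASCII for brevity (the real XML 1.0 productions also admit a large set of non-ASCII characters), so treat the regexes as an approximation; the function names are mine and don't come from any spec or library.

```python
import re

# ASCII-only approximations of the XML 1.0 productions (assumption:
# non-ASCII name characters are deliberately left out for brevity).
NAME_START = r"[A-Za-z_:]"
NAME_CHAR = r"[A-Za-z0-9_:.\-]"

# Production 5 (Name): attributes of type ID must match this.
NAME_RE = re.compile(NAME_START + NAME_CHAR + r"*")
# Production 7 (Nmtoken): one or more name characters, no start rule.
NMTOKEN_RE = re.compile(NAME_CHAR + r"+")


def is_xml_name(s):
    """True if s matches the ASCII subset of the Name production."""
    return NAME_RE.fullmatch(s) is not None


def is_xml_nmtoken(s):
    """True if s matches the ASCII subset of the Nmtoken production."""
    return NMTOKEN_RE.fullmatch(s) is not None
```

Under these approximations, "_foo" and ":foo" are acceptable as both Name and Nmtoken, while "1foo" and "-foo" are acceptable only as Nmtoken.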
However, in no XML-based standard does the DTD specify the full set of restrictions on allowed content. Additional restrictions may be added in the prose of the specification. XML that conforms to the DTD is valid, but might still be non-conformant. In this particular case, it's not clear from the normative text whether the constraints from HTML4 are supposed to be maintained. In the informative appendix C.8 of XHTML1[4], it says:
"Further, since the set of legal values for attributes of type ID is much smaller than for those of type CDATA, the type of the name attribute has been changed to NMTOKEN. This attribute is constrained such that it can only have the same values as type ID, or as the Name production in XML 1.0 Section 2.3, production 5. Unfortunately, this constraint cannot be expressed in the XHTML 1.0 DTDs."
In other words, the NMTOKEN type was only selected because it was the closest possible match. In fact, the intent seems to have been to weaken the HTML4 restrictions on both name and id, and allow a wider selection of characters to be used (the Name production from XML is considerably laxer than [a-zA-Z][a-zA-Z0-9_:.-]* from HTML4). However, it also implies that id and name adhere to exactly the same rules, and in particular, neither one may start with a digit, hyphen, etc. But this isn't stated anywhere normative that I can find.
On the other hand, it's recommended that XHTML1 documents that are intended to be backward-compatible restrict themselves to the HTML4 definitions anyway:[4]
"Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4. When defining fragment identifiers to be backward-compatible, only strings matching the pattern [A-Za-z][A-Za-z0-9:_.-]* should be used. See Section 6.2 of [HTML4] for more information."
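For illustration, the backward-compatible pattern quoted above can be checked with a few lines of Python. This is only a sketch of the pattern itself, not of what Sanitizer::escapeId() does, and the function name is hypothetical.

```python
import re

# The pattern recommended in XHTML1 appendix C.8 for fragment
# identifiers that must stay compatible with HTML4.
HTML4_ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9:_.\-]*")


def is_html4_compatible_id(s):
    """True if s is a letter followed by letters, digits, ':', '_', '.' or '-'."""
    return HTML4_ID_RE.fullmatch(s) is not None
```

With this check, "section-1" passes, while "_top" and "2008_results" fail because they don't start with a letter, even though "_top" would be a perfectly good XML Name.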
In fact, this is what we currently try to do (see Sanitizer::escapeId()). I've always considered the fact that we allow section anchors that start with non-letters to be a bug. But I don't think we gain anything in conformance from not adding ids to anchors as well: the requirements for both attributes are the same. Also, in section 4.10[5], the standard says:
"In order to ensure that XHTML 1.0 documents are well-structured XML documents, XHTML 1.0 documents MUST use the id attribute when defining fragment identifiers on the elements listed above."
By that reading, we're non-conformant for *not* having an id attribute on the <a> element. (This is in an informative section but uses the word "MUST", so I have no idea what that's supposed to mean.)
But above all, Tidy has been adding the id attribute to <a>'s on Wikipedia for years and still will with or without this change. If something is wrong with that, this reversion is not helping anything. The fix should be to change what we're outputting for anchors in the first place. For the time being, we should keep the change so that we're at least doing things consistently with or without Tidy. If it's wrong, fix it for everyone, not just everyone not using Tidy.
[1] http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_XHTML-1.0-Transitional
[2] http://www.w3.org/TR/REC-xml/#NT-Nmtoken
[3] http://www.w3.org/TR/REC-xml/#NT-Name, http://www.w3.org/TR/REC-xml/#id
[4] http://www.w3.org/TR/xhtml1/#C_8
[5] http://www.w3.org/TR/xhtml1/#h-4.10