1. Parsoid doesn't generate section ids yet, but when it does, yes, we'll make sure the ids are unique and compatible with core ids. I'll check the phpjs library code to see what we are missing. We don't have a ticket for generating section ids yet.

2. At some point, it makes sense to switch both core and Parsoid to a different id scheme (HTML5 ids are the obvious possibility, and Gabriel proposed another) and have fallback support for old-style ids (we've brainstormed some ideas in the past, but I forget the details right now).

Subbu.

On 10/27/2015 05:01 PM, Bernd Sitzmann wrote:
Subbu, Gergo and Gabriel:

Thank you for your comments so far.

Just to be clear: ideally, I want the anchor ids to be the same as those used in Core, so I would really like Parsoid to provide the same anchor ids as Core does. That would also take care of the uniqueness issue. Is there a task for this? If not, I'd be happy to create one. In the meantime, I'll use my own implementation until we get something from upstream.

If the anchor ids generated by the Mobile Content Service do not match the ones generated by Core then the app would not scroll to the correct section. Instead it would just stay at the top of the page.

The links can come from inside the same page, other pages, redirects, or even from outside the app/site. The app builds the correct <h[2-6]> tags using the anchor values provided by the Mobile Content Service output. This is why I don't want just an anchor id that looks like "mwCA". (Of course, that would be ok if core would do the same but right now it doesn't.)

Subbu: Thanks for the link to the JS code. I'll adapt my patch to include some of the additional substitutions. You may also want to check out my patch since I think some of the cases that are handled by the phpjs library are not handled in the Parsoid code.

Another thing I haven't found in the Parsoid code is anything that ensures uniqueness of ids. I'd also be interested in how this is resolved in Core, to make sure that what we do on the JS side matches Core.

Cheers,
Bernd




On Tue, Oct 27, 2015 at 3:10 PM, Gabriel Wicke <gwicke@wikimedia.org> wrote:
Another option could be to use compact stable element IDs not based on the content. This would be less readable, but on the upside there wouldn't be any collisions, and links wouldn't break on minor heading changes.

On Tue, Oct 27, 2015 at 2:04 PM, Subramanya Sastry <ssastry@wikimedia.org> wrote:
On 10/27/2015 03:48 PM, Gergo Tisza wrote:

If you care about edge cases, section anchor generation is rather complicated: anchors can be postfixed with an index when there are multiple identical titles,

This would need to be handled to guarantee id uniqueness.
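
To make the uniqueness point concrete, here is a minimal sketch of index postfixing for repeated headings. The function name `unique_anchors` and the `_2`, `_3` suffix format are assumptions made for illustration; the exact rule would need to be checked against what Core actually emits before relying on it.

```python
def unique_anchors(anchors):
    """Return the anchors with repeats made unique by an index postfix.

    Assumption: the first occurrence keeps the plain anchor and later
    occurrences get "_2", "_3", ... appended. This is a sketch, not
    Core's actual implementation.
    """
    used = set()    # every id handed out so far
    counts = {}     # highest index tried so far for each base anchor
    result = []
    for anchor in anchors:
        candidate = anchor
        n = counts.get(anchor, 1)
        # Keep bumping the index until the candidate is unused; this also
        # covers a literal heading that already looks like "History_2".
        while candidate in used:
            n += 1
            candidate = f"{anchor}_{n}"
        counts[anchor] = n
        used.add(candidate)
        result.append(candidate)
    return result


if __name__ == "__main__":
    print(unique_anchors(["History", "References", "History", "History"]))
```

Tracking the set of ids already handed out, rather than only a per-title counter, avoids producing a postfixed id that collides with a heading whose own text already ends in an index.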

and HTML, templates and parser tags are handled differently for display and for anchor generation. (Yes, these can and do appear in titles. E.g. people sometimes put <math> tags in there, or italicize a word.)

But, if we move core and Parsoid to HTML5 ids, this shouldn't matter, since, beyond uniqueness, the only restrictions on an HTML5 id are that it be non-empty and contain no space characters, as per https://html.spec.whatwg.org/multipage/dom.html#the-id-attribute
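
As a rough illustration of how little an HTML5-style scheme would have to do, the sketch below checks the id rule from the spec (at least one character, no ASCII whitespace) and derives an id from heading text. Both function names are hypothetical, and replacing whitespace runs with underscores is only an assumed convention, not something core or Parsoid is confirmed to do under such a scheme.

```python
# ASCII whitespace as defined by the HTML spec: tab, LF, FF, CR, space.
ASCII_WHITESPACE = "\t\n\f\r "


def is_valid_html5_id(value):
    """True if value is non-empty and contains no ASCII whitespace."""
    return len(value) > 0 and not any(ch in ASCII_WHITESPACE for ch in value)


def heading_to_html5_id(text):
    """Hypothetical mapping from heading text to an HTML5-style id.

    Assumption: runs of whitespace collapse to a single underscore and
    everything else, including non-ASCII characters, is kept as-is.
    """
    return "_".join(text.split())


if __name__ == "__main__":
    candidate = heading_to_html5_id("  Early  life ")
    print(candidate, is_valid_html5_id(candidate))
```

Uniqueness within the page is a separate requirement and is not checked here.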

Subbu.



--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation