Subbu, Gergo and Gabriel:
Thank you for your comments so far.
Just to be clear, ideally I want the anchor ids to be the same as the ones used in Core. I would really like Parsoid to provide the same anchor ids as Core does. That would also take care of the uniqueness issue. Is there a task for this? If not, I'd be happy to create one. In the meantime, I'll use my own implementation until we get something from upstream.
If the anchor ids generated by the Mobile Content Service do not match the ones generated by Core, the app will not scroll to the correct section. Instead, it will just stay at the top of the page.
The links can come from inside the same page, other pages, redirects, or even from outside the app/site. The app builds the correct <h[2-6]> tags using the anchor values provided by the Mobile Content Service output. This is why I don't want just an anchor id that looks like "mwCA". (Of course, that would be OK if Core did the same, but right now it doesn't.)
Subbu: Thanks for the link to the JS code. I'll adapt my patch to include some of the additional substitutions. You may also want to check out my patch since I think some of the cases that are handled by the phpjs library are not handled in the Parsoid code.
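For illustration, here is a minimal sketch of the kind of substitutions in question. It assumes that Core's anchor encoding amounts to: replace spaces with underscores, apply PHP-style urlencode, restore ":" and then turn every remaining "%" into ".". That assumption is not confirmed anywhere in this thread and should be checked against Core. The function names phpUrlencode and legacyAnchor are made up for this example.

```javascript
// Sketch only: mimics PHP's urlencode(), which escapes a few characters
// that JavaScript's encodeURIComponent() leaves alone: ! ' ( ) * ~
function phpUrlencode(str) {
  return encodeURIComponent(str)
    .replace(/[!'()*~]/g, (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase())
    .replace(/%20/g, '+'); // PHP encodes a space as "+"
}

// Assumed Core-like encoding (to be verified against Core):
// spaces -> "_", urlencode, "%3A" -> ":", every other "%" -> "."
function legacyAnchor(headingText) {
  const underscored = headingText.trim().replace(/ /g, '_');
  return phpUrlencode(underscored)
    .replace(/%3A/g, ':')
    .replace(/%/g, '.');
}

console.log(legacyAnchor('See also'));      // See_also
console.log(legacyAnchor('Müller (band)')); // M.C3.BCller_.28band.29
```

The parentheses in the second call are the sort of case a plain encodeURIComponent() call would miss.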
Another thing I haven't found in the Parsoid code is how uniqueness of ids is ensured. I'd be interested in how this is resolved in Core, too, of course, to make sure that what we do on the JS side matches Core.
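As a starting point for discussion, a minimal sketch of one way to make ids unique on the JS side: append a numeric suffix to repeated anchors. The "_2", "_3" suffix format and the function name makeUniqueAnchors are assumptions for this example; whether Core uses the same format, or compares ids case-insensitively, would need to be checked.

```javascript
// Sketch only: de-duplicates anchors in document order by appending
// "_2", "_3", ... to repeats. The suffix format is an assumption.
function makeUniqueAnchors(anchors) {
  const used = new Set();
  return anchors.map((anchor) => {
    let candidate = anchor;
    // Keep counting up until the candidate id is not taken yet. This also
    // avoids clashing with a real heading that is literally "History_2".
    for (let i = 2; used.has(candidate); i++) {
      candidate = `${anchor}_${i}`;
    }
    used.add(candidate);
    return candidate;
  });
}

console.log(makeUniqueAnchors(['History', 'Notes', 'History', 'History']));
// [ 'History', 'Notes', 'History_2', 'History_3' ]
```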
Cheers, Bernd
On Tue, Oct 27, 2015 at 3:10 PM, Gabriel Wicke <gwicke@wikimedia.org> wrote:
Another option could be to use compact stable element IDs (https://phabricator.wikimedia.org/T116350) that are not based on the content. This would be less readable, but on the upside there wouldn't be any collisions, and links wouldn't break on minor heading changes.
On Tue, Oct 27, 2015 at 2:04 PM, Subramanya Sastry <ssastry@wikimedia.org> wrote:
On 10/27/2015 03:48 PM, Gergo Tisza wrote:
If you care about edge cases, section anchor generation is rather complicated: anchors can be postfixed with an index when there are multiple identical titles,
This would need to be handled to guarantee id uniqueness.
and HTML, templates and parser tags are handled differently for display and for anchor generation. (Yes, these can and do appear in titles. E.g. people sometimes put <math> tags in there, or italicize a word.)
But if we move Core and Parsoid to HTML5 ids, this shouldn't matter, since the only restriction on HTML5 ids is that they shouldn't contain a space character, as per https://html.spec.whatwg.org/multipage/dom.html#the-id-attribute
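For reference, a minimal sketch of what that check could look like in JS. As I read the linked spec, an id value must contain at least one character and no ASCII whitespace (tab, line feed, form feed, carriage return, space). The function name isValidHtml5Id is made up for this example.

```javascript
// Sketch only: checks the HTML5 constraints on an id value, namely
// non-empty and free of ASCII whitespace (TAB, LF, FF, CR, SPACE).
function isValidHtml5Id(id) {
  return id.length > 0 && !/[\t\n\f\r ]/.test(id);
}

console.log(isValidHtml5Id('Müller_(band)')); // true
console.log(isValidHtml5Id('See also'));      // false
```

Note that this only checks validity. Ids would still need to be unique within a page.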
Subbu.
-- Gabriel Wicke Principal Engineer, Wikimedia Foundation