On 10/27/2015 01:38 PM, Bernd Sitzmann wrote:
Hi,

I'm building the ToC entries from Parsoid HTML content. Another part which caused some struggle is building the correct anchors for the section headings.

In the past, we discussed generating HTML5 ids rather than the munged ids that are currently generated in MediaWiki core (for HTML4 reasons that no longer apply to HTML5). However, we didn't go ahead with it because, at least for old content, we would have to generate both the old munged ids and the new ids, since a lot of anchors will have escaped into the wild. https://gerrit.wikimedia.org/r/#/c/226032/ has some comments about this.

I haven't thought this through, but could mobile generate HTML5 ids (which are less restrictive) instead of the HTML4-style ids? I suppose if those section links got shared and opened outside the mobile view, they would break in some cases.

In any case, the escapeId function in https://gerrit.wikimedia.org/r/#/c/226032/4/lib/ext.core.Sanitizer.js is the code that generates these ids, if it helps (it is unused in Parsoid right now, but will be used once we start generating section ids).

Subbu.


First I thought I could just use the id attributes in the heading tags Parsoid provides[2].

Example from [1]:
<h2 id="mwCA">Template truncation</h2>

But then I thought about links to specific sections. Those would not use the ids Parsoid generates.[3] They would use the anchorencoded tocline strings instead.[4]

Since I have not found an npm module which does anchorencoding in JavaScript, I wrote a small library function to do the same. It uses the phpjs npm module to take into account the PHP-specific way URL encoding is done. Would you mind checking the anchorencode.js file and the associated test file anchorencode-test.js in my patch[5]?
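
For readers following along, here is a rough sketch of what such an anchorencode helper might look like without the phpjs dependency. It is not the code from the patch. The function names phpUrlencode and anchorencode are made up for illustration, and the sketch assumes the legacy escaping amounts to: turn whitespace into underscores, apply PHP-style urlencode, then map "%3A" back to ":" and every remaining "%" to ".". It does not strip wikitext markup, decode character references, or de-duplicate repeated headings.

```javascript
// Hypothetical helper: approximates PHP's urlencode() on top of
// encodeURIComponent(). PHP also escapes ! ' ( ) * ~ and writes a
// space as "+", which encodeURIComponent does not do.
function phpUrlencode(str) {
  return encodeURIComponent(str)
    .replace(/[!'()*~]/g, function (c) {
      return '%' + c.charCodeAt(0).toString(16).toUpperCase();
    })
    .replace(/%20/g, '+');
}

// Hypothetical helper: legacy-style section anchor for a heading's text.
// Assumption: whitespace runs become a single underscore before encoding.
function anchorencode(headingText) {
  var id = phpUrlencode(headingText.trim().replace(/\s+/g, '_'));
  // ":" is kept literal; all other percent escapes become dot escapes.
  return id.replace(/%3A/g, ':').replace(/%/g, '.');
}

console.log(anchorencode('Template truncation')); // Template_truncation
console.log(anchorencode('Caf\u00e9'));           // Caf.C3.A9
```

With this, the heading from the example above would be linked as #Template_truncation rather than #mwCA. Note that encodeURIComponent throws on lone surrogates, so malformed input would need separate handling.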

If there is a JS implementation of this I'd be happy to hear about that, of course.

Thanks,
Bernd

[1] https://test.wikipedia.org/wiki/Section_edit_links_bug2
[2] view-source:https://test.wikipedia.org/api/rest_v1/page/html/Section_edit_links_bug2
[3] https://test.wikipedia.org/api/rest_v1/page/html/Section_links
[4] https://www.mediawiki.org/wiki/Manual:PAGENAMEE_encoding#Encodings_compared
[5] https://gerrit.wikimedia.org/r/#/c/246100/7