On 10/27/2015 01:38 PM, Bernd Sitzmann wrote:
Hi,
I'm building the ToC entries from Parsoid HTML content. One part
that has caused some trouble is building the correct anchors for the
section headings.
In the past, we discussed generating HTML5 ids rather than the munged
ids that MediaWiki core currently generates (for HTML4 reasons
that no longer apply to HTML5). However, we didn't go ahead with it
because, at least for old content, we would have to generate both the
old munged ids and the new ids, since many links to the old anchors
are already out in the wild.
https://gerrit.wikimedia.org/r/#/c/226032/ has some comments about this.
I haven't thought this through, but could mobile generate HTML5 ids
(which are less restrictive) instead of the HTML4-style ids? I suppose if
those section links got shared and opened outside the mobile view, they
would break in some cases.
In any case, the escapeId function in
https://gerrit.wikimedia.org/r/#/c/226032/4/lib/ext.core.Sanitizer.js is
the code that generates these ids, if it helps (it is currently unused in
Parsoid, but will be used once we start generating section ids).
Subbu.
At first I thought I could just use the id attributes on the heading
tags that Parsoid provides[2].
Example from [1]:
<h2 id="mwCA">Template truncation</h2>
But then I thought about links to specific sections. Those would not
use the same ids Parsoid generates.[3] They would use the
anchorencoded tocline strings[4] (for the heading above,
Template_truncation rather than mwCA).
Since I have not found an npm module that does anchorencoding in
JavaScript, I wrote a small library function to do the same. It uses
the phpjs npm module to take into account the PHP-specific way
URL encoding is done. Would you mind checking the anchorencode.js
file and the associated test file anchorencode-test.js in my patch[5]?
If there is an existing JS implementation of this, I'd be happy to hear
about it, of course.
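To make the question concrete, here is a minimal sketch of the idea. It is
not the code in the patch, and it does not use phpjs. It assumes the legacy
MediaWiki behaviour: spaces become underscores, the string is URL-encoded
the way PHP's urlencode() does it, "%3A" is turned back into ":", and every
remaining "%" becomes ".". The names phpUrlencode and anchorencode are
hypothetical helpers chosen for illustration. Stripping of wikitext/HTML
markup and the numbering of duplicate headings are not handled.

```javascript
'use strict';

// Approximates PHP's urlencode(). encodeURIComponent() leaves the
// characters ! ' ( ) * ~ unescaped and encodes a space as %20, whereas
// PHP escapes those characters and encodes a space as "+".
function phpUrlencode(str) {
  return encodeURIComponent(str)
    .replace(/[!'()*~]/g, function (c) {
      return '%' + c.charCodeAt(0).toString(16).toUpperCase();
    })
    .replace(/%20/g, '+');
}

// Sketch of a legacy-style anchor for a plain-text heading (assumption:
// the heading has already been stripped of markup).
function anchorencode(heading) {
  var underscored = heading.trim().replace(/ /g, '_');
  return phpUrlencode(underscored)
    .replace(/%3A/g, ':')   // colons are kept readable
    .replace(/%/g, '.');    // remaining percent-escapes become dot-escapes
}

console.log(anchorencode('Template truncation')); // Template_truncation
console.log(anchorencode('Foo: bar (baz)'));      // Foo:_bar_.28baz.29
```

The dot-escaping is what makes these ids differ from plain URL-encoded
strings, and it is also why they do not match the short ids such as mwCA.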
Thanks,
Bernd
[1] https://test.wikipedia.org/wiki/Section_edit_links_bug2
[2] view-source:https://test.wikipedia.org/api/rest_v1/page/html/Section_edit_l…
[3] https://test.wikipedia.org/api/rest_v1/page/html/Section_links
[4] https://www.mediawiki.org/wiki/Manual:PAGENAMEE_encoding#Encodings_compared
[5] https://gerrit.wikimedia.org/r/#/c/246100/7