Hey everyone,

As the Content Transform Team works towards defaulting to Parsoid for read views on wikis, we have been working to address the long tail of differences that might impact the ecosystem of tools that consume content.


There are a couple of changes in Parsoid’s Cite output that we are rolling out this week and next week. Both of these changes are meant to ensure that code that uses CSS selectors based on current read view HTML will continue to work properly with Parsoid’s HTML. But, since there are also tools that are written to consume Parsoid HTML, we also want to ensure that those tools don’t break with these changes.


Changes being rolled out

This week, we are rolling out this change which adds “reference-text” as a CSS class where Parsoid currently emits “mw-reference-text”. This is a non-breaking change that should not impact Parsoid clients in any way. Note that “mw-reference-text” is still the preferred class name to use in new code, but temporarily adding the legacy unprefixed class name as well will improve compatibility for some existing users.


Next week, we are rolling out this change which adds the “mw-cite-linkback” class to Parsoid HTML. By itself, that would be a non-breaking change. But that patch also adds a <span> wrapper around the back-link HTML for non-named references, to match the HTML structure for backlinks in the current read view HTML. Strictly speaking, this is a breaking change to Parsoid HTML, since altering the DOM tree structure could affect tools written to consume Parsoid HTML. However, we believe that the chance of this breakage is quite remote, as explained in the commit message of the linked patch.

In the unlikely scenario that your Parsoid-HTML tool is affected

If any tool depends on a CSS selector like “li > a[rel="mw:referencedBy"]”, which might now break because of the intervening span wrapper, you will have to adjust your code to handle the span wrapper.


The simplest thing to do would be to drop the unnecessary "li > " prefix in the selector. If for some reason you cannot, keep in mind that the HTML is cached: until the caches roll over (up to 4 weeks), you must be ready to handle HTML with or without the extra span wrapper. You can look at the data-mw-parsoid-version attribute on the element matching the div.mw-parser-output selector to determine whether these changes are present on a page. If the value of the data-mw-parsoid-version attribute is 0.20.0.0-alpha7 or later, you are working with Parsoid HTML with the additional wrapper span. For Parsoid version 0.20.0.0-alpha6 or earlier, you are working with Parsoid HTML without the additional wrapper.
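
For tools that need the version check, here is a rough sketch in Python of how the comparison could be done. It is only an illustration, not code from Parsoid: the function names are made up, and it assumes that version strings always look like dotted numbers with an optional "-alphaN" style suffix (as in 0.20.0.0-alpha7), with a plain release sorting after any pre-release of the same number.

```python
import re

# First Parsoid version that emits the wrapper span, per this announcement.
FIRST_VERSION_WITH_WRAPPER = "0.20.0.0-alpha7"

# Assumed format: dotted numbers, then an optional "-<tag><number>" suffix.
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:-([a-z]+)(\d+)?)?$")


def parse_parsoid_version(version):
    """Turn a string like '0.20.0.0-alpha7' into a tuple that sorts in release order."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError("unrecognized Parsoid version: %r" % version)
    numbers = tuple(int(part) for part in match.group(1).split("."))
    tag, tag_number = match.group(2), match.group(3)
    if tag is None:
        # No suffix: treat as a final release, which sorts after pre-releases.
        suffix = (1, "", 0)
    else:
        # Compare the suffix number as an integer so alpha10 sorts after alpha7.
        suffix = (0, tag, int(tag_number or 0))
    return numbers, suffix


def has_backlink_span_wrapper(version):
    """True if HTML from this Parsoid version should contain the wrapper span."""
    return parse_parsoid_version(version) >= parse_parsoid_version(
        FIRST_VERSION_WITH_WRAPPER
    )
```

You would pass in the value of the data-mw-parsoid-version attribute read from the div.mw-parser-output element, and pick the old or new selector based on the result. Comparing the strings directly would not be safe, since "alpha10" sorts before "alpha7" as text.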

Why are we skipping a major version bump?

Normally, breaking changes to HTML structure would require a major version bump to Parsoid’s HTML version (currently at 2.8.0). However, the content negotiation protocol is currently broken in the RESTBase + core REST API + Parsoid combination. RESTBase is also in the process of active deprecation and removal. So, we feel that we should wait to fix up the content negotiation protocol implementation at least till RESTBase is out of the picture. But, at the same time, we do not want to unduly delay the rollout of Parsoid HTML read views. Given the nature of the change (as noted above), we feel that breakage is extremely unlikely.

-Subbu, on behalf of the Content Transform Team.