The Core Platform and Parsing teams at the Wikimedia Foundation are glad
to announce the implementation of a content negotiation protocol for
Parsoid HTML in the REST API . This was deployed to the Wikimedia
cluster on October 1, 2018.
Parsoid HTML clients can now use the Accept header to specify which
version of content they expect when requesting Parsoid HTML from the
REST API. If omitted, as before, they will get whatever version of the
HTML is in storage, regardless of any breaking changes it may contain.
Parsoid's HTML is versioned
An advantage of Parsoid’s HTML output is that it is both specced and
versioned . By adhering to the principles of semantic versioning ,
Parsoid can signal to clients what kinds of changes can be expected
in the output between versions.
However, until recently, Parsoid always returned the latest version
of its HTML. Naturally, this posed challenges when deploying breaking
changes since clients had to be prepared to consume the newer version.
Rolling out new HTML versions without breaking clients
Throughout its history, Parsoid developers have had close enough contact
with the developers of Parsoid clients (they are internal to the
Wikimedia Foundation for the most part) to coordinate deployment
of breaking changes to the HTML. This mainly involved ensuring all
known clients were forward and backwards compatible with the newer
HTML version before deploying the change. Needless to say, as more
clients were coming along, this informal process would not suffice;
a scalable and predictable version upgrade solution was needed.
Content Negotiation Protocol
To solve this problem, a content negotiation protocol  relying on
HTTP Accept headers was implemented. See RESTBase’s documentation 
for the exact details of the protocol. What follows is just an
Parsoid clients are expected to pass an Accept header that specifies
the HTML version they can handle. If the version present in storage
does not satisfy the request, RESTBase will attempt to resolve the
inconsistency. However, if the requested version cannot be satisfied,
an (HTTP 406) error will be returned. The meaning of “satisfied” here
mostly follows semver’s caret semantics  (the main difference being
that the patch level is ignored).
If a client does not pass the Accept header, everything works exactly
like before, with all the downsides of the previous behaviour:
no protection from breaking changes; you get whatever HTML version
is currently in storage.
The deployed Parsoid version generates HTML versions 1.8.0  and
2.0.0 . But, it is worth mentioning that the oldest acceptable
version supported is 1.6.0, so if you’re sending an Accept header with
a version less than 1.6.0, your application will break. The reason
for this odd constraint is that we mistakenly released that version
without bumping the major version  even though it introduced a
breaking change. Mea culpa!
Also, RESTBase only stores the latest version so, as content gets
rerendered and storage gets replaced, clients requesting older content
have to pay a latency penalty while the stored content is downgraded
to an appropriate version. Hence, we encourage Parsoid HTML clients to
pay attention to announcements about major version changes and upgrade
promptly. Going forward, we’ll send announcements about Parsoid HTML
versions changes on the mediawiki-api-announce mailing list.
How does this impact 3rd party wikis?
Finally, astute readers will have noted that this announcement is
concerning the REST API. However, many 3rd party installs have VE
communicating directly with Parsoid and may be wondering how they’ll
be impacted by the change.
Parsoid has had a similar protocol (the difference is mainly in
respecting the patch level) implemented since the v0.9.0 release .
So, going forward, when upgrading Parsoid or VE, if the HTML version
requested by VE can be provided by Parsoid, the upgrade will be safe.
Content negotiation now allows us to deploy new Parsoid features to the
Wikimedia cluster without needing prior coordination with all clients.
Clients can continue to request older versions until they are ready to
update (assuming they don’t fall too far behind since we only plan on
supporting two major versions concurrently). And, conversely, they can
request newer versions with the guarantee that they will not receive