On January 31, 2018, on ru.wp, sv.wp, fi.wp and he.wp, we are going to
turn off Tidy and switch to the Remex HTML5 parsing library.
Besides those, another 200+ wikis will also be switched away from Tidy
on that day. You can find the list of such wikis at T184656.
Do any of you belong to or know someone active in these communities?
We've also announced this on Tech News. Since we don't anticipate the
change to be ground-breaking for these communities, and based on our
previous experience, we think that "spamming" the village pumps may not
be very effective. We'd appreciate your help in assuring these wikis
that they can contact us at mw:Help_talk:Extension:Linter
if needed, and that there's plenty of documentation to help with
Linter fixes at mw:Help:Extension:Linter.
In July 2017, we announced our intention to replace Tidy with an
HTML5-based solution on the Wikimedia cluster by the end of June 2018
at the latest. Please refer to that original posting for specifics of
the project and why we are replacing Tidy.
Status of Tidy replacement
Over the last 3 months, we have replaced Tidy with RemexHTML
on mediawiki, testwiki, nowiki, fawiki, itwiki, dewiki and 170 other
small wikis.
We have approached ruwiki, svwiki, fiwiki and hewiki for replacement this
month based on remaining linter errors and the progress those wikis have
been making. We expect to approach other medium and large wikis for
replacement next month.
In addition, for any wiki that has fewer than 10 linter errors in any
high-priority category, we will be replacing Tidy with RemexHTML. T184656 has the
list of wikis that will see this change (this list includes wikis that
have already had Tidy replaced in December).
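As a rough illustration of the selection rule above, here is a minimal
sketch in Python. It assumes the rule means that every high-priority
category on a wiki has fewer than 10 linter errors. The function name,
wiki names, category names and counts are all made up for the example;
none of them come from T184656 or from the Linter extension itself.

```python
# Threshold from the rule described above: fewer than 10 errors.
MAX_ERRORS = 10

def eligible_wikis(counts_by_wiki, threshold=MAX_ERRORS):
    """Return wikis whose every high-priority category is below the threshold.

    counts_by_wiki maps a wiki name to a dict of
    {high-priority category name: linter error count}.
    """
    return sorted(
        wiki
        for wiki, counts in counts_by_wiki.items()
        if all(count < threshold for count in counts.values())
    )

# Made-up sample data, for illustration only.
sample = {
    "examplewiki": {"category-a": 3, "category-b": 0},
    "otherwiki": {"category-a": 250, "category-b": 4},
    "tinywiki": {"category-a": 9, "category-b": 9},
}

print(eligible_wikis(sample))
```

With this sample data, "otherwiki" is excluded because one of its
categories is at or above the threshold.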
To be clear, if we notice problems (or if the wiki requests it), we will
revert the change after identifying the source of the problem. If you
notice any incorrect rendering, you can use ?action=parsermigration-edit
to identify if the switch from Tidy actually caused it.
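For anyone who wants to check several pages, the URL can be built
mechanically. The sketch below is only an illustration: it assumes the
wiki exposes the usual index.php entry point with a "title" query
parameter, and the helper name, host and page title are hypothetical.
Only the action name comes from the paragraph above.

```python
from urllib.parse import urlencode

def parsermigration_url(index_php_url, title):
    """Build a URL that opens `title` with ?action=parsermigration-edit."""
    query = urlencode({
        # Page titles conventionally use underscores instead of spaces in URLs.
        "title": title.replace(" ", "_"),
        "action": "parsermigration-edit",
    })
    return f"{index_php_url}?{query}"

# Hypothetical wiki endpoint and page title, for illustration only.
print(parsermigration_url("https://example.org/w/index.php", "Main Page"))
```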
Status of linter fixes
We have been publishing weekly stats on changes to linter counts,
which show how wikis have been progressing with making linter fixes.
Based on what we've observed, of the 38 largest wikis, besides the ones
that have had Tidy replaced already or will get it replaced this month, most
other wikis seem to be making progress, albeit at different rates.
idwiki, viwiki, jawiki, and rowiki haven't seen a lot of activity yet.
Results from pixel diff tests
We have also been doing weekly test runs to calculate pixel diffs on
about 70K pages which we have sampled from over 50 wikis. To do this,
we generate a screenshot of a page with Tidy and one with RemexHTML,
and compare the renderings while ignoring vertical whitespace shifts.
We generate a numeric score for the diff that tries to be reflective
of the magnitude of differences we are seeing.
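To make the idea concrete, here is a toy sketch of such a comparison in
Python. It is not the tool we actually run. It assumes screenshots are
given as lists of equal-width rows of grayscale values, treats an
all-background row as vertical whitespace, and uses the fraction of
differing pixels as the score. All names and the scoring formula are
assumptions made for illustration.

```python
def strip_blank_rows(image, background=255):
    """Drop rows that contain only background pixels."""
    return [row for row in image if any(px != background for px in row)]

def diff_score(image_a, image_b, background=255):
    """Return the fraction of differing pixels after removing blank rows.

    0.0 means the two renderings match once vertical whitespace is ignored.
    """
    rows_a = strip_blank_rows(image_a, background)
    rows_b = strip_blank_rows(image_b, background)
    height = max(len(rows_a), len(rows_b))
    if height == 0:
        return 0.0
    width = len((rows_a or rows_b)[0])
    blank = [background] * width
    # Pad the shorter image with blank rows so both can be compared row by row.
    rows_a += [blank] * (height - len(rows_a))
    rows_b += [blank] * (height - len(rows_b))
    differing = sum(
        pa != pb
        for row_a, row_b in zip(rows_a, rows_b)
        for pa, pb in zip(row_a, row_b)
    )
    return differing / (height * width)

# Tiny made-up "screenshots": same content, blank rows in different places.
tidy = [[255, 255, 255], [0, 0, 255], [255, 255, 255], [0, 255, 0]]
remex = [[0, 0, 255], [255, 255, 255], [255, 255, 255], [0, 255, 0]]
# Same layout as remex, but one pixel of real content differs.
changed = [[0, 0, 255], [0, 0, 0]]

print(diff_score(tidy, remex))
print(diff_score(tidy, changed))
```

In this sketch the first pair scores 0.0 because only the blank rows
moved, while the second pair gets a non-zero score because one content
pixel differs.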
Thanks to fixes to pages, and fixes to our testing infrastructure so that
it detects differences more accurately, between July 2017 and January 2018,
the percentage of pages that rendered with only vertical whitespace
shifts increased from 91.9% to 94.6%. Similarly, the percentage of
pages that rendered with pixel-perfect accuracy went up from
63.2% to 68.3%. For technical reasons related to the testing setup
that I will skip here, 100% for either metric is not achievable.
Overall, at the end of January, about 400 of Wikimedia's wikis will
have replaced Tidy. This includes 7 of the largest wikis.
Linter fixes are also happening on lots of wikis, but some large wikis
could pick up the pace.
We still expect to replace Tidy on all wikis by end of June 2018,
and your cooperation and help with fixing pages identified by the
Linter tool is greatly appreciated.
Manager and Technical Lead,
Parsing Team @ the WMF.
Hi! We have a video to share from the December 2017 Readers monthly
meeting. Dmitry Brant from the Wikimedia Foundation's Apps team shows
updates to the Wikipedia for Android app on Feed customization, the
Randomizer, and a Black theme for AMOLED devices.
*About the Randomizer*
The team redesigned the Randomizer function to be easier and more fun to
interact with. We hope it will give users hours of enjoyment and help them
discover Wikipedia content they might otherwise never have known existed.
Credit to design, engineering, and product management!
At the Dev Summit, Birgit Müller and I will run a session on Growing the
MediaWiki Technical Community. If you're attending, we hope you will
consider joining us.
Everyone (attending the Dev Summit or not) is welcome and encouraged to
participate at https://phabricator.wikimedia.org/T183318 (please comment
there, rather than by email).
We are discussing the following questions:
* What would allow you to develop and plan your software more efficiently?
* What would make software development more fun for you?
* What other Open Source communities do we share interests with?
* How can we change our processes to take technical debt more seriously?
"Develop" means any kind of work on a software system, including design,
Our topics are:
* Better processes and project management practices, integrating all
developers and allowing them to work more efficiently
* Building partnerships with other Open Source communities on shared
interests (e.g. translation, audio, video)
* Reducing technical debt
(And thank you for your patience with my cross-posting if you're on other lists.)
I'm writing to invite your input on the following Phabricator task ahead of
next week's Wikimedia Developer Summit 2018 session.
Knowledge as a Service
The purpose of the Wikimedia Developer Summit 2018 sessions is to
provide guidance for Phase 2 of the Movement Strategic Direction on
buildout of technology capabilities. We'd really love your thoughts to help
set context for our session next week, as Knowledge as a Service is a
primary consideration in the Movement Strategic Direction.
What is Knowledge as a Service? Its essence is about information
architecture approaches and the necessary software that will ultimately
allow content consumption and creation to radiate to new and different
types of interfaces and devices in addition to browser-based approaches. As
you review position papers from attendees, you'll notice that the way
they (myself included) think about best solving this is through a heavy
emphasis on technology that makes it easier to better structure information
and its metadata for re-use, remixing, and querying.
What might this mean? Does it mean we should build Wikimedia software in an
API- and metadata-first manner following industry standards compatible with
content structuration? Does it mean weaving our existing structured and
semi-structured data technologies together? How do we build technology that
can ensure successful collaboration between communities on increasingly
structured and interdependent information sources? And how can we ensure
the tech will bolster growth of multilingual and multimedia content
creation and consumption?
I've copied some of the essential material from the Movement Strategic
Direction concerning Knowledge as a Service so you have it here. We would
appreciate your input and hope you will subscribe to the Phabricator task
to contribute and follow along as we explore this topic.
The following content is copied from
Knowledge as a service: To serve our users, we will become a platform that
serves open knowledge to the world across interfaces and communities. We
will build tools for allies and partners to organize and exchange free
knowledge beyond Wikimedia. Our infrastructure will enable us and others to
collect and use different forms of free, trusted knowledge.
As technology spreads through every aspect of our lives, Wikimedia's
infrastructure needs to be able to communicate easily with other connected
As a platform, we need to transform our structures to support new formats,
new interfaces, and new types of knowledge. We have a strategic opportunity
to go further and offer this platform as a service to other institutions,
beyond Wikimedia. In a world that is becoming more and more connected,
building the infrastructure for knowledge gives others a vested interest in
our success. It is how we ensure our place in the larger network of
knowledge, and become an essential part of it. As a service to users, we
need to build the platform for knowledge or, in jargon, provide knowledge
as a service.
Knowledge as a service: A platform that serves open knowledge to the world
across interfaces and communities
Our openness will ensure that our decisions are fair, that we are
accountable to one another, and that we act in the public interest. Our
systems will follow the evolution of technology. We will transform our
platform to work across digital formats, devices, and interfaces. The
distributed structure of our network will help us adapt to local contexts.
We will build tools for allies and partners to organize and exchange free
knowledge beyond Wikimedia.
We will continue to build the infrastructure for free knowledge for our
communities. We will go further by offering it as a service to others in
the network of knowledge. We will continue to build the partnerships that
enable us to develop knowledge we can't create ourselves.
Our infrastructure will enable us and others to collect and use different
forms of free, trusted knowledge.
We will build the technical infrastructures that enable us to collect free
knowledge in all forms and languages. We will use our position as a leader
in the ecosystem of knowledge to advance our ideals of freedom and
fairness. We will build the technical structures and the social agreements
that enable us to trust the new knowledge we compile. We will focus on
highly structured information to facilitate its exchange and reuse in