Crossposting!
---------- Forwarded message ----------
From: Lani Goto <lgoto(a)wikimedia.org>
Date: Wed, Jul 26, 2017 at 11:18 AM
Subject: CREDIT showcase, Wednesday 2-August-2017
To: wikitech-l(a)lists.wikimedia.org
Hi!
I'd like to welcome you to join us at the CREDIT showcase next week,
Wednesday, 2-August-2017 at 1800 UTC / 1100 Pacific Time. We'd like to see
your demos, whether they're rough works in progress or polished production
material, or even just a telling of something you've been studying
recently. For more information on the upcoming event, as well as recordings
of previous events, please visit the following page:
https://www.mediawiki.org/wiki/CREDIT_showcase
And if you'd like to share the news about the upcoming CREDIT showcase,
here's some suggested verbiage. Thanks!
*Hi <FNAME>*
*I hope all is well with you! I wanted to let you know about CREDIT, a
monthly demo series that we’re running to showcase open source tech projects
from Wikimedia’s Community, Reading, Editing, Discovery, Infrastructure and
Technology teams. *
*CREDIT is open to the public, and we welcome questions and discussion. The
next CREDIT will be held on August 2nd at 11am PT / 2pm ET / 18:00 UTC. *
*There’s more info on MediaWiki, and on Etherpad, which is where we take
notes and ask questions. You can also ask questions on IRC in the Freenode
chatroom #wikimedia-office (web-based access here). Links to video will
become available at these locations shortly before the event.*
*Please feel free to pass this information along to any interested folks.
Our projects tend to focus on areas that might be of interest to folks
working across the open source tech community: language detection,
numerical sort, large data visualizations, maps, and all sorts of other
things.*
*If you have any questions, please let me know! Thanks, and I hope to see
you at CREDIT.*
*YOURNAME*
--
Lani Goto
Project Assistant, Engineering Admin
Hello all,
I failed to send this notice out when the change was made; mea
culpa.
Background:
* We start a new 1.XX-wmf.XX series after each MW 'tarball' release. For
example, we're in the 1.30-wmf.XX series now that 1.29 is
nearing release.
The change:
* Instead of only incrementing the wmf.XX portion when a new branch is
actually deployed to Wikimedia production servers, we will increment
that number each week regardless.
** For example, last week we did not push a new branch out to production
due to the short work week. That week would have been 1.30-wmf.8. We
thus skipped wmf.8 and are now on wmf.9 this week.
Why?
We hope to make the creation of the weekly deployment branch
(1.XX-wmf.XX) automatic in the near future. This will allow us to put
that on a cron and not worry about special cases (another special case
being when we hold back the train due to a bad regression). This (every
week gets a wmf.XX number) should simplify logic in many places.
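As a rough illustration of the every-week rule, the sketch below derives a
branch name from the calendar alone. The `weekly_branch` helper and the
series start date are assumptions made up for this example; they are not
taken from the Release Engineering tooling.

```python
from datetime import date


def weekly_branch(series: str, series_start: date, today: date) -> str:
    """Return the 1.XX-wmf.N name for the week containing `today`.

    Assumes (hypothetically) that wmf.1 belongs to the week starting on
    `series_start` and that N grows by one every seven days, whether or
    not a branch was actually deployed that week.
    """
    if today < series_start:
        raise ValueError("today is before the start of the series")
    weeks_elapsed = (today - series_start).days // 7
    return f"{series}-wmf.{weeks_elapsed + 1}"


# Hypothetical start date, chosen only so the numbers line up with the
# wmf.8 / wmf.9 example in the text.
start = date(2017, 5, 1)
no_deploy_week = weekly_branch("1.30", start, date(2017, 6, 19))
following_week = weekly_branch("1.30", start, date(2017, 6, 26))
```

Because the number depends only on the date, a week without a deployment
still consumes its wmf.N, which is what makes the rule easy to run from cron.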
Thanks!
Greg on behalf of the Release Engineering team
PS: Yeah, we could have just gone with ISO week numbers, but we didn't
want to change too much in the version string, so as to reduce the chance
of breaking other tools.
PPS: And yes, my failure to pre-announce this instead of post-announce
caused at least the ReleaseTaggerBot to break this week. Mea culpa.
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| Release Team Manager A18D 1138 8E47 FAC8 1C7D |
Hi all, apologies for the cross-posting!
How to read this post?
----------------------
* For those without time to read lengthy technical emails,
read the TL;DR section.
* For those who don't care about all the details but want to
help with this project, you can read sections 1 and 2 about Tidy,
and then skip to section 7.
* For those who like all their details, read the post in its entirety,
and follow the links.
Please ask follow-up questions on wiki *on the FAQ’s talk page* [0]. If you
find a bug, please report it *on Phabricator or on the page mentioned
above*.
TL;DR
-----
The Parsing team wants to replace Tidy with a RemexHTML-based solution on
the Wikimedia cluster by June 2018. This will require editors to fix pages
and templates to address wikitext patterns that behave differently with
RemexHTML. Please see the 'What editors will need to do' section on the Tidy
replacement FAQ [1].
1. What is Tidy?
----------------
Tidy [2] is a library currently used by MediaWiki to fix some HTML errors
found in wiki pages.
Badly formed markup is common on wiki pages when editors use HTML tags in
templates and on the page itself. (Ex: unclosed HTML tags, such as a <small>
without a </small>, are common). In some cases, MediaWiki can generate
erroneous HTML by itself. If we didn't fix these before sending it to
browsers, some would display things in a broken way to readers.
But Tidy also does other "cleanup" on its own that is not required for
correctness. Ex: it removes empty elements and adds whitespace between HTML
tags, which can sometimes change rendering.
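To make "unclosed HTML tag" concrete, here is a minimal sketch that reports
tags left open in a fragment, using only Python's standard `html.parser`
module. It is not how Tidy, RemexHTML, or Linter work; the `find_unclosed`
helper is a hypothetical name used for illustration only.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML5 (void elements).
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class UnclosedTagFinder(HTMLParser):
    """Track open tags on a stack and record those that are never closed."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.unclosed = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Anything opened after the matching tag was left unclosed.
            while self.stack:
                top = self.stack.pop()
                if top == tag:
                    break
                self.unclosed.append(top)
        # A stray end tag with no matching start tag is ignored here.


def find_unclosed(markup: str) -> list:
    finder = UnclosedTagFinder()
    finder.feed(markup)
    finder.close()
    # Whatever is still open at the end of input was never closed.
    return finder.unclosed + finder.stack[::-1]


report = find_unclosed("<div><small>note</div>")
```

A real HTML5 parser goes further: it also decides how to repair the tree,
which is where Tidy and an HTML5-based parser can disagree.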
2. Why replace it?
------------------
Since Tidy is based on HTML4 semantics and the Web has moved to HTML5, it
also makes some incorrect changes to HTML to 'fix' things that used to not
work; for example, Tidy will unexpectedly move a bullet list out of a table
caption even though that's allowed. HTML4 Tidy is no longer maintained or
packaged. There have also been a number of bug reports filed against Tidy
[3]. Since Parsoid is based on HTML5 semantics, there are differences
between Parsoid's rendering of a page and the current read view, which
is based on Tidy.
3. Project status
-----------------
Given all these considerations, the Parsing team started work to replace
Tidy [4] around mid-2015. Tim Starling started this work and, after a survey
of existing options, decided to write a wrapper over a Java-based HTML5
parser. At the time we started the project, we thought we could probably
have Tidy replaced by mid-2016. Alas!
4. What is replacing Tidy?
--------------------------
Tidy will be replaced by a RemexHTML-based solution that uses the
RemexHTML[5] library along with some Tidy-compatibility shims to ensure
better parity with the current rendering. RemexHTML is a PHP library that
Tim wrote with C.Scott’s input that implements the HTML5 parsing spec.
5. Testing and followup
-----------------------
We knew that some pages would be affected and need fixing due to this
change. To identify more precisely which ones, we wanted
to do some thorough testing. So, we built some new tools [6][7] and
overhauled and upgraded other test infrastructure [8][9] to let us evaluate
the impact of replacing Tidy (among other such things in the future), which
could be the subject of a post all on its own.
You can find the details of our testing on the wiki [1][10], but we found
that a large number of pages had rendering differences. We analyzed the
results and categorized the source of differences. Based on that, to ease
the process of replacement, we added a bunch of compatibility shims to
mimic what Tidy does. I am skipping the details in this post. Even after
that, newer testing showed that we are still left with a few patterns that
need fixing and that we cannot, or don't want to, work around
automatically.
6. Tools to assist editors: Linter & ParserMigration
----------------------------------------------------
In October 2016, at the parsing team offsite, Kunal ([[User:Legoktm (WMF)]])
dusted off the stalled wikitext linting project [11] and (with help from
a bunch of people in the Parsoid, db/security/code review areas) built the
Linter extension that surfaces wikitext errors that Parsoid knows about to
let editors fix them.
Earlier this year, we decided to use Linter in service of Tidy replacement.
Based on our earlier testing results, we have added a set of high-priority
linter categories that identify specific wikitext markup patterns on wiki
pages that need to be fixed [12].
Separately, Tim built the ParserMigration extension to let editors evaluate
their fixes to pages [13]. You can enable this in your editing preferences
or replace '&action=edit' in your URL bar with
'&action=parsermigration-edit'.
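The URL substitution described above can be sketched as a small helper.
The function name and the example URL are hypothetical; only the
'action=edit' to 'action=parsermigration-edit' swap comes from the text.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_parsermigration_url(edit_url: str) -> str:
    """Swap action=edit for action=parsermigration-edit in a query string."""
    parts = urlsplit(edit_url)
    query = [
        (key, "parsermigration-edit" if (key, value) == ("action", "edit") else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # urlencode re-encodes every value, so unusual characters in other
    # parameters may come back percent-encoded differently.
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical wiki URL, used only to show the substitution.
example = to_parsermigration_url(
    "https://wiki.example.org/w/index.php?title=Sandbox&action=edit"
)
```

Parsing the query string, rather than doing a plain text replace, avoids
touching a page title that happens to contain the text "action=edit".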
7. What editors have to do
--------------------------
The part that you have all been waiting for!
Please see the 'What editors will need to do' section on the Tidy replacement
FAQ [1]. We have added simplified instructions, so that even community
members who do not consider themselves "techies" can still learn about ways
to fix pages. We'll keep that section up to date based on feedback and
questions. But since it is a wiki, please also edit and tweak as required
to make the text useful for yourselves! This is a first call for fixes and
it is about the problems defined as "high priority". We'll issue other
calls in the future for any other necessary Tidy fixups.
Caveats:
* As noted on that page, the linter categories don't cover all the possible
sources of rendering differences. For example, there is still T157418 [14]
left to address. For those who have an opinion about this, please chime in
on that task. We are still evaluating the best solution for this without
adding more cruft to wikitext behavior or kicking the cleanup can down the
road.
* As the issues in the identified linter categories are fixed, we might be
better able to isolate other issues that need addressing.
8. So, when will Tidy actually be replaced?
-------------------------------------------
We would really like to get Tidy removed from the cluster by June 2018 at
the latest (or sooner if possible), and your assistance and prompt attention to
these markup issues would be very helpful. We will do this in a phased
manner on different wikis rather than all at once on all wikis.
We really want to do this as smoothly as possible without disrupting the
work of editors or affecting the rendering of the large corpus of pages on
the various wikis. As you might have gathered from the text above, we have
built and leveraged a wide variety of tools to assist with this.
9. Monitoring progress
----------------------
In order to monitor progress, we plan to do a weekly (or similarly periodic)
test run that compares the rendering of pages with Tidy and with
RemexHTML on a large sample of pages (in the 50K range) from a large subset
of Wikimedia wikis (~50 or so). This will give us a pulse of how fixups are
going, and when we might be able to flip the switch on different wikis.
Best,
Elitre (WMF) on behalf of the Parsing Team (
https://www.mediawiki.org/wiki/Parsing )
References
----------
0. https://www.mediawiki.org/wiki/Talk:Parsing/Replacing_Tidy/FAQ
1. https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy/FAQ#What_will_editors_need_to_do.3F
2. https://en.wikipedia.org/wiki/HTML_Tidy
3. https://phabricator.wikimedia.org/tag/tidy/
4. https://phabricator.wikimedia.org/T89331
5. https://github.com/wikimedia/mediawiki-libs-RemexHtml
6. https://phabricator.wikimedia.org/T120345
7. https://github.com/wikimedia/integration-uprightdiff
8. https://github.com/wikimedia/integration-visualdiff
9. https://github.com/wikimedia/mediawiki-services-parsoid-testreduce
10. https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy
11. https://phabricator.wikimedia.org/T48705
12. https://www.mediawiki.org/wiki/Help:Extension:Linter#Goal:_Replacing_Tidy
13. https://www.mediawiki.org/wiki/Help:Extension:Linter#Verifying_fixes_for_these_lint_categories
14. https://phabricator.wikimedia.org/T157418
A translatable message about this project is here:
https://meta.wikimedia.org/wiki/Editing/Accessible_editing_buttons
Details about why, when, and how to find out whether your favorite script
will be affected (and how to make the easy fixes!) are here:
https://www.mediawiki.org/wiki/Contributors/Projects/Accessible_editing_but…
This change has already been made at the Persian and Polish Wikipedias.
This Wednesday (less than 48 hours from now), it will reach the *French,
Spanish, Japanese, Russian, and Italian Wikipedias and Meta*. In the
coming weeks, it will arrive at all of the other Wikipedias (including
English and German), the non-Wikipedias, and Commons.
Please share this information with your communities.
--
Sherry Snyder (WhatamIdoing)
Community Liaison, Wikimedia Foundation
Hey everyone,
apologies for the cross-posting, we're just too excited:
we're looking for a new member for our team [0], who'll dive right into
the promising Structured Data project. [1]
Is our future colleague hiding among the tech ambassadors, translators,
GLAM people, and community members we usually work with? We look forward to
finding out soon.
So please, check the full job description [2], apply, or tell/recommend
anyone who you think may be a good fit. For any questions, please contact
me personally (not here).
Thanks!
Elitre (WMF)
Senior Community Liaison, Technical Collaboration
[0] https://meta.wikimedia.org/wiki/Community_Liaisons
[1] https://commons.wikimedia.org/wiki/Commons:Structured_data
[2]
https://boards.greenhouse.io/wikimedia/jobs/610643?gh_src=o3gjf21#.WMGV0Rih…