On 07/06/2017 09:59 AM, Pine W wrote:
> Thanks for the information.
> I understand that moving from HTML 4 to HTML 5 is probably a good idea.
It is a good and necessary step. We want MediaWiki (and Wikipedia)
output to keep up with web standards.
> However, I am concerned about this statement:
> "This will require editors to fix pages and templates to address
> wikitext patterns that behave differently with RemexHTML".
Understandably. Let me try to address your questions below.
But, before that, I want to point you back to my email where I mentioned
that before we arrived at the current proposal, we did a bunch of work
to minimize work for editors.
(1) we added Tidy compatibility shims where we could automatically
preserve Tidy behavior (however much we might have liked to pull those
bandaids off);
(2) we built a bunch of tools and infrastructure to precisely identify
which pages, and which specific pieces of markup on those pages, need
fixing;
(3) we built a tool that lets editors compare their fixes before and
after, so they can be sure they are making the right fixes and that the
fixes do the right thing.
I am re-emphasizing this to indicate that we arrived at this proposal
after sufficient prior work to respect editors' time and efforts and to
support them in what we are requesting them to do. We also re-aligned
timelines to reflect the complexity of the task that we uncovered.
> As you probably know, the supply of content contributors' time is far
> too low to meet the demands of keeping up with everything that ideally
> would be done on the content projects.
Do note that we started this process last year as somewhat of a trial
balloon and we found that editors on wikis were very willing and helpful
with the process. See
https://phabricator.wikimedia.org/T134423. More
recently, earlier this year, we made some fixes to the preprocessor to
fix some edge cases in language variant handling. Once again, this
required fixes to markup on pages and editors on a bunch of wikis were
more than willing and quite helpful in making these changes. See
https://www.mediawiki.org/wiki/Parsoid/Language_conversion/Preprocessor_fix….
We are very happy with this collaboration and hope we can continue with
this.
So, while I understand your concern, I am optimistic that we can work on
this collaboratively and make the fixes necessary to address technical
debt that has accumulated over the years in our wikis (and hence the
MediaWiki codebase) while enabling the upgrade of our output to modern
web standards.
> I am thinking that instead of asking content contributors to spend
> lots of hours (do we know how many? Hundreds? Thousands?) fixing all
> of these issues, it would make more sense to develop bots to address
> them.
I cannot quantify this number (of hours) offhand. But, with some
effort, we could perhaps come up with rough back-of-the-envelope
numbers.
> Here are a few questions:
>
> 1. How many fixes do you think will be needed, for the highest
> priority fixes as well as all fixes?
On the large Wikipedias, thousands of fixup instances (not pages) are
present ([1], [2], [3]) in the high-priority categories (which are the
only ones required for replacing Tidy). In practice, fixing a few
templates will bring down these numbers greatly. For example, for one of
the linter categories (a workaround for a paragraph-wrapping bug),
fixing the nowrap / nowrap-begin template will likely address most
problematic instances found on specific wikis.
[1] https://en.wikipedia.org/wiki/Special:LintErrors
[2] https://fr.wikipedia.org/wiki/Special:LintErrors
[3] https://es.wikipedia.org/wiki/Special:LintErrors
> 2. How many hours of volunteer time do you think that these fixes
> will require, for the highest priority fixes as well as all fixes?
I do not know offhand, but I expect each (non-template) fix to take no
more than a couple of minutes. So, let us go with that and do some
rough back-of-the-envelope numbers. Assuming we have, say, 120,000
fixes required in total across all wikis, with 50% of them coming from
non-template pages, we have 60,000 * 2 minutes = 2,000 person-hours.
Templates are going to take more time, but there are likely to be far
fewer of them: say, 1,000 * 15 minutes = 250 hours?
Take that 2-minute calculation for what it is worth, but I think the
collective effort we are asking is not entirely unreasonable over a
period of many months.
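The back-of-the-envelope arithmetic above can be sketched as a quick
script. Note that the fix counts and per-fix times here are the rough
assumptions stated above, not measured figures:

```python
# Rough estimate of editor effort, using the assumed figures from
# above (not measured data).

total_fixes = 120_000          # assumed total fixes across all wikis
non_template_share = 0.5       # assume half are on regular pages

non_template_fixes = int(total_fixes * non_template_share)
non_template_hours = non_template_fixes * 2 / 60   # ~2 minutes per fix

template_fixes = 1_000         # assumed number of template fixes
template_hours = template_fixes * 15 / 60          # ~15 minutes per fix

print(f"Non-template effort: {non_template_hours:.0f} person-hours")
print(f"Template effort: {template_hours:.0f} person-hours")
# → Non-template effort: 2000 person-hours
# → Template effort: 250 person-hours
```

Halving the per-fix time, or the total count, scales these totals
linearly, so the estimate is easy to redo with better numbers.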
> 3. How feasible would it be to build bots to make 90% of high
> priority fixes and 90% of all fixes?
Since the start of the Linter project (when we started off with the
GSoC prototype in the summer of 2014, and again when Kunal picked it up
in 2016), we have been in conversation with Nico V (of frwiki, who
maintains WPCleaner) and with Marios Magioladitis and Bryan White
(Checkwiki) about integrating the output with their projects and tools.
At Nico's request, we added API endpoints to Linter, Parsoid, and
RESTBase so that the tool can programmatically fetch linter issues and
let editors / bots fix them appropriately.
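As a rough sketch of how a bot might use those endpoints: the Linter
extension exposes a `list=linterrors` query module on the action API
(the `lnt*` parameter names below are my best recollection; verify them
against the live api.php help before relying on them). The example uses
a canned response so it runs without network access:

```python
# Hypothetical sketch of a bot fetching lint issues through the Linter
# extension's query API. Module and parameter names are assumptions;
# check the wiki's api.php documentation.
from urllib.parse import urlencode

def linter_query_url(api_base, category, limit=50):
    """Build a query URL for lint issues in one category."""
    params = {
        "action": "query",
        "list": "linterrors",        # assumed Linter API module
        "lntcategories": category,   # e.g. "self-closed-tag"
        "lntlimit": limit,
        "format": "json",
    }
    return api_base + "?" + urlencode(params)

def extract_titles(response):
    """Pull page titles out of a linterrors API response."""
    return [e["title"]
            for e in response.get("query", {}).get("linterrors", [])]

# Canned response in the assumed shape of the API's JSON output:
sample = {"query": {"linterrors": [
    {"title": "Template:Nowrap begin", "category": "self-closed-tag"},
    {"title": "Example page", "category": "self-closed-tag"},
]}}
print(extract_titles(sample))
# → ['Template:Nowrap begin', 'Example page']
```

A real bot would fetch `linter_query_url(...)` over HTTP, page through
the results, and hand each instance to a fixer routine.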
But, I cannot answer right now how feasible it is to build bots to do
what you are asking about. I welcome insights and perspectives from
others.
> I'm not trying to obstruct technical progress, but I am generally not
> a fan of WMF adding to volunteers' workloads. If the number of changes
> involved is small and the number of hours to make them is small, that
> is less of a concern than if we are talking about thousands of changes
> and hundreds or thousands of volunteer hours.
I hope my response here allays some, if not all, of your concerns.
Subbu.