Hello:
Please take a look at the new browser report with more detailed desktop
site data (all Wikimedia projects aggregated):
https://analytics.wikimedia.org/dashboards/browsers/#desktop-site-by-browser
Some highlights:
* Data is very stable over the last year
* Chrome leads with 45% of traffic, followed by IE (18%) and Firefox (13%)
* The bulk of IE traffic is IE11 and IE7
* Edge shows up at 4%, slowly catching up to Safari (5%)
* This data is still subject to fluctuations due to bot traffic not
identified as such. We will be working on this next year.
Thanks,
Nuria
There will be a meeting of the MediaWiki farmers group at Wikimania in
Ballroom West at 6:30 on Saturday. This is a user group which aims to
improve the experience of using MediaWiki in wikifarms. Anyone who is
interested in developing new wikifarm-oriented features, uses MediaWiki in a
farm setting, or is just curious is encouraged to come.
Thanks,
Brian
How to read this post?
----------------------
* For those without time to read lengthy technical emails,
read the TL;DR section.
* For those who don't care about all the details but want to
help with this project, you can read sections 1 and 2 about Tidy,
and then skip to section 7.
* For those who like all their details, read the post in its entirety,
and follow the links.
Please ask follow-up questions on wiki *on the FAQ’s talk page* [0]. If you
find a bug, please report it *on Phabricator or on the page mentioned
above*.
TL;DR
-----
The Parsing team wants to replace Tidy with a RemexHTML-based solution on
the Wikimedia cluster by June 2018. This will require editors to fix pages
and templates to address wikitext patterns that behave differently with
RemexHTML. Please see the 'What editors will need to do' section of the Tidy
replacement FAQ [1].
1. What is Tidy?
----------------
Tidy [2] is a library currently used by MediaWiki to fix some HTML errors
found in wiki pages.
Badly formed markup is common on wiki pages when editors use HTML tags in
templates and on the page itself. (For example, unclosed HTML tags, such as
a <small> without a matching </small>, are common.) In some cases, MediaWiki
can generate erroneous HTML by itself. If we didn't fix these errors before
sending the HTML to browsers, some pages would render in a broken way for
readers.
But Tidy also does other "cleanup" on its own that is not required for
correctness. For example, it removes empty elements and adds whitespace
between HTML tags, which can sometimes change rendering.
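To make the problem concrete, here is a minimal illustrative sketch (not
MediaWiki code; the helper names are hypothetical) that detects unclosed
tags in an HTML fragment using Python's standard-library HTMLParser:

```python
from html.parser import HTMLParser

# Hypothetical helper for illustration: tracks which tags are left open
# in a fragment, the kind of error Tidy silently repairs for editors.
class UnclosedTagChecker(HTMLParser):
    VOID = {"br", "hr", "img", "input", "link", "meta"}  # never need closing

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Closing an outer tag implicitly closes the inner ones,
            # roughly as an HTML5 tree builder would.
            while self.open_tags and self.open_tags[-1] != tag:
                self.open_tags.pop()
            self.open_tags.pop()

def unclosed_tags(fragment):
    checker = UnclosedTagChecker()
    checker.feed(fragment)
    checker.close()
    return checker.open_tags

print(unclosed_tags("<small>tiny text"))             # ['small']
print(unclosed_tags("<div><small>x</small></div>"))  # []
```

Tidy (and RemexHTML) of course goes much further than detection: it rebuilds
a well-formed tree from broken input.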
2. Why replace it?
------------------
Since Tidy is based on HTML4 semantics and the Web has moved to HTML5, it
makes some incorrect changes to HTML to 'fix' things that used not to work;
for example, Tidy will unexpectedly move a bullet list out of a table
caption even though that's allowed in HTML5. HTML4 Tidy is no longer
maintained or packaged, and a number of bug reports have been filed against
it [3]. In addition, since Parsoid is based on HTML5 semantics, there are
differences between Parsoid's rendering of a page and the current read view
based on Tidy.
3. Project status
-----------------
Given all these considerations, the Parsing team started work to replace
Tidy [4] around mid-2015. Tim Starling started this work and, after a survey
of existing options, decided to write a wrapper over a Java-based HTML5
parser. When we started the project, we thought we could probably have Tidy
replaced by mid-2016. Alas!
4. What is replacing Tidy?
--------------------------
Tidy will be replaced by a RemexHTML-based solution that uses the
RemexHTML [5] library along with some Tidy-compatibility shims to ensure
better parity with the current rendering. RemexHTML is a PHP library,
written by Tim with C. Scott's input, that implements the HTML5 parsing
spec.
5. Testing and followup
-----------------------
We knew that some pages would be affected and need fixing due to this
change. In order to identify more precisely which ones, we wanted to do
thorough testing. So, we built some new tools [6][7] and overhauled and
upgraded other test infrastructure [8][9] to let us evaluate the impacts of
replacing Tidy (among other such things in the future); that infrastructure
could be the subject of a post all on its own.
You can find the details of our testing on the wiki [1][10], but we found
that a large number of pages had rendering differences. We analyzed the
results and categorized the sources of the differences. Based on that, to
ease the replacement process, we added a bunch of compatibility shims that
mimic what Tidy does; I am skipping the details in this post. Even after
that, further testing showed that we are still left with a few patterns that
need fixing and that we cannot, or don't want to, work around automatically.
6. Tools to assist editors: Linter & ParserMigration
----------------------------------------------------
In October 2016, at the Parsing team offsite, Kunal ([[User:Legoktm (WMF)]])
dusted off the stalled wikitext linting project [11] and (with help from a
number of people across the Parsoid, database, security, and code review
areas) built the Linter extension, which surfaces wikitext errors that
Parsoid knows about so that editors can fix them.
Earlier this year, we decided to use Linter in service of Tidy replacement.
Based on our earlier testing results, we have added a set of high-priority
linter categories that identify specific wikitext markup patterns on wiki
pages that need to be fixed [12].
Separately, Tim built the ParserMigration extension to let editors evaluate
their fixes to pages [13]. You can enable it in your editing preferences, or
replace '&action=edit' in your URL bar with '&action=parsermigration-edit'.
7. What editors have to do
--------------------------
The part that you have all been waiting for!
Please see the 'What editors will need to do' section of the Tidy
replacement FAQ [1]. We have added simplified instructions, so that even
community members
who do not consider themselves "techies" can still learn about ways to fix
pages. We'll keep that section up to date based on feedback and questions.
But since it is a wiki, please also edit and tweak as required to make the
text useful for yourselves! This is a first call for fixes and it is about
the problems defined as "high priority". We'll issue other calls in the
future for any other necessary Tidy fixups.
Caveats:
* As noted on that page, the linter categories don't cover all the possible
sources of rendering differences. For example, there is still T157418 [14]
left to address. If you have an opinion about this, please chime in on that
task. We are still evaluating the best solution that avoids adding more
cruft to wikitext behavior or kicking the cleanup can down the road.
* As the issues in the identified linter categories are fixed, we may be
better able to isolate other issues that need addressing.
8. So, when will Tidy actually be replaced?
-------------------------------------------
We would really like to have Tidy removed from the cluster by June 2018 at
the latest (or sooner if possible), and your assistance and prompt attention
to these markup issues would be very helpful. We will do this in a phased
manner on different wikis rather than all at once on all wikis.
We really want to do this as smoothly as possible without disrupting the
work
of editors or affecting the rendering of the large corpus of pages on the
various wikis. As you might have gathered from the text above, we have built
and leveraged a wide variety of tools to assist with this.
9. Monitoring progress
----------------------
In order to monitor progress, we plan to do a periodic (likely weekly) test
run that compares the rendering of pages with Tidy and with RemexHTML on a
large sample of pages (in the 50K range) from a large subset of Wikimedia
wikis (~50 or so). This will give us a pulse of how the fixups are going,
and of when we might be able to flip the switch on different wikis.
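As a rough sketch of what such a comparison run amounts to (purely
illustrative: `render_with_tidy` and `render_with_remex` are hypothetical
stand-ins, not the team's actual tooling, which is linked in the
references):

```python
import difflib
import random

def compare_renderings(titles, render_with_tidy, render_with_remex,
                       sample_size=50000):
    """Render a random sample of pages through both pipelines and
    return the titles whose output differs."""
    sample = random.sample(titles, min(sample_size, len(titles)))
    differing = []
    for title in sample:
        old = render_with_tidy(title).splitlines()
        new = render_with_remex(title).splitlines()
        # unified_diff yields nothing when the two renderings are identical
        if any(difflib.unified_diff(old, new, lineterm="")):
            differing.append(title)
    return differing
```

The real test setup also needs visual diffing, since many HTML differences
are invisible to readers, which is what the uprightdiff/visualdiff tools [7][8]
address.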
Subramanya (Subbu) Sastry
Parsing Team.
References
----------
0. https://www.mediawiki.org/wiki/Talk:Parsing/Replacing_Tidy/FAQ
1. https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy/FAQ#What_will_editors…
2. https://en.wikipedia.org/wiki/HTML_Tidy
3. https://phabricator.wikimedia.org/tag/tidy/
4. https://phabricator.wikimedia.org/T89331
5. https://github.com/wikimedia/mediawiki-libs-RemexHtml
6. https://phabricator.wikimedia.org/T120345
7. https://github.com/wikimedia/integration-uprightdiff
8. https://github.com/wikimedia/integration-visualdiff
9. https://github.com/wikimedia/mediawiki-services-parsoid-testreduce
10. https://www.mediawiki.org/wiki/Parsing/Replacing_Tidy
11. https://phabricator.wikimedia.org/T48705
12. https://www.mediawiki.org/wiki/Help:Extension:Linter#Goal:_Replacing_Tidy
13. https://www.mediawiki.org/wiki/Help:Extension:Linter#Verifying_fixes_for_th…
14. https://phabricator.wikimedia.org/T157418
>
> 2b. I like the chart on page page 24 that attempts to show organizational
> relationships for the Structured Data project. Who created that chart?
> Could Wikimedia affiliate organizations be added as a box on that chart,
> and then could the chart's creator upload it as a standalone file to
> Commons (if it's not already there)?
>
Hullo Pine, I made this chart. It's not on Commons yet but I can upload it
once this edit is done. The chart is a visual representation of groups
that are affected by decisions made about the Structured Data on Commons
program. In that context, where would you place Wikimedia Affiliates?
Some affiliates are program organizers and/or tool developers, some have
strong contingents of Commonists, all certainly contain Wikimedians. Is
there another way Wikimedia Affiliates are affected by Structured Data on
Commons that is not included in these groups?
Cheers,
Amanda
From: Pine W <wiki.pine(a)gmail.com>
> Date: Mon, Aug 7, 2017 at 7:20 PM
> Subject: Re: [Wikitech-l] Audiences Q4 FY1617 Quarterly Check-In slides &
> notes
> To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
>
>
> Grace, thanks for publishing these. I have a few questions and comments.
>
> 1. I have no questions regarding the first slide deck, and I found that it
> was informative for me.
>
> 2. Regarding the second slide deck:
>
> 2a. I like the Causegraph visualization
>
> 2b. I like the chart on page page 24 that attempts to show organizational
> relationships for the Structured Data project. Who created that chart?
> Could Wikimedia affiliate organizations be added as a box on that chart,
> and then could the chart's creator upload it as a standalone file to
> Commons (if it's not already there)?
>
> 2c. It looks like
> https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_
> Plan/Quarterly_check-ins/Audiences2_Notes_July_2017
> is incomplete; could someone please add the remaining notes?
>
> 2d. I'd appreciate more information about on how much money (including
> money for WMF staff time) is being spent on on-the-ground awareness
> campaigns in India and Nigeria, and what the SMART goals are for these
> projects. Could someone provide that information?
>
> Thanks,
>
> Pine
>
>
> On Mon, Aug 7, 2017 at 2:17 PM, Grace Gellerman <ggellerman(a)wikimedia.org>
> wrote:
>
> > Hi, Wikitech-l
> >
> > Slide decks and notes from recent Audiences Quarterly Check-Ins are
> posted
> > as follows:
> >
> > Audiences 1 (Comm. Tech, Comm. Health, Contributors, Design, & Business
> > Ops):
> > slides
> > <https://commons.wikimedia.org/wiki/File:(PUBLIC_VERSION)
> > _Audiences_1_(Community_Tech,_Community_Health,_Contributors,_Audiences_
> > Design,_Audiences_Operations)_Quarterly_Check-in,_July_2017.pdf>
> > & notes
> > <https://meta.wikimedia.org/wiki/Wikimedia_Foundation_
> > Annual_Plan/Quarterly_check-ins/Audiences1_Notes_July_2017>
> >
> >
> > Audiences 2 (New Readers, Readers, Structured Data, Wikidata)
> > slides
> > <https://commons.wikimedia.org/wiki/File:Audiences_2_(
> > New_Readers,_Structured_Data_on_Commons,_Wikidata,_%26_
> > Readers)_Quarterly_Check-in,_July_2017.pdf
> > <https://commons.wikimedia.org/wiki/File:Audiences_2_%28New_
> Readers,_Structured_Data_on_Commons,_Wikidata,_%26_
> Readers%29_Quarterly_Check-in,_July_2017.pdf>
> > >
> > & notes
> > <https://meta.wikimedia.org/wiki/Wikimedia_Foundation_
> > Annual_Plan/Quarterly_check-ins/Audiences2_Notes_July_2017>
> > .
> >
> > These and other links are also posted on the quarterly checkin page
> > <https://meta.wikimedia.org/w/index.php?title=Wikimedia_
> > Foundation_Annual_Plan/Quarterly_check-ins&wteswitched=1#
> Fourth_quarter_.
> > 28April.E2.80.93June_2017.29>
> > .
> >
> > Thanks,
> > Grace
> > _______________________________________________
> > Wikitech-l mailing list
> > Wikitech-l(a)lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
> --
> *Anne Gomez* // Senior Program Manager, New Readers
> <https://meta.wikimedia.org/wiki/New_Readers>
> https://wikimediafoundation.org/
>
>
> *Imagine a world in which every single human being can freely share in the
> sum of all knowledge. That's our commitment. Donate
> <http://donate.wikimedia.org>. *
>
>
>
https://www.mediawiki.org/wiki/Scrum_of_scrums/2017-08-09
= 2017-08-09 =
== Callouts ==
* Ops: Readers eng need a user account for uploading zim collections to
beta/prod Swift (https://phabricator.wikimedia.org/T172735 )
== Audiences ==
=== Readers ===
==== iOS native app ====
* Blocked by: none
* Blocking: none
* Updates: Releasing 5.6.0 ( Reading themes, on this day
https://phabricator.wikimedia.org/project/view/2701/ ), Starting work on
5.6.1 ( minor bug fixes, any crash fixes
https://phabricator.wikimedia.org/project/view/2898/ )
==== Android native app ====
* *Blocked by:* Ops: need a user account for uploading zim collections to
beta/prod Swift (https://phabricator.wikimedia.org/T172735 )
* *Blocking:* n/a
* Updates:
** Cookie beta release occurred!
https://phabricator.wikimedia.org/project/view/2763/ Hopefully this
release will resolve the issues people are reporting with offline saved
pages.
** Offline compilations client-side work is nearly complete; only minor
onboarding/branding updates are needed –
https://phabricator.wikimedia.org/project/view/2833/
*** Work on offline compilation file generation and storage is underway (
https://phabricator.wikimedia.org/T170843 )
** Two new app engineers starting Monday 8/21!
==== Reading Infrastructure ====
* Blocked by:
* Blocking:
* Updates:
** Gergo on vacation until Wikimania
** Bernd on vacation until next week
==== Multimedia ====
* Hiring: Progressing well, should have full capacity relatively soon
* Next week offsite (hopefully with entire team)
* 3D work is nearly done from the technical side; pending design and legal
review, we'll start rolling it out soon.
* Wikilegal posting for 3D files is out -
https://meta.wikimedia.org/wiki/Wikilegal/3D_files_and_3D_printing
* MP3 discussion on Commons also coming up soon - draft at
https://commons.wikimedia.org/wiki/User:CKoerner_(WMF)/MP3_patrol_discussion
==== Web ====
Same as usual:
* Preparation work for new page summary API
* MCS is no longer using the MobileView API (but the change still needs to be deployed)
* Desktop print styles
* Deploying Page Previews to all projects except enwiki and dewiki (likely
to be stalled until after Wikimania)
==== Discovery ====
* working on AB test to add thumbnails to search results
https://phabricator.wikimedia.org/T149811
=== Community Tech ===
* Deployed CodeMirror, then undeployed. Problems resolved, will try again
soon.
* LoginNotify coming next week.
* Work continues on ArticleCreationWorkflow and GlobalPreferences
=== Search Platform ===
* Blocked by: none
* Blocking: none
* Updates:
** Continuing work on ML-assisted ranking
** Enabled A/B test with interleaving search results (
https://phabricator.wikimedia.org/T150032 )
** Running a human-graded relevancy test (
https://phabricator.wikimedia.org/T171740 )
** Vietnamese analyzer plugin test round 2 finished, getting better but
still not ready (https://phabricator.wikimedia.org/T170423#3502328 )
** WDQS now has per-client request throttling
=== Contributors ===
==== Global Collaboration ====
===== Collaboration =====
* Blocking - WIP on Flow fix needed by dumps team
* Updates
* RCFilters - Various updates, especially regarding Live Update.
** Bring back old vs new marker in live update
** RCFilters: Show "from" link when live update is not available
** Allow non-sticky filters to be excluded from saved queries
** Correct label for "View newest changes" button
** RCFilters: Remove new changes visual cue for Live Update feature
** Fix overzealous addQuotes for rc_source field in rebuildrecentchanges.php
** RCFilters: Prevent live update fetch if model is in conflict
** RCFilters: Adjust styling of 'other review tools' button
** RCFilters: Normalize 'limit' to minimum 0, like the backend does
** RCFilters: Adjust server default variable names for limit/days
** RCFilters: Normalize user-generated default values
** RCFilters: set live update button title
** RCFilters: Add vendor prefixes to loading animation
** RCFilters: Add 'enhanced' view (Group by pages)
** RCFilters: Normalize arbitrary values before adding them
** RCFilters: Don't reload results for redundant requests
** RCFilters: Add 'advanced filters' label to the view selection
** RCFilters: Unsticky the 'limit' preference temporarily
** RCFilters: Trim results to allow searching for spaces after trigger
** RCFilters: Scroll widget to top when switching view
** RCFilters: Pluralize 'show last X changes' message
== Technology ==
=== Services ===
* Blockers: none
* Updates:
** Alpha version of *recommendation API* deployed
https://en.wikipedia.org/api/rest_v1/#!/Recommendation/get_data_recommendat…
*** The Wikidata Query Service is contacted in codfw, and some testing that
Discovery is doing makes it time out
** Job event production rolled out on group0, group1 today
** We are collecting use cases for delayed jobs – please help us discover them:
*** https://phabricator.wikimedia.org/T172832
=== Analytics ===
* The 2017-07 snapshot of the mediawiki_history table is ready, and it
includes all wikis. This is the first time we have all wikis in this fully
publicly shareable, denormalized history of all edit metadata.
* Ongoing work to rebuild the Kafka cluster; just upgraded varnishkafka to
be able to use TLS
* Found a small issue with the way kafka2sse (EventStreams) uses
librdkafka; working on deploying a fix to scb now. Until then, we may be
losing one or two events there due to an interaction between varnishkafka
and librdkafka
* Still one person short due to paternity leave
* Added continuous integration to the Wikistats 2.0 UI; still evaluating
use of Diffusion/Differential versus Gerrit
* EventLogging purging is ongoing but really slow on one of the slaves. We
need to free disk space faster to avoid running into disk issues, so older
EventLogging schemas with loads of infrequently used data will be archived
in Hadoop. For example: https://phabricator.wikimedia.org/T170720
* Waiting for the Reading data analyst to vet the family-wide
(*.wikipedia.org) unique devices metric.
* Upgraded Druid storage to 0.9.2 to be able to test metric calculation for
edit metrics using data from the Edit Data Lake; groupBy performance is much
better
Due to continued overuse of the Wikidata Query Service and its SPARQL
endpoint, we recently implemented a throttling feature to prevent users and
bots from using too many resources on the servers.
Here are the new limits:
* Any user, identified by IP address and User-Agent, can use the service
for 60 seconds of query time per minute (bursting to 120 seconds per minute)
* Any user's queries can generate up to 30 errors per minute (bursting to
60 errors per minute)
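For the curious, a per-client budget like this can be modeled as a token
bucket. The sketch below is a simplified illustration of the idea only; the
class and its parameters are hypothetical, not WDQS's actual implementation:

```python
import time

class QueryTimeThrottle:
    """Hypothetical token-bucket model of a query-time limit: each
    client earns rate_per_min seconds of query time per minute of
    wall clock, up to a burst cap."""

    def __init__(self, rate_per_min=60.0, burst=120.0, clock=time.monotonic):
        self.rate = rate_per_min / 60.0  # query-seconds earned per wall-clock second
        self.burst = burst
        self.tokens = burst              # clients start with a full burst budget
        self.clock = clock
        self.last = clock()

    def charge(self, query_seconds):
        """Deduct the cost of a query; return False if the client has
        exhausted its budget and should be throttled."""
        now = self.clock()
        # Refill the budget for the wall-clock time elapsed, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < query_seconds:
            return False
        self.tokens -= query_seconds
        return True
```

With the defaults above, a client can spend up to 120 seconds of query time
immediately, then is limited to roughly 60 seconds per minute thereafter.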
Please let us know if you have questions or concerns about the new usage
limits; we can fine-tune them if they cause problems for reasonable use
cases.
Thanks for your understanding!
--
deb tankersley
irc: debt
Product Manager, Discovery
Wikimedia Foundation
Hi Community Metrics team,
This is your automatic monthly Phabricator statistics mail.
Accounts created in (2017-07): 432
Active Maniphest users (any activity) in (2017-07): 888
Task authors in (2017-07): 499
Users who have closed tasks in (2017-07): 280
Projects which had at least one task moved from one column to another on
their workboard in (2017-07): 293
Tasks created in (2017-07): 2776
Tasks closed in (2017-07): 2267
Open and stalled tasks in total: 35271
Median age in days of open tasks by priority:
Unbreak now: 18
Needs Triage: 297
High: 536
Normal: 727
Low: 973
Lowest: 954
(How long tasks have been open, not how long they have had that priority)
Active Differential users (any activity) in (2017-07): 24
TODO: Numbers which refer to closed tasks might not be correct, as
described in https://phabricator.wikimedia.org/T1003 .
Yours sincerely,
Fab Rick Aytor
(via community_metrics.sh on phab1001 at Tue Aug 8 03:54:29 UTC 2017)
Hi,
I've proposed a roundtable-style discussion for the Hackathon on Thursday
to discuss what makes high-quality extensions so nice. The goal is to
collect a list of things that developers like and look for when reviewing
extensions, as advice for newer developers.
Even if you aren't able to attend in person, you can leave suggestions
in the comments at <https://phabricator.wikimedia.org/T172845>.
See you then!
-- Legoktm
Hi all!
Due to Wikimania coming up, the TechCom Radar email is coming very late this
time. Sorry about that!
Here are the minutes from this week's TechCom meeting. You can also find the
minutes at
<https://www.mediawiki.org/wiki/Wikimedia_Technical_Committee/2017-08-02>.
See also the TechCom activity page on the RFC board
<https://phabricator.wikimedia.org/tag/mediawiki-rfcs/>.
Here are the minutes, for your convenience:
* Discussion of talk pages and contributions for IPv6 ranges happened on August
2nd, but the RFC’s author wasn’t present. The log is linked on the ticket.
<https://phabricator.wikimedia.org/T171382#3496647 >
* New RFC by MaxSem: Require PHP 5.6 for MediaWiki 1.31 (target summer of 2018):
<https://phabricator.wikimedia.org/T172165>
* TechCom will host a session about the new charter during the Wikimania
Hackathon. <https://www.mediawiki.org/wiki/Talk:Architecture_committee/Charter>
* The next TechCom meeting on August 9th will be a public in-person meeting at
the Wikimania hackathon, at 4pm local time (that’s 1pm PDT and 20:00 UTC). This
is a TechCom planning meeting, not an RFC discussion. Remote participation will
be possible (probably via Google Hangout). Watch for an announcement in the
Hackathon unconference programme.
<https://wikimania2017.wikimedia.org/wiki/Hackathon/Program>
* There will be no RFC discussion on August 16th, since most TechCom members
will be busy with post-Wikimania activities or travel. TechCom will resume
its regular schedule on August 23rd; the next RFC discussion will probably
be on August 30th.
--
Daniel Kinzler
Principal Platform Engineer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.