Hi all!
tl;dr: Gerrit HTTP token auth has been re-enabled. To use it you'll need to
generate a token via your preferences page[0].
Gerrit HTTP token auth was disabled in mid-March due to concerns about its
implementation[1].
Thanks to the work of Paladox and Gerrit upstream in Gerrit 2.15.14[2], we've
re-enabled HTTP token authentication.
I previously removed all HTTP auth tokens, so in order to use HTTP token auth
you'll need to generate a fresh token via your preferences page[0].
Your Lowly Gerrit Fiddler,
-- Tyler
[0]. <https://gerrit.wikimedia.org/r/#/settings/http-password>
[1]. <https://phabricator.wikimedia.org/T218750>
[2]. <https://www.gerritcodereview.com/2.15.html#21514>
Hi all!
I invite you to try out "Project Ruprecht"[1][2], a tool that measures the
"tangledness" of PHP code, and provides you with a "naughty list" of things to fix.
For now, you will have to install this locally. I hope however to soon have this
run automatically against core on a regular basis, perhaps by integrating it
with SonarQube. Maybe some day we can also integrate it with CI, to generate a
warning when a new cyclic dependency is about to be introduced.
So that was the tl;dr. Now for some context, history, and shout-outs. And some
actual real world science, too!
For a while now, I have been talking about how much of a problem cyclic
dependencies are in MediaWiki core: When two components (classes, namespaces,
libraries, whatever) depend on each other, directly or indirectly, this means
that one cannot be used without the other, nor can it be tested, understood, or
modified without also considering the other. So, in effect, they behave as *one*
component, not two. Applied to MW core, this means that roughly half of our 1600
classes effectively behave like a single giant class. This makes the code rather
hard to deal with.
To fix this, I have been looking for tools that let me identify "tangles" of
classes that depend on each other, and metrics to measure the progress of
"untangling" the code. However, the classic code quality metrics focus on
"local" properties of the code, so they can't tell us much about the progress of
untangling. And the tools I found that would detect cyclic dependencies in PHP
code would all choke on MediaWiki core: they would try to list all detected
cycles - which, by the super-exponential nature of possible paths through a
graph, would be millions and millions. So, the tools would choke and die. That
approach isn't practical for us.
Two discoveries allowed me to come up with a working solution:
First, I decided to leave the PHP world and turned towards graph analysis tools
built for large data sets. Python's graph-tool did the trick. It's built on top
of Boost and NumPy, and it's *fast*. It crunched through the 7500 or so class
dependencies in MW core in a split second, and told me that we have 14 "tangles"
(non-trivial strongly connected components), and that 43% of our classes are in
these tangles, with 40% being part of one big tangle that is essentially our
monolith made manifest. So now I had a metric to work with: the number of classes in
tangles.
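The metric itself can be sketched in a few lines. Here is a toy illustration in
plain Python with a hand-rolled Tarjan SCC and hypothetical class names; the
real analysis used graph-tool over MW core's actual dependency graph:

```python
# Toy sketch of the "classes in tangles" metric. Class names and edges
# are hypothetical; the real analysis used graph-tool over ~7500 edges.

def tarjan_scc(graph):
    """Strongly connected components of {node: [successors]} (Tarjan)."""
    index, low, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def connect(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                connect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            connect(v)
    return sccs

# Edge A -> B means "class A depends on class B".
deps = {
    "Title": ["User"],
    "User": ["Title"],              # Title <-> User: a tangle
    "Parser": ["Title", "Linker"],
    "Linker": ["Parser"],           # Parser <-> Linker: another tangle
    "Sanitizer": ["Parser"],        # depends on a tangle, but not in one
}

# A "tangle" is a non-trivial SCC: 2+ classes that all depend on each other.
tangles = [scc for scc in tarjan_scc(deps) if len(scc) > 1]
tangled = set().union(*tangles)
print(f"{len(tangles)} tangles; {100 * len(tangled) // len(deps)}% of classes tangled")
```

At MW core's scale a recursive DFS like this would need the recursion limit
raised (or an iterative rewrite); graph-tool sidesteps that entirely.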
That was great, but still didn't tell me where to start. Graph-tool was still
not fast enough to deal with millions of cycles, and even if it had been, that
data wouldn't be very useful. I needed some smart heuristics. Luckily, I
(totally unintentionally, promise!) nerd sniped[5] Amir Sarabadani one evening
at the WMDE office by telling him about this problem. The next day, he told me
that he had been digging into the problem all night, and he had found a paper
that sounded relevant, and it also came with working code: "Breaking Cycles in
Noisy Hierarchies"[3] by J. Sun, D. Ajwani, P.K. Nicholson, A. Sala, and S.
Parthasarathy. I played with the code a bit, and yes! It spat out a list of 290
or so dependencies[4] that it thought were bad - and I agree for a good number
of them. It's not a clean working list, but it gives a very good idea of where
to start looking.
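To get a feel for what such a tool outputs, here is a deliberately naive
baseline of my own, not the paper's heuristic (which first infers a hierarchy
and then cuts edges that violate it): flag DFS back edges, whose removal always
leaves the graph acyclic, as candidate dependencies to cut.

```python
# Naive cycle-breaking baseline: collect DFS back edges. Removing them
# makes the graph acyclic. This is NOT the paper's heuristic; it just
# shows the shape of the output: a list of candidate dependencies to cut.
# Class names are hypothetical.

def edges_to_cut(graph):
    """Return the DFS back edges of {node: [successors]}."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}
    cut = []

    def dfs(v):
        color[v] = GRAY
        for w in graph.get(v, []):
            if color[w] == GRAY:        # edge closes a cycle
                cut.append((v, w))
            elif color[w] == WHITE:
                dfs(w)
        color[v] = BLACK

    for v in graph:
        if color[v] == WHITE:
            dfs(v)
    return cut

# Hypothetical dependency cycle: Parser -> Title -> WikiPage -> Parser
deps = {
    "Parser": ["Title"],
    "Title": ["WikiPage"],
    "WikiPage": ["Parser"],
    "Skin": ["Title"],
}
print(edges_to_cut(deps))   # one candidate edge per detected cycle
```

Note that *which* back edge gets flagged depends on the traversal order, and
the naive choice is often the wrong edge to cut; that arbitrariness is exactly
why a ranking-based heuristic like the paper's gives a much better list.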
I find it quite fascinating that this works so well for cleaning up a codebase.
After all, the heuristic wasn't designed for this - it was designed for fixing
messy ontologies. Indeed, one of their test data sets was (English language)
Wikipedia's category system! I'd love to see what it does with Wikidata's
subclass hierarchy :)
But I suppose it makes sense - dependencies in software are conceptually a lot
like an ontology, and the same strategies of stratification and abstraction
apply. And the same difficulties, too - it's easy enough to spot a problematic
cycle, but often hard to say where it should be cut. And how to cut it - often,
the solution is not to just remove the dependency, but to introduce a new
abstraction that allows the relationship to exist without a cycle. I'd love to
see the research continue in that direction!
So, a big shout out to the researchers, and to Amir who found the paper!
I hope my ramblings have made you curious to play with Ruprecht, and see what it
has to say about other code bases. There's also another feature to play with
which I haven't discussed here: detection of risky classes using the PageRank
algorithm. Fun!
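And since PageRank came up: the "risky classes" idea fits in a few lines of
power iteration. This is my own toy sketch with hypothetical class names, not
Ruprecht's implementation:

```python
# Toy power-iteration PageRank over a dependency graph. An edge A -> B
# means "A depends on B", so rank flows toward heavily depended-upon
# classes: a high score marks a class whose bugs have wide impact.
# Hypothetical class names; not Ruprecht's actual implementation.

def pagerank(graph, damping=0.85, iterations=50):
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, succs in graph.items():
            if succs:
                share = damping * rank[v] / len(succs)
                for w in succs:
                    new[w] += share
            else:                        # dangling node: spread evenly
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

deps = {
    "Parser": ["Title", "Linker"],
    "Linker": ["Title"],
    "Skin": ["Title", "Linker"],
    "Title": [],                         # everything leans on Title
}
scores = pagerank(deps)
riskiest = max(scores, key=scores.get)
print(riskiest)  # the class most of the code base leans on
```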
Cheers,
Daniel
[1] https://phabricator.wikimedia.org/diffusion/MTDA/repository/master/
[2]
https://gerrit.wikimedia.org/r/admin/projects/mediawiki/tools/dependency-an…
[3] https://github.com/zhenv5/breaking_cycles_in_noisy_hierarchies
[4] https://phabricator.wikimedia.org/P8513
[5] https://xkcd.com/356/
--
Daniel Kinzler
Principal Software Engineer, Core Platform
Wikimedia Foundation
Hello,
This is the weekly update from the Search Platform team for the week
starting 2019-06-10.
As always, feedback and questions are welcome.
== Discussions ==
=== Search ===
* Trey has finished his first analysis of a proposal to strip
diacritics in Slovak for searching. Slovak searchers don't always use
them, but removing them causes new problems for stemming. Read more on
MediaWiki or Phabricator, and join the discussion if you speak Slovak!
[0] [1]
* Mathew worked on creating a Cookbook to restart WDQS after applying
updates or doing a deployment [2]
* The team worked with Analytics to port usage of
mediawiki_CirrusSearchRequestSet to mediawiki_cirrussearch_request [3]
* When a new item is created, we have functionality to instant-index
it synchronously, even if only with partial information, so it appears
quickly in the index - Stas worked on getting the un-redirected items
to be instant-indexed [4]
* CirrusSearch\SearcherTest::testSearchText PHPUnit tests take a long
time to run and were slowing down change processing in CI - David helped
to make them faster [5]
* It is currently not possible to evaluate all items without labels,
so we investigated adding haslabel:all and determined that matching an
_all field and a descriptions_all field worked well [6]
* We removed old settings ($wgLexemeUseCirrus and
$wgLexemeDisableCirrus) as they were both set to true for production
(no longer applicable / out of date) [7]
* David fixed an issue where suggested articles were no longer shown:
API usage for the morelike search endpoint had accidentally dropped
since the train deploy of 1.34.0-wmf.7 [8]
* Geodata not being returned for some files - David merged in a few
patches to be sure that we don't lose CoordinatesOutput when multiple
slots are available [9]
* Stas worked on an old request to make Special:Search use labels and
descriptions for suggestions instead of just the item ID; the fix is
now enabled [10]
* A PHP notice in MW-Vagrant when searching with CirrusSearch was
happening in the action API and web search interface. Thanks for
fixing it, Ottomata!
[0] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Folding_Diacritics_i…
[1] https://phabricator.wikimedia.org/T223787#5260102
[2] https://phabricator.wikimedia.org/T221832
[3] https://phabricator.wikimedia.org/T222268
[4] https://phabricator.wikimedia.org/T206256
[5] https://phabricator.wikimedia.org/T225184
[6] https://phabricator.wikimedia.org/T224611
[7] https://phabricator.wikimedia.org/T225183
[8] https://phabricator.wikimedia.org/T224879
[9] https://phabricator.wikimedia.org/T223767
[10] https://phabricator.wikimedia.org/T55652
----
Subscribe to receive on-wiki (or opt-in email) notifications of the
Discovery weekly update.
https://www.mediawiki.org/wiki/Newsletter:Discovery_Weekly
The archive of all past updates can be found on MediaWiki.org:
https://www.mediawiki.org/wiki/Discovery/Status_updates
Interested in getting involved? See tasks marked as "Easy" [1] or
"Volunteer needed" [2] in Phabricator.
[1] https://phabricator.wikimedia.org/maniphest/query/qW51XhCCd8.7/#R
[2] https://phabricator.wikimedia.org/maniphest/query/5KEPuEJh9TPS/#R
Yours,
Chris Koerner (he/him)
Community Relations Specialist
Wikimedia Foundation
Hi Everyone,
It's time for Wikimedia Tech Talks 2019 Episode 5!
This month's talk will take place *June 25, 2019 at 6:00 PM UTC*.
*Topic*: Just what is Analytics doing back there?
*Speaker*: Dan Andreescu, Senior Software Engineer, Analytics
*Summary*: We take care of twelve systems. Data flows through them to
answer the many questions that our community and staff have about
our piece of the open knowledge movement. Let's take a look at how
these systems fit together and dive deeper into some of the more
interesting algorithms.
*YouTube stream for viewers*: https://www.youtube.com/watch?v=GD0PEDFysfM
During the live talk, you are invited to join the discussion on IRC
at #wikimedia-office
You can watch past Tech Talks here:
https://www.mediawiki.org/wiki/Tech_talks
If you are interested in giving your own tech talk, you can learn more
here:
https://www.mediawiki.org/wiki/Project:Calendar/How_to_schedule_an_event#Te…
Subbu.
(Standing in for Sarah Rodlund who is currently away)
In the Release Engineering team we're preparing for a new CI system.
The current one needs to be replaced. It works well, but parts of it
are getting obsolete. In particular, the Zuul version we use is
obsoleted by upstream. The new version of Zuul is entirely different.
Because of this, we are taking the opportunity to re-think the whole
approach to CI. We would like to introduce the possibility of
continuous delivery and deployment, in addition to continuous
integration.
Earlier this year, we started a working group to evaluate candidate
software. In phase 1, we set up some criteria for evaluation,
considered a large number of possibilities, and winnowed the list down
to three candidates: GitLab CI, Zuul v3, and Argo. For details and a
report, see [0].
We are currently writing up what the new CI system should look like in
more detail. The approach taken is to start with what's needed and
wanted, rather than what the tools provide. The document has had a
first round of internal review, to get rid of the worst issues, and v2
is now open for feedback from the whole movement. You can find it at
[1]. Those with a WMF Google account can comment directly on the doc,
everyone else please use email, either by responding to this email via
wikitech-l or directly to me.
[0] https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/CI_Future…
[1] https://docs.google.com/document/d/1EQuInEV-eY_5kxOZ8E1qEdLr8fb6ihwOD9V_tpV…
The status of wmf.10 as of now:
* Group 1 wikis are on wmf.10 [1]
* There is one open blocker: T226448 [2]
Assuming the remaining blocker is fixed, then we will deploy wmf.10 to all
wikis tomorrow afternoon coincident with cutting the branch for wmf.11.
Tomorrow, Jeena is in charge of the train and I will be assisting. This is
Jeena's first week as train conductor and I'm not sure whether to offer
congratulations or condolences.
[1] https://tools.wmflabs.org/versions/
[2] https://phabricator.wikimedia.org/T226448
Sorry for cross-posting!
Reminder: Technical Advice IRC meeting this week **Wednesday 3-4 pm UTC**
on #wikimedia-tech.
Questions can be asked in English, Romanian and German!
The Technical Advice IRC Meeting (TAIM) is a weekly support event for
volunteer developers. Every Wednesday, two full-time developers are
available to help you with all your questions about MediaWiki, gadgets,
tools and more! This can be anything from "how to get started" to "who
would be the best contact for X" to specific questions on your project.
If you know already what you would like to discuss or ask, please add your
topic to the next meeting:
https://www.mediawiki.org/wiki/Technical_Advice_IRC_Meeting
Hope to see you there!
--
Raz Shuty
Engineering Manager
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.
Hello,
Could somebody please help me understand the configuration of the
Wikimedia Job Runner service?
https://github.com/wikimedia/mediawiki-services-jobrunner/blob/master/jobru…
I'm not quite clear on how groups work. It would seem that the
groups/runners parameter is the number of threads assigned to each
group...? But then why does the "gwt" group have runners=0?
Thanks,
Aran
Hi,
The Wikimedia technical spaces Code of Conduct is enforced by a committee.
That committee's selection process is defined as follows:
"The first Committee will be chosen by the Wikimedia Foundation’s Technical
Collaboration team. Subsequent members and auxiliary members of the
Committee will be chosen by the current regular members through a majority
vote." [1]
About a month ago, the CoC Committee put up a slate of "candidates", and
was soliciting feedback on them. The decision on these candidates was
supposed to happen on June 12, last week. I don't know if it actually
happened - I didn't see an announcement, and the candidates page is still
up. [2] In any case, I doubt any of these candidates will have trouble
getting through, since these candidates are also, for the most part, the
people deciding who gets in.
That's what I'm writing about: I now think that the committee should be
decided via open elections, instead of having the committee appoint itself.
At the moment, this group has a complete lack of accountability: they could
make any decision whatsoever at any time, and, according to the rules,
there is literally no one who can stop them. With every passing year and
additional "renewal" (that's what it's called), [3] it seems to me that
their legitimacy as representing the views of the overall community
decreases.
I had a strange personal experience that made me start to think about this.
Pretty soon after they requested feedback a month ago, I sent an email to
the CoC Committee giving my negative view about one member of the
committee, and explaining why I thought they shouldn't remain there. The
committee responded a few weeks later by saying they were rejecting my
feedback - which is their right - but then spent the rest of the email
criticizing my own previous behavior. Which I found bizarre. Thinking about
it later, it seems to only make sense as what's known in American business
as "circling the wagons" - a group of people responding to outside
criticism in a defensive way, by rejecting all of it, attacking the
critics, etc. Which is not the kind of thing you want to see from people
who are supposed to be making rational, unbiased decisions.
Now, it could be that I'm making too much of this one interaction - maybe
some people there were just having a bad day - but there's still the larger
question of whether elections make sense, and to some extent it's a
question that's independent of whatever you think of the people currently
on the committee.
As for the mechanics of voting: one option is to give one vote to anyone
who has a Wikimedia developer account. The vote could be held on
wikitech.wikimedia.org, or perhaps there's an even better technical
solution. The key thing for now is just to get a sense for people's views
on this.
So, what do people think - is there any kind of significant support for the
idea of elections for the CoC Committee?
-Yaron
[1] https://www.mediawiki.org/wiki/Code_of_Conduct/Committee
[2]
https://www.mediawiki.org/wiki/Code_of_Conduct/Committee/Members/Candidates
[3]
https://www.mediawiki.org/wiki/Code_of_Conduct/Committee#Creation_and_renew…
On Wednesday the 19th, production wikis were rolled back from 1.34.0-wmf.10
to 1.34.0-wmf.8 due to a critical issue: "T226109 Jobs not being executed
on 1.34.0-wmf.10" [1]
It's now Thursday afternoon in the U.S. and we still do not have a fix, nor
is there an indication that a fix is imminent. It's not clear which patch
caused the issue so that leaves us with no choice but to cancel the train
for this week and try again next week. If there are critical fixes that
need to be deployed, I can help with that for the remainder of today; there
are of course no deployments on Friday.
[1] https://phabricator.wikimedia.org/T226109