A few questions to provoke discussion/share knowledge better:
* Why does the train run Tue, Wed, Thu rather than Mon, Tue, Wed?
* Why do we only have 2 group 1 Wikipedias (Catalan and Hebrew)?
* Should there be a backport window Friday mornings for certain changes?
A few weeks ago a change I made led to a small but noticeable UI
regression. The site was perfectly usable, but looked noticeably off. It
was in a more obscure part of the UI so we missed it during QA/code review.
Late Wednesday a ticket was reported against Wikimedia Commons, but I only
became aware of it late Thursday when the regression rolled out to English
Wikipedia. A village pump discussion was started and several duplicate
tickets were created. While the site could still be used, it didn't look
great and disrupted the experience of many editors.
Once aware of the problem, the issue was easy to fix. A patch was written
I understand Friday backports are possible, but my team tend to use them as
a last resort for fear of creating more work for my fellow maintainers over
weekend periods. As a result, given the site was still usable, the fix
wasn't backported until the first available backport window on Monday. This
is unfortunately a regular pattern, particularly for small UI regressions.
We addressed the issue on Monday, but I got feedback from several users
that this particular issue took too long to get backported. I mentioned the
no Friday deploy policy. One user asked me why we don't run the train
Monday-Wednesday and to be honest I wasn't sure. I couldn't find anything
My team tries to avoid big changes on Mondays, as patches merged on a Monday
are more likely to have issues: they don't always get time to go through QA
with our dedicated QA engineer during the week.
So... why don't we run the train Monday-Wednesday? Having a Thursday buffer
during which we can more comfortably backport fixes for any issues not caught
in testing, particularly UI bugs, would be extremely helpful to my team, and I
don't think we'd lose much by giving up the Monday used to rush last-minute changes.
Assuming there are good reasons for Tuesday-Thursday train, I think there
is another problem with our deploy process which is the size of group 1.
Given the complexity of our interfaces (several skins, gadgets, multiple
special pages, user preferences, multiple extensions, and different user
rights), many obscure UI bugs get missed in QA by people who don't use the
software every day and so lack a clear mental model of how it looks and
behaves. My team mostly works on visible user interface changes, and we rely
heavily on Catalan and Hebrew Wikipedia users, our only group 1 wikis, to
notice UI errors before they go out to a wider audience. Given the size of
those audiences, that often doesn't work, and it's often group 2 wikis that
make us aware of issues. If we are going to
keep the existing Tue-Thu train, I think it's essential we have at
least one larger Wikipedia in our group 1 deploy to give us better
protection against UI regressions living over the weekend. My understanding
is for some reason this is not a decision release engineering can make, but
one that requires an on-wiki RFC by the editors themselves. Is that
correct? While I can understand the reluctance of editors to experience
bugs, I'd argue that it's better to have a bug for a day than to have it
for an entire weekend, and definitely something we need to think more
I have been thinking of a way to organise data in Wiktionary that would allow
for words to automatically show translations to other languages with much less
work than is currently required.
Currently, translations to other languages have to be added manually, meaning
they are not automatically propagated across language pairs. What I mean by
this is showcased in the following example:
1. I create a page for word X in language A.
2. I create a page for word Y in language B.
3. I add a translation to the page for word X, and state that it translates to
word Y in language B.
4. If I want the page for word Y to show that it translates to word X in
language A, I have to do this manually.
Automating this seems a bit tricky. I think that the key is acknowledging that
meanings can be separated from language and used as the links of translation.
In this view, words and their definitions are language-specific, but meanings
Because I may have done a bad job at explaining this context, I have created a
short example in the form of an sqlite3 SQL script that creates a small
dictionary database with two meanings for the word "desert"; one of the
meanings has been linked to the corresponding words in Spanish and in German.
The script mainly showcases how words can be linked across languages with
You can find the script attached. To experiment with this, simply run
within an interactive sqlite3 session. (There may be other ways of doing it
but this is how I tested it.)
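The attached script is not reproduced here. As a separate, minimal sketch of
the same idea, the following uses Python's built-in sqlite3 module. All table
and column names, the definitions, and the choice of "desierto" and "Wüste" as
the linked words are illustrative assumptions, not the contents of the
attachment.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from the attached script.
SCHEMA = """
CREATE TABLE meaning (id INTEGER PRIMARY KEY);
CREATE TABLE word (
    id INTEGER PRIMARY KEY,
    language TEXT NOT NULL,
    spelling TEXT NOT NULL
);
-- A definition is language-specific; the meaning it points to is not.
CREATE TABLE word_meaning (
    word_id INTEGER NOT NULL REFERENCES word(id),
    meaning_id INTEGER NOT NULL REFERENCES meaning(id),
    definition TEXT NOT NULL,
    PRIMARY KEY (word_id, meaning_id)
);
"""


def build_demo():
    """Create an in-memory dictionary with two meanings for 'desert'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO meaning (id) VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO word (id, language, spelling) VALUES (?, ?, ?)",
        [(1, "en", "desert"), (2, "es", "desierto"), (3, "de", "Wüste")],
    )
    conn.executemany(
        "INSERT INTO word_meaning (word_id, meaning_id, definition) VALUES (?, ?, ?)",
        [
            (1, 1, "a dry, barren region"),  # linked to Spanish and German
            (1, 2, "to abandon"),            # not linked to any other word
            (2, 1, "región árida"),
            (3, 1, "trockenes Gebiet"),
        ],
    )
    return conn


def translations(conn, language, spelling):
    """Return (language, spelling) pairs that share a meaning with the word."""
    rows = conn.execute(
        """
        SELECT DISTINCT other.language, other.spelling
        FROM word AS src
        JOIN word_meaning AS a ON a.word_id = src.id
        JOIN word_meaning AS b ON b.meaning_id = a.meaning_id
        JOIN word AS other ON other.id = b.word_id
        WHERE src.language = ? AND src.spelling = ? AND other.id != src.id
        ORDER BY other.language, other.spelling
        """,
        (language, spelling),
    )
    return rows.fetchall()
```

With this layout, translations(build_demo(), "es", "desierto") finds "desert"
and "Wüste" even though no translation was ever entered on the Spanish side:
the link exists only through the shared meaning row.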
I believe this system can also be used to automate other word relations such as
hyponyms and hypernyms, meronyms and holonyms, and others. It can also allow
looking up words in other languages and getting definitions in the language of
choice. In short, it would allow Wiktionary to more effortlessly function as
a universal dictionary.
Has something like this been suggested before? I would be pleased to receive
feedback on this idea.
With kind regards,
wikibugs needs to be autovoiced in all the IRC channels it speaks in to
avoid being killed by antispam bots.
If wikibugs speaks in your channel and isn't voiced, please have a
channel founder autovoice it:
/msg chanserv flags <#channel> wikibugs +Vv
<https://phabricator.wikimedia.org/T283983> has a list of channels and
the people with appropriate permissions should've been pinged on it.
If your IRC channel isn't in use anymore please file a bug (or submit a
patch!) to have wikibugs removed from it.
Today we switched over most services and traffic caches from the eqiad
(Virginia) datacenter to codfw (Texas) as part of improving our
reliability. The goal is to have this procedure working and regularly
tested in case of an emergency when we actually need it.
We're only aware of one user-facing impact: for a short time, WDQS lag
detection was broken, affecting Wikidata bots that check it. This is
tracked as <https://phabricator.wikimedia.org/T285710>.
Users will experience a bit of a latency increase for now as most user
traffic will need to talk to both eqiad and codfw datacenters. This will
go away tomorrow once MediaWiki is switched over (keep reading).
Also, we were a bit delayed in starting today because of an issue
causing appservers to get stuck:
== Services ==
Started at 14:29 UTC, officially finished at 15:09.
The main issues we ran into were:
* the helm-charts service is unique and doesn't have a service IP,
causing the automatic switchover verification to break. This required us
to manually check the other services that come after it in the list, and
then re-run the cookbook while excluding it. Tracked as
* the restbase-async service has some special handling. We debated
whether to follow it and opted not to special-case the service. Figuring
out what to do long-term is tracked as <https://phabricator.wikimedia.org/T285711>.
* the WDQS issue mentioned earlier.
== Traffic ==
Started at 15:43, finished at 15:45.
It took until ~16:25 for eqiad to mostly depool. There's not much else
to report; it went very smoothly.
== Tomorrow's MediaWiki switchover ==
Scheduled for 14:00 UTC <https://zonestamp.toolforge.org/1624888854>.
It is our goal to minimize the read-only time and make this a non-event
from a user perspective.
All of the coordination will take place in the #wikimedia-operations IRC
channel on Libera Chat. You're more than welcome to follow along, but if
you have questions, please ask them in #wikimedia-tech so it doesn't get
disruptive. The procedure that we'll be following is documented at
I'm planning to do one more "live test" later today and will announce it
on IRC when it gets started.
The SelectQueryBuilder class was introduced a year ago
and it's seen some adoption, mostly in core. I've also noticed that
there are two classes that extend it in core, the PageSelectQueryBuilder
and UserSelectQueryBuilder. I really like how this approach allows for
better separation of DB code from the rest, which has always been a
complete and utter mess in MW. I would like to do something similar in
an extension, but the base class is currently not explicitly marked as
stable to extend and thus, according to the stable interface policy,
the interface could be broken at any time. My question is – are there
any plans to make the class stable to extend? Is there a rough roadmap
for its development? If the class is unstable to extend for some reason
– what are the expected changes to come?
If you don't do anything with the metadata fields of file tables (the image
table, for example) in the replicas, you can ignore this email.
The "image" table in Wikimedia Commons is extremely big (more than 380GB
compressed) and has been causing multiple major issues (including an
incident recently). Deep inspection revealed that more than 80% of this
table is metadata of PDF files, around 10% is metadata of DjVu files, and
the remaining 10% is the rest of the information. This clearly needs fixing.
Tim Starling has done the work on this, and we are slowly rolling
out two major changes:
First, the format of metadata in the database (for example, the img_metadata
field in the image table) will change for all files. It used to be PHP
serialization, but it will be changed to JSON. You can see an example of before
and after in https://phabricator.wikimedia.org/T275268#7178983 Keep in mind that
for some time this will be in a hybrid mode, where some files have it in
JSON format and some have it in PHP serialization. You need to support
both formats for a while if you parse this value.
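As a rough illustration of supporting both formats during the hybrid period,
here is a minimal Python sketch that only decides which parser a value needs.
It assumes (this is not confirmed above, so please check the linked example)
that JSON values are objects starting with "{" and that PHP-serialized values
are arrays, which PHP's serialize() writes with an "a:" prefix. The function
name is hypothetical.

```python
import json


def detect_metadata_format(raw):
    """Guess whether an img_metadata value is 'json', 'php' or 'unknown'.

    Assumption: JSON values are objects ("{...}") and PHP-serialized values
    are arrays ("a:<count>:{...}"). Anything else is reported as 'unknown'
    so the caller can decide how to handle it.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8", errors="replace")
    text = raw.strip()
    if text.startswith("{"):
        try:
            json.loads(text)
        except json.JSONDecodeError:
            return "unknown"
        return "json"
    if text.startswith("a:"):
        return "php"
    return "unknown"
```

Values reported as "php" still need a PHP unserialize implementation to be
read. None ships with Python's standard library, so that part is left to
whatever tooling you already use.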
Second, some parts of the metadata for PDF and, later, DjVu files won't be
accessible in Wikimedia Cloud anymore, since this data will be moved to
External Storage (ES), and ES is not accessible from the outside. It's mostly
the OCR text of PDF files. You can still access it using the API.
Nothing will change for outside users: the API will return the same
result and the user interface will show the same thing. But the change will
make all of Wikimedia Commons more reliable and faster to access (through
indirect effects such as improved InnoDB buffer pool efficiency), reduce the
time needed to take database backups, enable us to make bigger changes to the
image table and improve its schema, and much more.
I hope no one heavily relies on the img_metadata field in the cloud
replicas but if you do, please let me know and reach out for help.
You can keep track of the work in https://phabricator.wikimedia.org/T275268
Thank you for understanding and sorry for any inconvenience.
The Platform Engineering Team is going to make breaking changes to the MediaWiki core CentralIdLookup class.
We are changing the accepted type for the $user parameter from User to UserIdentity. This should not affect
any callers, but any class that extends CentralIdLookup will be broken. The only extensions known to us that
implement CentralIdLookup are CentralAuth and Wikibase, and we are updating them alongside the core class.
If you maintain an extension which implements a custom CentralIdProvider, please respond to this email
and we will either help you update your extension, or work out a new plan for converting the core class.
Additionally, we are loosening the return type guarantee for CentralIdLookup::localUserFromCentralId from
User to UserIdentity. This could potentially affect callers, but we’ve updated all extensions to be compatible
with the new guarantee. If you maintain an extension that is not available on codesearch, please reach out to us.
Best regards. Petr.
Staff Software Engineer.
Platform Engineering Team.
1. https://gerrit.wikimedia.org/r/c/mediawiki/core/+/700991
2. https://codesearch.wmcloud.org/extensions/?q=localUserFromCentralId&i=nope&…
Lots of people thanked me for deploying mailman3, but I want to mention that
it would not have been possible without the Wikimedia Cloud Services team
giving me a lot of resources so I could have a test setup and puppetize
mailman3 easily, which in turn made deployment to production much easier.
Thank you for providing such critical infrastructure to us! Keep up the