FYI, this maintenance is starting in a few minutes and will result in
some CI downtime. Gerrit will probably refuse to test or merge patches
while the work is in progress.
-------- Forwarded Message --------
Subject: Re: [Cloud-announce] Operation on Cloud VPS next monday 13th Aug
Date: Mon, 13 Aug 2018 15:30:45 +0200
From: Arturo Borrero Gonzalez <aborrero(a)wikimedia.org>
Reply-To: cloud(a)lists.wikimedia.org
Organization: Wikimedia Foundation
To: cloud-announce(a)lists.wikimedia.org
On 07/08/18 18:24, Arturo Borrero Gonzalez wrote:
> Hi!
>
> Next Monday the 13th we will be doing some maintenance on the main Cloud VPS
> deployment to merge the keystone services of the main and eqiad1
> deployments (eqiad1 is the new one that we will eventually put into production).
>
> Toolforge users will not be affected by this outage.
>
> Day: Monday 13th August
> Start time: 14:00 UTC
> Finish time: 16:00 UTC or ASAP
>
> Keystone is a central point in OpenStack, so most Horizon operations,
> such as logging in or creating/deleting VMs, could be affected. On the other hand,
> VMs will keep working and we don't expect any network outage.
>
> This operation will allow us to have a smooth transition in the future
> when we move all projects and instances to the new eqiad1 deployment, and
> it is a prerequisite for multi-region support in our Cloud VPS service.
>
> Please let us know of any questions or suggestions you may have.
>
Reminder, this is happening today in 30 minutes.
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce(a)lists.wikimedia.org (formerly labs-announce(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce
Hello,
A Gerrit administrator is needed to review and handle
<https://phabricator.wikimedia.org/T193049>, either to ask for
clarification or to close it one way or another. Repository ownership
requests take approx. one week; this one has been sitting there since April
with virtually no activity other than pings for assistance. I think the
user deserves an answer.
Thank you, M.
Recently, I was outside at night, and I was fortunate to see several
meteors. The 2018 Perseid <https://en.wikipedia.org/wiki/Perseids> meteor
shower is at its peak this weekend.
Also, yesterday I spent some time on English Wikipedia's main page
<https://en.wikipedia.org/wiki/Main_Page> and browsed some of the linked
articles. I was glad to be reminded that regardless of world events, good
or bad, we continue to create Wikipedia.
What's making you happy this week? You are welcome to write in any language.
Pine
( https://meta.wikimedia.org/wiki/User:Pine )
Greetings!
I am Tanvi Dadu, a GSoC 2018 participant. For the past three months, I have
been working on the Achievements module and quiz in the Commons app
<https://phabricator.wikimedia.org/T189788#4153895>, and it gives me immense
happiness to say that I have successfully completed the project. Both
features have been released as part of version 2.8. A summary of my work is
on my MediaWiki user page
<https://www.mediawiki.org/wiki/User:Tanu_dadu#GSoC_WEEKLY_REPORTS>.
I would like to thank my mentors, Josephine Lim and Vivek Maskara, and the
whole Commons team for helping and encouraging me throughout the summer. It
feels very satisfying and amazing to be part of this community. I had an
amazing experience and learned a great deal, and I can't thank you enough
for giving me this wonderful opportunity.
Regards
Tanvi Dadu
Hi,
kraz, the server running irc.wikimedia.org, will be rebooted next Monday
during the CEST morning. In all previous reboots we found that most
clients/tools reconnect automatically, but if that's not the case for
something you care about, please keep an eye on it (and ideally fix it to
reconnect automatically).
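
For tools that do not yet reconnect on their own, here is a minimal sketch of one common approach: retry with exponential backoff. It is an illustration only and not how any particular tool is implemented. The function names, the delay values, and port 6667 are assumptions made for the example.

```python
import random
import socket
import time


def backoff_delays(base=1.0, cap=300.0, jitter=0.1, rng=random.random):
    """Yield an endless series of delays: base, 2*base, 4*base, ... capped at `cap`."""
    delay = base
    while True:
        # A little random jitter keeps many clients from reconnecting at once.
        yield delay * (1.0 + jitter * rng())
        delay = min(delay * 2, cap)


def run_forever(connect, handle, sleep=time.sleep, max_attempts=None):
    """Connect, hand the connection to handle(), and reconnect after failures.

    `max_attempts` exists only to make the loop testable; None retries forever.
    Returns the number of connection attempts made.
    """
    delays = backoff_delays()
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            conn = connect()
            delays = backoff_delays()  # reset the backoff after a successful connect
            handle(conn)               # returns or raises once the server goes away
        except OSError:
            pass                       # refused, reset, or timed out: retry below
        sleep(next(delays))
    return attempts


def connect_irc(host="irc.wikimedia.org", port=6667, timeout=30):
    """Open a plain TCP connection (hypothetical helper; the port is an assumption)."""
    return socket.create_connection((host, port), timeout=timeout)
```

A tool would call something like `run_forever(connect_irc, my_handler)`, where `my_handler` is its own (hypothetical) read loop that returns or raises when the connection drops.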
Cheers,
Moritz
How did we do in our striving for operational excellence since last month?
Read on to find out!
## The month in numbers
* 2 documented incidents since July 19. [1]
* 55 Wikimedia-log-errors tasks closed after July 19. [2]
* 31 Wikimedia-log-errors tasks created after July 19. [3]
Logstash (type=mediawiki, last 7 days):
* 2,048 fatals. (channel=fatal)
* 117,372 exceptions. (channel=exception)
* 21,043 PHP errors. (channel=error)
* 6,368,647 total error-level events. (channel=*, level=ERROR)
## Highlights
### New database partition
@Josve05a reported that Special:Log was timing out on commons.wikimedia.org
for certain queries. Database administrator @Marostegui investigated the
underlying query and found out this was caused by one of the backend
database servers having an unpartitioned 'logging' table. Manuel took the
server out of rotation for re-partitioning, which was completed later that
day.
– https://phabricator.wikimedia.org/T199790
### Disappearing audio players, mystery solved
When Étienne Beaulé (@Ebe123) found PHP-Notice errors in the Score
extension, they immediately investigated. It began as a fix for a typo
that caused inefficient (but working) parsing of audio data. Upon closer inspection, a
bigger story was uncovered. The computation of audio lengths was being
skipped due to a mismatch in MIME-types between Score and
TimedMediaHandler. The player needs this length, and as a result, browsers
had to download and parse the audio data entirely client-side, creating a
delay of 5-20 seconds or more.
Four months earlier, Andre reported that pressing play on an audio player
made the player disappear for a long time.
It all makes sense now.
– https://phabricator.wikimedia.org/T192550 /
https://phabricator.wikimedia.org/T200835
### Packet loss
After noticing that exception IDs from error pages were not found in
Logstash, Tim Starling started an investigation. He created a new Grafana
dashboard and the culprit was quickly identified. Over 3000 packets were
being dropped every second. That's over 90% of server logs – missing!
14 deployments, 9 SAL entries, and 6 days later, we finally reached 0%
packet loss.
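
As a rough illustration of how a drop rate relates to a loss percentage, here is a small sketch. The 3000 dropped packets per second comes from the report above. The delivered rate of 300 per second is a hypothetical figure chosen only to show the arithmetic, and the function name is made up for the example.

```python
def loss_fraction(dropped_per_sec, delivered_per_sec):
    """Return the fraction of packets lost, given per-second drop and delivery rates."""
    total = dropped_per_sec + delivered_per_sec
    if total == 0:
        return 0.0  # nothing sent, so nothing lost
    return dropped_per_sec / total


# 3000 dropped/s is from the report; 300 delivered/s is a hypothetical value.
print(f"{loss_fraction(3000, 300):.1%}")  # prints 90.9%
```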
Many thanks to Filippo Giunchedi, @BBlack, @herron, and @Gehel, who got to the
bottom of this.
Our weekly error numbers increased 100x since last month, and... that's a
good thing!
– https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1&from=1530097200…
– https://phabricator.wikimedia.org/T200960
### Vips or no Vips
We use the VipsScaler extension to create thumbnails of large TIFF and PNG
files in some cases. Test requests for it failed with "10.2.1.21 port 80:
Connection refused". The error was puzzling because the IP does not belong
to MediaWiki or an image-scaling service. Rather, it belongs to Proton, a
Chromium PDF service.
Investigation from @MoritzMuehlenhoff, @Reedy, and others revealed the
service IP used by Proton since June 2018 previously belonged to the
mediawiki-imagescaler pool (dissolved in April 2018). Configuration for
VipsScaler was outdated and stopped working in April. The issue was not
noticed until the IP address started working again, with an unrelated
service producing errors.
– https://phabricator.wikimedia.org/T199937 /
https://phabricator.wikimedia.org/T199938
## Higher impact
These cause users (of the web or API) to see errors.
New:
* [ProofreadPage extension] https://phabricator.wikimedia.org/T201506 -
MWContentSerializationException: The serialization is an invalid JSON array.
* [Flow extension] https://phabricator.wikimedia.org/T201654 -
InvalidArgumentException "The Title object yields no ID" from
Flow\LinksTableUpdater.
* [MediaWiki-Logging] https://phabricator.wikimedia.org/T201411 - Date
input on Special:Log can cause fatal error.
Carried over:
* [Page deletion] https://phabricator.wikimedia.org/T195692 - Undelete for
certain pages aborted by IncompleteRevisionException.
* [AbuseFilter extension] https://phabricator.wikimedia.org/T187153 -
Special:Abuselog throws BadMethodCallException on details/examine.
* [Flow extension] https://phabricator.wikimedia.org/T70526 -
InvalidDataException "Flow workflow is for different page".
* [MobileFrontend] https://phabricator.wikimedia.org/T199066 -
Special:MobileContributions shows "Special:Badtitle" (Revision::ensureTitle
error).
## Noise
These are caused by code behaving unexpectedly, but with limited impact due
to graceful recovery by PHP, or other handling. These harm our ability to
detect and prevent higher impact issues (through Scap and Fatal-Monitor),
and may be masking other issues.
New:
* [FileImporter extension] https://phabricator.wikimedia.org/T200837 - PHP
Notice: Undefined index from WikiTextContentCleaner.php.
* [PagedTiffHandler] https://phabricator.wikimedia.org/T200839 - PHP
Notice: Undefined index from PagedTiffHandler_body.php.
Carried over: None!
All of last month's noise mentions were fixed! 🎉
## Thank you
Thank you to everyone for helping investigate/resolve #Wikimedia-log-errors.
Including:
* Jdforrester-WMF (James D. Forrester)
* matmarex (Bartosz Dziewoński)
* Marostegui (Manuel Aróstegui)
* zeljkofilipin (Željko Filipin)
* Ebe123 (Étienne Beaulé)
* jcrespo (Jaime Crespo)
* dcausse (David Causse)
* Jdlrobson (Jon Robson)
* Addshore (Adam_WMDE)
* EBjune (Erika Bjune)
* Anomie (Brad Jorsch)
* Aaron (Aaron Schulz)
* Reedy (Sam Reed)
Thanks!
Until next time,
-- Timo Tijhof
[1] https://wikitech.wikimedia.org/w/index.php?title=Category:Incident_document…
[2] https://phabricator.wikimedia.org/maniphest/query/h1j5IXlqAUPJ/#R
[3] https://phabricator.wikimedia.org/maniphest/query/MtotJEtlSU5_/#R
Hello,
I'd like to request advice on handling a task at hand.
It turns out that Extension:ArticleToCategory2
<https://github.com/wikimedia/mediawiki-extensions-ArticleToCategory2>
names its user rights in an old-fashioned way (i.e.
ArticleToCategory2AddCat instead of all lowercase, or
ArticleToCategory2, which is somewhat confusing). I'd like to fix that,
but I was wondering whether that would cause undue complications in existing
installs and, if so, whether it should first go through a deprecation process
or some other process I am not aware of.
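
To make the question concrete, here is a minimal sketch of what a transition period could look like: the old names keep working but trigger a deprecation warning. It is written in Python purely for illustration (the extension itself is PHP) and does not use MediaWiki's permission API. The new lowercase names and both function names are hypothetical.

```python
import warnings

# Legacy right name -> proposed lowercase name. The new names are hypothetical.
LEGACY_RIGHTS = {
    "ArticleToCategory2": "articletocategory2",
    "ArticleToCategory2AddCat": "articletocategory2-addcat",
}


def normalize_right(name):
    """Return the current name for a right, warning when a legacy name is used."""
    if name in LEGACY_RIGHTS:
        new_name = LEGACY_RIGHTS[name]
        warnings.warn(
            f"user right '{name}' is deprecated; use '{new_name}'",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_name
    return name


def user_can(granted_rights, required):
    """Check a right, accepting legacy names in the grants and in the query."""
    granted_now = {normalize_right(r) for r in granted_rights}
    return normalize_right(required) in granted_now
```

With this shape, an existing install that still grants `ArticleToCategory2` would keep working during the deprecation window, and the warning tells administrators which name to switch to.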
Thanks in advance for any help or advice you can offer.
Best regards, M.