Hi!
On 2019-05-16 at 13:00 UTC there will be a maintenance operation in one of the
Wikimedia Foundation datacenter racks that affects 2 of our servers running
virtual machines [0]. There is a risk that this maintenance operation results
in power loss for the servers, affecting the virtual machines running on them.
However, there is no way to know for sure whether there will be any outage at all.
If you are an admin of any of the VMs in the list and you want the VM to be
reallocated to another server before the operation, please get in touch
with us as soon as possible. Remember that, right now, reallocating a VM to
another server means shutting down the VM briefly.
Here is a list of affected virtual machines:
cloudvirt1028.eqiad.wmnet:
af-puppetdb01.automation-framework.eqiad.wmflabs
bastion-eqiad1-02.bastion.eqiad.wmflabs
fridolin.catgraph.eqiad.wmflabs
cloud-puppetmaster-02.cloudinfra.eqiad.wmflabs
cloudstore-dev-01.cloudstore.eqiad.wmflabs
commtech-nsfw.commtech.eqiad.wmflabs
clm-test-01.community-labs-monitoring.eqiad.wmflabs
cyberbot-exec-iabot-01.cyberbot.eqiad.wmflabs
deployment-db05.deployment-prep.eqiad.wmflabs
deployment-memc05.deployment-prep.eqiad.wmflabs
deployment-sca01.deployment-prep.eqiad.wmflabs
deployment-pdfrender02.deployment-prep.eqiad.wmflabs
ign.ign2commons.eqiad.wmflabs
integration-slave-docker-1050.integration.eqiad.wmflabs
integration-castor03.integration.eqiad.wmflabs
api.openocr.eqiad.wmflabs
osmit-umap.osmit.eqiad.wmflabs
builder-envoy.packaging.eqiad.wmflabs
jmm-buster.puppet.eqiad.wmflabs
a11y.reading-web-staging.eqiad.wmflabs
adhoc-utils01.security-tools.eqiad.wmflabs
util-abogott-stretch.testlabs.eqiad.wmflabs
canary1028-01.testlabs.eqiad.wmflabs
stretch.thumbor.eqiad.wmflabs
tools-worker-1023.tools.eqiad.wmflabs
tools-proxy-04.tools.eqiad.wmflabs
tools-docker-builder-06.tools.eqiad.wmflabs
tools-sgewebgrid-generic-0904.tools.eqiad.wmflabs
tools-sgeexec-0942.tools.eqiad.wmflabs
tools-sgeexec-0941.tools.eqiad.wmflabs
tools-sgeexec-0940.tools.eqiad.wmflabs
tools-sgeexec-0939.tools.eqiad.wmflabs
tools-sgeexec-0937.tools.eqiad.wmflabs
tools-sgeexec-0929.tools.eqiad.wmflabs
tools-sgeexec-0921.tools.eqiad.wmflabs
tools-sgeexec-0920.tools.eqiad.wmflabs
tools-sgeexec-0911.tools.eqiad.wmflabs
tools-sgeexec-0909.tools.eqiad.wmflabs
toolsbeta-proxy-01.toolsbeta.eqiad.wmflabs
vconverter-instance.videowiki.eqiad.wmflabs
perfbot.webperf.eqiad.wmflabs
wdhqs-1.wikidata-history-query-service.eqiad.wmflabs
cloudvirt1014.eqiad.wmnet:
commonsarchive-prod.commonsarchive.eqiad.wmflabs
deployment-imagescaler03.deployment-prep.eqiad.wmflabs
dumps-5.dumps.eqiad.wmflabs
dumps-4.dumps.eqiad.wmflabs
incubator-mw.incubator.eqiad.wmflabs
webperformance.integration.eqiad.wmflabs
saucelabs-01.integration.eqiad.wmflabs
integration-puppetmaster01.integration.eqiad.wmflabs
maps-puppetmaster.maps.eqiad.wmflabs
maps-wma.maps.eqiad.wmflabs
mwoffliner3.mwoffliner.eqiad.wmflabs
mwoffliner1.mwoffliner.eqiad.wmflabs
phlogiston-5.phlogiston.eqiad.wmflabs
discovery-testing-01.shiny-r.eqiad.wmflabs
snuggle-enwiki-01.snuggle.eqiad.wmflabs
canary-1014-01.testlabs.eqiad.wmflabs
tools-sgeexec-0901.tools.eqiad.wmflabs
wdqs-test.wikidata-query.eqiad.wmflabs
Toolforge won't be affected by this operation.
You can read more details about the datacenter operation itself in Phabricator [1].
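For admins who look after several projects, the list above can be checked programmatically. The following is a minimal sketch, assuming the listing keeps the plain layout used in this message (a hypervisor name ending in a colon, followed by one VM FQDN per line) and that FQDNs have the shape <instance>.<project>.eqiad.wmflabs. The function names and the SAMPLE excerpt are illustrative only and are not part of any Cloud Services tooling.

```python
def parse_affected(listing):
    """Map each hypervisor to the VM FQDNs listed under it."""
    hosts = {}
    current = None
    for raw in listing.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A line such as "cloudvirt1028.eqiad.wmnet:" starts a new group.
            current = line[:-1]
            hosts.setdefault(current, [])
        elif current is not None:
            hosts[current].append(line)
    return hosts


def vms_in_project(hosts, project):
    """Return (hypervisor, vm) pairs whose FQDN carries the given project label."""
    matches = []
    for hypervisor, vms in hosts.items():
        for vm in vms:
            labels = vm.split(".")
            # Assumed FQDN shape: <instance>.<project>.eqiad.wmflabs
            if len(labels) >= 2 and labels[1] == project:
                matches.append((hypervisor, vm))
    return matches


# Short excerpt of the list above, used only for illustration.
SAMPLE = """\
cloudvirt1028.eqiad.wmnet:
deployment-db05.deployment-prep.eqiad.wmflabs
api.openocr.eqiad.wmflabs
cloudvirt1014.eqiad.wmnet:
deployment-imagescaler03.deployment-prep.eqiad.wmflabs
dumps-5.dumps.eqiad.wmflabs
"""

if __name__ == "__main__":
    for hypervisor, vm in vms_in_project(parse_affected(SAMPLE), "deployment-prep"):
        print(hypervisor, vm)
```

Pasting the full list into SAMPLE and changing the project name shows which of your VMs sit on each affected hypervisor.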
Sorry for the short notice,
regards.
[0] Cloud Services: reallocate workload from rack B5-eqiad
https://phabricator.wikimedia.org/T223148
[1] Install new PDUs into b5-eqiad https://phabricator.wikimedia.org/T223126
--
Arturo Borrero Gonzalez
Operations Engineer / Wikimedia Cloud Services
Wikimedia Foundation
To move the maps project/home NFS and the scratch share off the old labstore1003 machine and onto much faster, newer hardware, I’m going to begin rsyncing the data across.
This is just to announce that the copy is starting soon and to encourage people to reach out on the #wikimedia-cloud channel if it is hitting performance hard, on the maps servers in particular or on the scratch share.
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_
The 'wb_terms' table is being removed from the Wiki Replica databases.
Please see Léa Lacroix's post on the wikidata mailing list [0] for
additional details.
TL;DR summary:
* May-June 2019, the Wikidata development team will drop the wb_terms
table from the database in favor of a new optimized schema
* Migration will start on 2019-05-29
* A test system will be available starting 2019-05-15
* Details are available in Phabricator [1]
[0]: https://lists.wikimedia.org/pipermail/wikidata/2019-April/012987.html
[1]: https://phabricator.wikimedia.org/T221764
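Tool maintainers may want to find out whether their own code still queries the table before the migration starts. The following is a rough sketch, assuming the tool's source lives in a local directory and that a plain text search for the table name is good enough as a first pass. The function name and the default file suffixes are arbitrary choices made for this example.

```python
import re
from pathlib import Path

# Matches the table name as a whole word, case-insensitively, so that
# "wikidatawiki_p.wb_terms" matches but "wb_terms_backup" does not.
WB_TERMS = re.compile(r"\bwb_terms\b", re.IGNORECASE)


def find_wb_terms_references(root, suffixes=(".py", ".sql", ".php")):
    """Yield (path, line_number, line) for each source line mentioning wb_terms."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if WB_TERMS.search(line):
                yield path, number, line.strip()


if __name__ == "__main__":
    for path, number, line in find_wb_terms_references("."):
        print(f"{path}:{number}: {line}")
```

Queries built dynamically from string fragments will not be caught by a search like this, so the Phabricator task [1] remains the place to check what the new schema looks like.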
Bryan
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Technical Engagement Boise, ID USA
irc: bd808 v:415.839.6885 x6855
On Tuesday, starting at around 17:00 UTC, I'm going to relocate the PAWS and
Kubernetes masters to the new network region. While the VMs are
copying, launches of new Kubernetes jobs and creation of new PAWS
notebooks will fail.
The outage should last about an hour -- less if everything goes well,
somewhat more if not. Jobs that are already running when the copy begins
should be unaffected.
Apologies for any inconvenience caused!
-Andrew
On 4/16/19 7:59 AM, Andrew Otto wrote:
> Great! Is this just for Wikitech itself or all ldap/wikitech
> authentication?
This notice is related to a change in MediaWiki code, so it concerns direct
logins to Wikitech itself. That said, the 2FA key used by Horizon is
stored in the Wikitech database, so it's vaguely possible that Horizon
logins could be disrupted as well.
Other services that rely on LDAP for account creation (e.g. gerrit,
icinga, etc.) are unaffected, although they may have unrelated
case-(in)sensitivity issues of their own.
>
> On Mon, Apr 15, 2019 at 7:56 PM Bryan Davis <bd808(a)wikimedia.org> wrote:
>
>> A change was deployed to the Wikitech config 2019-04-15T23:16 UTC
>> which prevents users from logging into the wiki with a username that
>> differs in case from the 'cn' value for their developer account.
>>
>> This change is not expected to cause problems for most users, but
>> there may be some people who have historically entered a username with
>> mismatched case (for example "bryandavis" instead of "BryanDavis") and
>> relied on MediaWiki and the LdapAuthentication plugin figuring things
>> out. This will no longer happen automatically. These users will need
>> to update their password managers (or brains if they are not using a
>> password manager) to supply the username with correct casing.
>>
>> The "wrongpassword" error message on Wikitech has been updated with a
>> local override to help people discover this problem. See
>> <https://phabricator.wikimedia.org/T165795> for more details.
>>
>> Bryan, on behalf of the Cloud Services team
>> --
>> Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
>> [[m:User:BDavis_(WMF)]] Manager, Technical Engagement Boise, ID USA
>> irc: bd808 v:415.839.6885 x6855
A change was deployed to the Wikitech config 2019-04-15T23:16 UTC
which prevents users from logging into the wiki with a username that
differs in case from the 'cn' value for their developer account.
This change is not expected to cause problems for most users, but
there may be some people who have historically entered a username with
mismatched case (for example "bryandavis" instead of "BryanDavis") and
relied on MediaWiki and the LdapAuthentication plugin figuring things
out. This will no longer happen automatically. These users will need
to update their password managers (or brains if they are not using a
password manager) to supply the username with correct casing.
The "wrongpassword" error message on Wikitech has been updated with a
local override to help people discover this problem. See
<https://phabricator.wikimedia.org/T165795> for more details.
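To illustrate the distinction described above, here is a small sketch that classifies a typed username against the stored 'cn' value. It is not the LdapAuthentication or Wikitech code; the function name and return values are made up for this example.

```python
def diagnose_username(supplied, cn):
    """Classify a typed username against the 'cn' of a developer account.

    Returns "match" when the strings are identical, "case-mismatch" when
    they differ only in letter case, and "different" otherwise.
    """
    if supplied == cn:
        return "match"
    if supplied.casefold() == cn.casefold():
        # This is the situation the updated "wrongpassword" message hints at:
        # right account, wrong capitalisation.
        return "case-mismatch"
    return "different"


if __name__ == "__main__":
    for attempt in ("BryanDavis", "bryandavis", "SomeoneElse"):
        print(attempt, "->", diagnose_username(attempt, "BryanDavis"))
```

A "case-mismatch" result corresponds to the logins that used to succeed and will now fail until the stored username is corrected.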
Bryan, on behalf of the Cloud Services team
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Technical Engagement Boise, ID USA
irc: bd808 v:415.839.6885 x6855
The OSM PostgreSQL database service, usually accessed via osmdb.eqiad.wmnet, is moving to a new server. Currently the new server is a read replica of the primary database and should be accessible via the DNS alias osm.db.svc.eqiad.wmflabs.
As detailed in https://phabricator.wikimedia.org/T219652, osmdb.eqiad.wmnet will be changed to point at osm.db.svc.eqiad.wmflabs. For a brief time while DNS updates, this will make tables that are normally writable read-only as well. The replica will then be promoted to master, and the remaining steps should not cause any impact.
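Clients that write to the database can tell which phase of the switch they are in by asking the server whether it is still a replica. The following is a minimal sketch, assuming a Python DB-API cursor connected to the OSM database. pg_is_in_recovery() is a standard PostgreSQL function that returns true on a standby server; the helper name is hypothetical, and the driver and connection details in the comment are assumptions, not the service's documented setup.

```python
def is_replica(cursor):
    """Return True while the connected PostgreSQL server is a read replica.

    `cursor` is any DB-API 2.0 cursor. pg_is_in_recovery() is true on a
    standby server and false on a primary, so the result flips once the
    replica has been promoted.
    """
    cursor.execute("SELECT pg_is_in_recovery()")
    (in_recovery,) = cursor.fetchone()
    return bool(in_recovery)


# Example usage (assumes the psycopg2 driver and placeholder credentials,
# none of which come from the announcement):
#
#     import psycopg2
#     conn = psycopg2.connect(host="osmdb.eqiad.wmnet", dbname="...", user="...")
#     with conn.cursor() as cur:
#         if is_replica(cur):
#             print("still a replica: writes will fail for now")
```

Deferring writes while this returns True is one way to ride out the brief read-only window.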
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_
Because of some issues mounting NFS on the PAWS master, it is being rebooted. Traffic to the front page has already been routed through another node, but server creation won’t work until after the reboot is complete.
Brooke Storm
Operations Engineer
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_
The legacy Ubuntu Trusty grid engine job grid has been shut down!
Thanks to everyone who was involved in migrating existing tools from
the old grid to the Kubernetes cluster or the new Debian Stretch job
grid.
There were still 385 tools that may have been running jobs or
webservices on the Trusty grid at the time of shutdown. A static list
of these tools is preserved at
<https://tools.wmflabs.org/trusty-tools/>.
Instructions are still available at
<https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation>
for migrating tools that are currently down.
Bryan, on behalf of the Toolforge admin team
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Technical Engagement Boise, ID USA
irc: bd808 v:415.839.6885 x6855
As previously announced on this list [0][1] we are in the process of
replacing the old Ubuntu Trusty instances in Toolforge with fancy new
Debian Stretch instances.
This process is reaching its next major milestone on Monday
2019-03-25. During the general US workday on that date (14:00-00:00
UTC) the Toolforge admin team will be dismantling the legacy Ubuntu
Trusty job grid. Any tools that have not migrated to either the
Stretch grid or the Kubernetes cluster at that point will be forcibly
shut down. Nothing will be deleted in the tools' $HOME directories, but
any Trusty grid jobs will be stopped. Any crontab file remaining on
the old grid's cron server will be archived as
"$HOME/crontab.trusty.save". Maintainers who somehow missed all of the
announcements will be able to log in and restart their tools on the
Stretch grid or Kubernetes.
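After the shutdown, a maintainer can check whether an archived crontab was left in the tool's home directory. The following is a small sketch, assuming the archive is a plain crontab file at the path named above. The function name is illustrative, and re-adding the entries on the new grid remains a manual step.

```python
from pathlib import Path


def saved_cron_entries(home=None):
    """Return the non-comment lines of $HOME/crontab.trusty.save, if present.

    `home` defaults to the current user's home directory. An empty list
    means the archive file is missing or has no active entries.
    """
    home_dir = Path(home) if home is not None else Path.home()
    saved = home_dir / "crontab.trusty.save"
    if not saved.exists():
        return []
    entries = []
    for line in saved.read_text(encoding="utf-8", errors="replace").splitlines():
        stripped = line.strip()
        # Skip blank lines and crontab comments.
        if stripped and not stripped.startswith("#"):
            entries.append(stripped)
    return entries


if __name__ == "__main__":
    for entry in saved_cron_entries():
        print(entry)
```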
See <https://wikitech.wikimedia.org/wiki/News/Toolforge_Trusty_deprecation>
for additional information and tips on common problems that have been
found thus far.
[0]: https://lists.wikimedia.org/pipermail/cloud-announce/2019-January/000122.ht…
[1]: https://lists.wikimedia.org/pipermail/cloud-announce/2019-March/000142.html
Bryan, on behalf of the Toolforge admin team
--
Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
[[m:User:BDavis_(WMF)]] Manager, Technical Engagement Boise, ID USA
irc: bd808 v:415.839.6885 x6855