*What is happening*: In preparation for upstream work on the production wiki databases, the Wiki Replica service needs to drop some columns from the views used by Toolforge and Cloud VPS users.
The columns being dropped are:
* archive.ar_text_id
* archive.ar_content_model
* archive.ar_content_format
* revision.rev_text_id
* revision.rev_content_model
* revision.rev_content_format
NOTE: revision.rev_content_format and revision.rev_text_id are only relevant when loading serialized blobs from external storage, which is
not possible from the Wiki Replicas. These columns are removed without replacement.
These columns currently contain stale data in both the replicas and the production databases. The actual data used in production was moved
entirely to the "slot" and "content" tables on 2019-11-18 (<https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/551551/>).
Information about migrating tools to the new schema is available in the description of this task: https://phabricator.wikimedia.org/T174047
Additional information about the overall project and changes can be found here:
https://www.mediawiki.org/wiki/Multi-Content_Revisions/Database_Schema
The columns will continue to exist in the revision_compat and archive_compat views as a stop-gap, to keep tools that rely on those fields from breaking completely while their maintainers update them to the new schema. These two views are expected to perform poorly because they include joins against the content and slots tables. Please use them only if you need them and refactoring your code to the new schema will take longer than the migration window.
*When is this happening*: This will take time to run across the replicas and databases, possibly over the course of a few days, beginning 2020-05-25. Servers will be depooled in turn to allow the changes to be applied.
*What should I do*: Between now and the 25th of May, stop using the fields we are removing. If you are not already doing so, switch to the slots and content tables instead.
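As a sketch of the replacement pattern (these are the standard MCR tables; the rev_id value is just an example), the information that used to live in revision.rev_content_model can be fetched through the slots and content tables like this:

```sql
-- Sketch only: fetch the content model of a revision's main slot
-- via the slots/content tables instead of revision.rev_content_model.
-- The rev_id value is an arbitrary example.
SELECT r.rev_id, cm.model_name
FROM revision r
JOIN slots s           ON s.slot_revision_id = r.rev_id
JOIN slot_roles sr     ON sr.role_id = s.slot_role_id
                      AND sr.role_name = 'main'
JOIN content c         ON c.content_id = s.slot_content_id
JOIN content_models cm ON cm.model_id = c.content_model
WHERE r.rev_id = 123456;
```

The same join pattern works for the archive table via ar_rev_id; see the Phabricator task above for migration details for your specific queries.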
Progress on this action will be tracked on this Phabricator task - https://phabricator.wikimedia.org/T252219.
--
Brooke Storm
SRE
Wikimedia Cloud Services
bstorm(a)wikimedia.org
IRC: bstorm_
_______________________________________________
Wikimedia Cloud Services announce mailing list
Cloud-announce(a)lists.wikimedia.org (formerly labs-announce(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud-announce
Tomorrow I'm going to upgrade several of the OpenStack control nodes to
Debian Buster. Due to version incompatibilities, Buster and Stretch
nodes can't cooperate in the same cluster, so I will need to switch
service between clusters a couple of times.
If things go really well, this will cause only brief hiccups in the
OpenStack APIs. More likely, though, things will get a bit tangled up
and Horizon will misbehave for 20-30 minutes during the transition.
The first switchover will happen between 14:00 and 15:00 tomorrow; the
second switch will follow an hour or two later.
-Andrew
Hi all,
Is it possible to store data into user tables through queries on
Wikireplica DBs? Or is it only possible by mysqldump'ing from the replica
DB and loading into the user table in a separate step?
I am thinking of aggregate data.
Thanks!
Hi all,
We have a set of database reports (on users, articles, etc.) that we used
to generate on a weekly basis.[1] Ever since the introduction of the *actor*
table,[2] many of the reports that have to do with users have become so
slow that the SQL query cannot finish within a reasonable time and is
killed. Some other reports have also become slower over time; all of these
are shown in red in [1].
One possible solution is to create a script which is scheduled to run once
a month; the script would download the latest dump of the wiki database,[3]
load it into MySQL/MariaDB, create some additional indexes that would make
our desired queries run faster, and generate the reports using this
database. A separate script can then purge the data a few days later.
We can use the current-version-only DB dumps for this purpose. I am
guessing that this process would take several hours to run (somewhere
between 2 and 10) and would require about 2 GB of storage just to download
and decompress the dump file, and some additional space on the DB side (for
data, indexes, etc.)
Out of an abundance of caution, I thought I should ask for permission now,
rather than forgiveness later. Do we have a process for getting approval
for projects that require gigabytes of storage and hours of computation, or
is what I proposed not even remotely considered a "large" project, meaning
I am being overly cautious?
Please advise!
Huji
[1]
https://fa.wikipedia.org/wiki/%D9%88%DB%8C%DA%A9%DB%8C%E2%80%8C%D9%BE%D8%AF…
[2] https://phabricator.wikimedia.org/T223406
[3] https://dumps.wikimedia.org/fawiki/20200401/
Wikistream is a tool that has been around for the Wikimedia movement
for quite a while. It provides a web interface showing real-time
editing data across many Wikimedia projects. The data for this comes
from the IRC recent changes feeds. The webservice is written in nodejs
and currently running on the Toolforge Kubernetes cluster.
If you have experience using Toolforge and nodejs, see
<https://phabricator.wikimedia.org/T251555> and apply to become a
co-maintainer.
Bryan
--
Bryan Davis Technical Engagement Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
[Again in the saga of me trying to query the revision and logging tables
against comment text and usernames...]
Am I dreaming or is the timeout on DB queries today something like 2
minutes? Is it a temporary measure? Is a query killer particularly
aggressive due to some overload? Should we expect this to last?
This query works:
MariaDB [enwiki_p]> select count(*) from revision where rev_id >
950000000 AND rev_comment_id = 1334144;
+----------+
| count(*) |
+----------+
| 174 |
+----------+
1 row in set (1 min 57.35 sec)
A slightly bigger one times out pretty quickly:
MariaDB [enwiki_p]> select count(*) from revision where rev_id >
930000000 AND rev_comment_id = 1334144;
ERROR 2013 (HY000): Lost connection to MySQL server during query
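One common workaround (a sketch, not a guarantee against the query killer) is to split the scan into smaller rev_id windows so that each piece finishes within the timeout, then add up the counts client-side:

```sql
-- Sketch: same filter as the query above, but scanned in smaller
-- rev_id windows so each query stays under the timeout; sum the
-- per-window counts client-side.
SELECT COUNT(*) FROM revision
WHERE rev_id > 930000000 AND rev_id <= 940000000
  AND rev_comment_id = 1334144;
-- then repeat for (940000000, 950000000], and so on.
```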
Federico
cloudvirt1004 is one of our oldest generation of hypervisor servers.
The hypervisor servers are the machines which actually run the virtual
machine instances for Cloud VPS projects. This physical host is
experiencing an active hard disk and/or RAID controller failure. The
Cloud Services team is actively attempting to fix the server and
evacuate all instances running on it to other hypervisors.
See <https://phabricator.wikimedia.org/T250869> for more information
and progress updates.
The following projects and instances are affected:
* cloudvirt-canary
** canary1004-01.cloudvirt-canary.eqiad.wmflabs
* commonsarchive
** commonsarchive-mwtest.commonsarchive.eqiad.wmflabs
* deployment-prep
** deployment-echostore01.deployment-prep.eqiad.wmflabs
** deployment-schema-2.deployment-prep.eqiad.wmflabs
* incubator
** incubator-mw.incubator.eqiad.wmflabs
* machine-vision
** visionoid.machine-vision.eqiad.wmflabs
* ogvjs-integration
** media-streaming.ogvjs-integration.eqiad.wmflabs
* services
** Esther-outreachy-intern.services.eqiad.wmflabs
* shiny-r
** discovery-testing-02.shiny-r.eqiad.wmflabs
* tools
** tools-k8s-worker-38.tools.eqiad.wmflabs
** tools-k8s-worker-52.tools.eqiad.wmflabs
** tools-sgeexec-0901.tools.eqiad.wmflabs
** tools-sgewebgrid-lighttpd-0918.tools.eqiad.wmflabs
** tools-sgewebgrid-lighttpd-0919.tools.eqiad.wmflabs
* toolsbeta
** toolsbeta-sgewebgrid-generic-0901.toolsbeta.eqiad.wmflabs
* wikidata-autodesc
** wikidata-autodesc.wikidata-autodesc.eqiad.wmflabs
* wikilink
** wikilink-prod.wikilink.eqiad.wmflabs
Bryan, on behalf of the Cloud VPS admins and Cloud Services team
--
Bryan Davis Technical Engagement Wikimedia Foundation
Principal Software Engineer Boise, ID USA
[[m:User:BDavis_(WMF)]] irc: bd808
Hi there!
If you use a CloudVPS web proxy, this email is for you. Toolforge
developers/users can ignore this email.
We are introducing a change to eliminate the 'X-Forwarded-For' HTTP header that
the CloudVPS web proxy adds when forwarding the HTTP request to your instance.
This header contains the original IP address of the internet client that sent
the request. This is private information that we would like to stop spreading in our
environment [0].
You use the web proxy if you have a public web endpoint hosted in CloudVPS under
the wmflabs.org domain. These are generally configured using Horizon in the DNS
> Web Proxies section.
Examples of web proxy names:
* accounts.wmflabs.org
* glampipe.wmflabs.org
* incubator.wmflabs.org
The full list can be seen in the OpenStack Browser tool [1].
We are ready to introduce this change [2], but wanted to give a heads-up to projects
that do require this information for whatever reason. We would like to hear from you
in the next couple of weeks: please contact us on the Phabricator task [0] and
explain why you need the XFF header.
This is the timeline this change will follow:
* 2020-04-01: this email, start collecting list of things that require XFF
* 2020-04-07: start evaluating list of things that require XFF
* 2020-04-15: introduce the change, with proper case whitelisting
When the change is introduced two weeks from now, proxy backends that have not been
whitelisted will stop receiving the XFF header.
Please reach out for any questions or comments.
regards.
[0] https://phabricator.wikimedia.org/T135046
[1] https://openstack-browser.toolforge.org/project/project-proxy
[2] https://gerrit.wikimedia.org/r/c/operations/puppet/+/583098
--
Arturo Borrero Gonzalez
SRE / Wikimedia Cloud Services
Wikimedia Foundation
We'll be upgrading the cloud services OpenStack install tomorrow,
beginning at 15:00 UTC.
There should be little to no interruption to VMs or Toolforge, but
Horizon logins will be disabled for part of the window.
Sorry for the short notice!
- Andrew + the WMCS team