Hello,
The WMF’s technology department has set a goal for this quarter of testing
and temporarily switching the main operational data centre from Eqiad
(located in Ashburn, Virginia) to Codfw (located in Dallas, Texas) [1,2].
This includes both back-end processing and serving live traffic from Codfw.
As a part of this effort, we are scheduling a switch-over for RESTBase and
its back-end services, including: Parsoid, the Mobile Content Service,
CXServer, Mathoid, Citoid, Apertium and Zotero [3]. Technically, it will
not be a real switch-over per se, because we will keep all of those
services active in both DCs. However, external traffic will be directed to
the Dallas DC only.
=== When is it and what does it mean for me? ===
The switch-over test is planned for this Thursday, 2016-03-17. We have
allotted a three-hour window for this [4]. There is nothing users should
do before or after the switch; it will be transparent for them. There are
two things users should note, though:
1) At the time of the switch-over, users might receive error responses for
a while (both 4xx and 5xx status codes). While we will test most things
ahead of time, we cannot test the actual traffic shifting, so users might
notice small bumps.
2) After the switch to the Dallas DC, users will likely see slightly
elevated response latencies. This will occur because all of the services
responding to live requests still need to contact the main MediaWiki
cluster, which will remain in Eqiad (the other DC) until a complete
switch-over of the infrastructure is performed. However, given the
multiple levels of caching, the 40 ms penalty for going cross-DC on an
uncached API request does not seem too taxing.
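
To put the 40 ms figure in context, here is a rough back-of-the-envelope
sketch in Python. It assumes that every uncached request pays exactly one
cross-DC round trip and that cached responses pay none. The function name
and the cache hit ratios are illustrative assumptions, not measurements.

```python
def expected_penalty_ms(cache_hit_ratio: float,
                        cross_dc_penalty_ms: float = 40.0) -> float:
    """Average extra latency per request, assuming only cache misses go cross-DC."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return (1.0 - cache_hit_ratio) * cross_dc_penalty_ms


# Hypothetical hit ratios, for illustration only.
for ratio in (0.0, 0.5, 0.9):
    extra = expected_penalty_ms(ratio)
    print(f"hit ratio {ratio:.0%}: ~{extra:.1f} ms extra on average")
```

The higher the share of requests answered from a cache, the smaller the
average penalty, which is the reasoning behind "not too taxing" above.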
=== Wait, what about my service X running in WMF production? ===
If you are a service owner of one of the aforementioned services, there are no
explicit actions you should take prior to, during or after the switch-over
test. This test could, however, affect your service depending on whether it
usually serves live traffic or is mostly operational during various
internal updates. MediaWiki and JobQueue processing will still be performed
in Eqiad, so in the latter case your service should not see a change in the
usage pattern. If, however, your service is mostly in charge of responding
to live requests coming through RESTBase, those will be handled by
instances in Codfw. Since these instances are full replicas of their
Eqiad counterparts and are stateless, no major breakage should happen.
Should you have any questions or concerns, don’t hesitate to contact us
here or on IRC (#wikimedia-services @ freenode).
Best,
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation
[1]
https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q3_Goals#Techn…
[2] https://phabricator.wikimedia.org/project/profile/1723/
[3] https://phabricator.wikimedia.org/T127974
[4]
https://wikitech.wikimedia.org/wiki/Deployments#Thursday.2C.C2.A0March.C2.A…
Hello,
On 8 March 2016 at 02:49, Gabriel Wicke <gwicke(a)wikimedia.org> wrote:
> tl;dr: You are *very* likely not affected.
>
> We are planning two changes in the REST API:
>
> 1) Remove the experimental /page/html/ and /page/data-parsoid/
> listings [1][2]. Our metrics show that these are essentially unused.
> The same title listing remains available at /page/title/ [3].
>
> 2) Make the `tid` path parameter in the unstable
> /page/data-parsoid/{title}/{revision}/{tid} [4] end point mandatory.
> Data-parsoid is tied to a specific HTML render, and only requests with
> an explicit timeuuid from the corresponding HTML response are
> guaranteed to get the correct data-parsoid version.
>
These changes are scheduled to go live tomorrow around 10:00 UTC.
Cheers,
Marko
>
> If things go to plan, we will deploy these changes sometime next week.
>
> Thank you for your understanding,
>
> Gabriel Wicke for the Services team
>
> [1]:
> https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/get_page_html
> [2]:
> https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/get_page_data_pars…
> [3]:
> https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/get_page_title
> [4]:
> https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/get_page_data_pars…
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation
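
For clients affected by change 2 in the quoted message, the following is a
minimal Python sketch of building the data-parsoid path with an explicit
tid. It assumes the HTML response carries an ETag header of the form
"<revision>/<tid>"; that header format, the helper name and the sample
values are assumptions for illustration, so please check an actual response
before relying on it.

```python
from urllib.parse import quote


def data_parsoid_path(title: str, html_etag: str) -> str:
    """Build /page/data-parsoid/{title}/{revision}/{tid} from an HTML ETag.

    Assumption: the ETag looks like "<revision>/<tid>", optionally with a
    weak-validator prefix (W/) and further slash-separated parts after the tid.
    """
    value = html_etag.strip()
    if value.startswith("W/"):
        value = value[2:]
    value = value.strip('"')
    parts = value.split("/")
    if len(parts) < 2 or not parts[0].isdigit() or not parts[1]:
        raise ValueError(f"unexpected ETag format: {html_etag!r}")
    revision, tid = parts[0], parts[1]
    # Encode the title as a single path segment (slashes included).
    return f"/page/data-parsoid/{quote(title, safe='')}/{revision}/{tid}"


# Made-up revision and timeuuid, for illustration only.
sample_etag = '"123456/b0a1c2d4-e5f6-11e5-8a1b-0123456789ab"'
print(data_parsoid_path("Foo/Bar", sample_etag))
```

Taking both the revision and the tid from the same HTML response is what
keeps the data-parsoid request tied to that specific render.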
Hello,
We have updated the nodejs package version from 4.2.4 to 4.3.0 on the SCB
service cluster running in production. The services running there have been
restarted and tested. I have also taken the liberty of updating the build
portion of the package.json files in the source repositories so that the
respective build systems create node dependencies compiled with the correct
version of the node binary. The affected services and their respective
patch-sets are:
- citoid: https://gerrit.wikimedia.org/r/#/c/274688/
- cxserver: https://gerrit.wikimedia.org/r/#/c/274685/
- graphoid: https://gerrit.wikimedia.org/r/#/c/274687/
- mathoid: https://gerrit.wikimedia.org/r/#/c/274684/
- mobileapps: https://gerrit.wikimedia.org/r/#/c/274689/
Note that because of these changes, the next time you build the deploy
repository it will take a while longer since the new version of Node.js
needs to be downloaded and configured in the service's container image. The
good news is that you don't have to do anything but wait a bit :)
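
If you want to confirm that a local environment runs the same Node.js
version before compiling dependencies by hand, a small Python sketch like
the one below could help. The helper names are hypothetical, and the binary
name `node` is an assumption (on some systems it is called `nodejs`).

```python
import subprocess
from typing import Optional

EXPECTED_NODE_VERSION = "4.3.0"  # the version named in this message


def parse_node_version(output: str) -> str:
    """Turn the output of `node --version` (e.g. "v4.3.0") into "4.3.0"."""
    return output.strip().lstrip("v")


def installed_node_version(binary: str = "node") -> Optional[str]:
    """Return the local Node.js version, or None if the binary cannot be run."""
    try:
        result = subprocess.run([binary, "--version"],
                                capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_node_version(result.stdout)


def matches_expected(version: Optional[str],
                     expected: str = EXPECTED_NODE_VERSION) -> bool:
    """True only when a version was found and equals the expected one."""
    return version == expected
```

Calling `matches_expected(installed_node_version())` returns False both when
the versions differ and when no Node.js binary is found.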
Thanks to Moritz Mühlenhoff from Ops for making this happen!
Cheers,
Marko
--
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation