Hello!
I just added a short paragraph about the switch of CirrusSearch to
HTTPS to our weekly update page [1]. I kept it pretty short as it is
mainly an invisible technical change, but still, that's my first real
achievement here, so I'm happy to share it.
I did not add that I caused a 5-minute outage while deploying this
change. I'm not trying to hide it (I did file an incident report [2]),
but I don't feel the weekly update is the right place for this kind of
communication. I'm happy to add it as well if you think otherwise.
Let me know...
MrG
[1] https://www.mediawiki.org/wiki/Wikimedia_Discovery#Updates
[2] https://wikitech.wikimedia.org/wiki/Incident_documentation/20160407-Mediawi…
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
On Thu, Apr 7, 2016 at 10:24 PM, Bryan Davis <bd808(a)wikimedia.org> wrote:
> On Thu, Apr 7, 2016 at 11:23 AM, Guillaume Lederrey
> <glederrey(a)wikimedia.org> wrote:
>> * labswiki is running on Zend, not HHVM, I need to remember that and
>> to try to understand why
>
> Mostly because we have never tested all of the MediaWiki stack used by
> wikitech (labswiki) on HHVM. That wiki is special in several ways
> (ldap auth, semantic mediawiki) from any other wiki in the wiki farm.
> In theory it should be able to run on hhvm; in practice I wouldn't be
> surprised to find out that some extensions it uses have issues on
> hhvm. I think that Andrew has a test wiki for wikitech now, so we may
> be able to try out hhvm without killing wikitech to do it.
>
I think that we specifically know that semantic mediawiki and
OpenStackManager are not supported by HHVM.
G.
--
Giuseppe Lavagetto, Ph.d.
Senior Technical Operations Engineer, Wikimedia Foundation
On Thu, Apr 7, 2016 at 10:24 PM, Bryan Davis <bd808(a)wikimedia.org> wrote:
> On Thu, Apr 7, 2016 at 11:23 AM, Guillaume Lederrey
> <glederrey(a)wikimedia.org> wrote:
>> * labswiki is running on Zend, not HHVM, I need to remember that and
>> to try to understand why
>
> Mostly because we have never tested all of the MediaWiki stack used by
> wikitech (labswiki) on HHVM. That wiki is special in several ways
> (ldap auth, semantic mediawiki) from any other wiki in the wiki farm.
> In theory it should be able to run on hhvm; in practice I wouldn't be
> surprised to find out that some extensions it uses have issues on
> hhvm. I think that Andrew has a test wiki for wikitech now, so we may
> be able to try out hhvm without killing wikitech to do it.
Reasonable explanation. Thanks for enlightening me!
>
> Bryan
> --
> Bryan Davis Wikimedia Foundation <bd808(a)wikimedia.org>
> [[m:User:BDavis_(WMF)]] Sr Software Engineer Boise, ID USA
> irc: bd808 v:415.839.6885 x6855
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
A few notes about the switch, in no particular order. I probably
missed a few points, feel free to add your own.
Timeline:
09:35: deploy sizing of HHVM curl named pools
12:22: activating HTTPS + connection pooling, but staying on eqiad
12:37: stop using HTTPS+pooling for labswiki
12:46: point the codfw label back to the codfw cluster
13:22: Fix TTMServer elastic config
13:50: switch CirrusSearch traffic to codfw
13:53: rollback
14:35: switch mw1017 to codfw for CirrusSearch
14:47: switch all CirrusSearch traffic to codfw
Issues found and fixed:
* labswiki runs on Zend, not HHVM >= 3.9.0, so search was broken
there. Fixed by an exception in configuration
(https://gerrit.wikimedia.org/r/#/c/282145/). Making the
CirrusSearch\Elastica\PooledHttp class more robust would be nice, but
might not be possible (https://phabricator.wikimedia.org/T132075).
* All traffic (including updates) was sent to eqiad: a copy/paste
error in wmf-config/CirrusSearch-production.php, fixed by
https://gerrit.wikimedia.org/r/#/c/282147/. We lost some updates;
re-indexing is in progress.
* TTM configuration was broken by the change in CirrusSearch
config. Fixed by https://gerrit.wikimedia.org/r/#/c/282154/.
* All wikis were in error for ~5 minutes, due to an issue in the
handling of arrays in the CirrusSearch configuration. Fixed by
https://gerrit.wikimedia.org/r/#/c/282163/1.
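For what it's worth, the array-handling bug above is the classic
string-vs-array confusion in config values. A tiny, invented Python
sketch of the defensive normalization that avoids that class of bug
(none of these names come from the actual CirrusSearch config):

```python
def normalize_hosts(value):
    """Accept either a single host string or a list of hosts.

    Config files often allow both forms; normalizing once at the
    boundary means the rest of the code can assume a list.
    """
    if isinstance(value, str):
        return [value]
    if isinstance(value, (list, tuple)):
        return list(value)
    raise TypeError(f"expected str or list of hosts, got {type(value).__name__}")
```

With that in place, both `normalize_hosts("search.svc.codfw.wmnet")`
and `normalize_hosts(["host1", "host2"])` yield a list, and a
misconfigured value fails loudly instead of breaking all wikis.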
Issues discovered but not fixed:
* TTM does not handle multi DC
- writes are done only to eqiad
- which implies ttmserver index does not exist in codfw
- in particular, saves do not work, which makes fixing this a
blocker for the switch
- dcausse and Nikerabbit seem to have a quick fix in mind
- phab task created: https://phabricator.wikimedia.org/T132076
General lessons:
* Elasticsearch response times as measured from MediaWiki increased
by ~50 ms for all query types, except MoreLike queries (which were
already sent to codfw). MoreLike queries saw a response time decrease
of ~15 ms. These differences seem fairly constant (similar across all
percentiles). In a completely biased and unscientific experiment of
using Wikipedia myself, I notice the difference most in the
autocompletion of the search box.
* unit testing configuration is hard; testing it outside of prod is
mostly impossible
* testing first by deploying manually on our test servers (mw1017,
...) definitely makes sense for all non-trivial changes
* labswiki is running on Zend, not HHVM; I need to remember that and
try to understand why
* I now remember why I like strongly, statically typed languages
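On the "unit testing configuration is hard" point: one mitigation is
to express routing decisions as plain functions that can be asserted
on outside of prod. A purely hypothetical sketch (the function, names,
and values are invented, not the real wmf-config logic):

```python
def search_transport(wiki, hhvm=True):
    """Pick the transport a CirrusSearch-style config would use.

    Hypothetical rule, mirroring today's incident: labswiki runs on
    Zend (no pooled curl support), so it must stay on plain HTTP
    while everything else moves to HTTPS with connection pooling.
    """
    if wiki == "labswiki" or not hhvm:
        return "http"
    return "https+pool"
```

Once the rule is a function, the per-wiki exceptions can be pinned
down in a unit test instead of being discovered in production.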
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
<quote name="Moritz Muehlenhoff" date="2016-03-22" time="22:11:58 +0100">
> On Tue, Mar 22, 2016 at 10:04:21PM +0100, Guillaume Lederrey wrote:
> > Let me know if you have any question or if you know of another place
> > where I should publicize this deployment window.
>
> We have a page on wikitech which tracks all deployment/maintenance
> windows: https://wikitech.wikimedia.org/wiki/Deployments
Yes, please add your window there. As I see you want a 7 hour window,
overlapping with other windows shouldn't be a problem. But, if you ever
plan to do work that might impact more than just your own systems,
please do coordinate with me.
Thanks,
Greg
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg A18D 1138 8E47 FAC8 1C7D |
Hello!
I'm trying to make sure the hardware requests needed for Maps are
tracked correctly. What I see so far:
* Maps back end hardware [1]
- seems to miss a specification of what hardware is required.
@Yuri, could you have a look?
- should be moving forward as soon as we document what kind of
hardware we need.
* Maps hardware planning for FY16/17 [2]
- fairly high-level description of needs for FY16/17 AND FY15/16
* Set up proper edge Varnish caching for maps cluster [3]
- describes requirements for proper caching (seems to require 4
machines in each of 4 datacenters, so 16 machines)
- it seems possible to reuse the mobile cache machines, but this is
presented only as a possibility; unclear if any action has been taken
in this direction
- I need to dig into this one to see if anything needs to be done
Am I missing anything?
Note: only [1] is tagged as [hardware-requests].
[1] https://phabricator.wikimedia.org/T131180
[2] https://phabricator.wikimedia.org/T125126
[3] https://phabricator.wikimedia.org/T109162
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
Hello!
I started writing some minimal documentation on what SonarQube is and
how you can play with it [1]. I will do a short presentation at the
next CREDIT meeting [2] (Wednesday, April 6th).
What I need now is someone other than me to start playing with it,
tell me whether it makes sense in the Wikipedia context, and start a
conversation around it. Let me know if you are interested!
Cheers,
Guillaume
[1] https://wikitech.wikimedia.org/wiki/SonarQube
[2] https://etherpad.wikimedia.org/p/CREDIT
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation