Re: [Wikitech-l] Setting up multiple Parsoid servers behind load balancer

10 Jun 2017


It may be a lock issue. IIRC MediaWiki can invoke Parsoid, which can then
re-invoke the MediaWiki API. I remember there being some corner case with
locking which caused this recursive invocation to deadlock in some (not
anything like production) situations. If you can get a trace from the
"slow" MediaWiki, perhaps you will find it waiting for a lock to time out.
  --scott
On Jun 9, 2017 4:29 PM, "James Montalvo" jamesmontalvo3@gmail.com wrote:
Thanks to everyone for all the responses. I'm learning a lot.
In the short term we need to figure out how to make this work without
RESTBase, but I've been convinced by this email chain that in the long term
we'll need to incorporate RESTBase into our setup.
At this point I think I've determined that the problem we're having is not
actually a Parsoid problem, but somehow related to MediaWiki Core (PHP)
response times. Something about my multi-server setup is causing 25% of MW
core response times to be 25x longer than normal. I didn't notice this in
my dev setup, prior to testing Parsoid, probably because I just assumed my
laptop was old and underpowered. In other words, normal page loads were
slower but I just figured that having multiple VMs up on my laptop
functioning as full app servers was the reason. Parsoid evidently has a
default timeout short enough that when Parsoid makes MW core API requests I
was getting failures, causing me to misinterpret it as a Parsoid issue.
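To see why a slow tail in MW core can look like a Parsoid problem, here is a rough back-of-the-envelope sketch. It assumes each MW core API request independently has a 25% chance of being slow, and that every slow request exceeds Parsoid's timeout; neither assumption was measured in this thread, and the function name is made up for illustration.

```python
def chance_of_timeout(api_requests, slow_fraction=0.25):
    """Probability that at least one of `api_requests` calls is slow.

    Assumes (hypothetically) that requests are independent and that any
    slow request is slow enough to exceed the caller's timeout.
    """
    if api_requests < 0:
        raise ValueError("api_requests must be non-negative")
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must be between 0 and 1")
    # P(at least one slow) = 1 - P(all fast)
    return 1.0 - (1.0 - slow_fraction) ** api_requests


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, round(chance_of_timeout(n), 3))
```

Under those assumptions, a parse that needs even a handful of API requests would hit at least one slow response most of the time, which would be consistent with frequent Parsoid-side failures.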
To ensure it was not my underpowered laptop I moved my testing to a machine
with 12 CPUs and 64 GB RAM.
Our configuration script allows us to define our setup as follows:
load balancers = list, of, IP, addresses, ...
app servers = list, of, IP, addresses, ...
memcached servers = list, of, IP, addresses, ...
db master = a.single.ip.address
db replicas = list, of, IP, addresses, ...
parsoid servers = list, of, IP, addresses, ...
elasticsearch servers = list, of, IP, addresses, ...
I have not run it with that many servers yet, but it's theoretically
possible. A single server does not need to fill a single role, so in
testing thus far my configs look more like:
load balancers = server.3.ip.addr
app servers = server.1.ip.addr, server.2.ip.addr
memcached servers = server.1.ip.addr, server.2.ip.addr
db master = server.1.ip.addr
db replicas = server.2.ip.addr
parsoid servers = server.1.ip.addr, server.2.ip.addr
elasticsearch servers = server.1.ip.addr, server.2.ip.addr
In short: three servers, one exclusively a load balancer, two with
everything installed albeit one acting as DB master and the other as DB
replica.
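As a rough illustration of the layout above, the role-to-server mapping can be inverted to see what each server is carrying. This is only a sketch: the `role = a, b, c` text is the shorthand used in this email, not meza's actual config file format, and the function names are hypothetical.

```python
from collections import defaultdict

# The dev layout described above, in the email's shorthand notation.
DEV_CONFIG = """
load balancers = server.3.ip.addr
app servers = server.1.ip.addr, server.2.ip.addr
memcached servers = server.1.ip.addr, server.2.ip.addr
db master = server.1.ip.addr
db replicas = server.2.ip.addr
parsoid servers = server.1.ip.addr, server.2.ip.addr
elasticsearch servers = server.1.ip.addr, server.2.ip.addr
"""


def parse_roles(text):
    """Parse 'role = host, host, ...' lines into {role: [hosts]}."""
    roles = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        role, _, servers = line.partition("=")
        roles[role.strip()] = [s.strip() for s in servers.split(",") if s.strip()]
    return roles


def roles_by_server(roles):
    """Invert {role: [hosts]} into {host: [roles]}."""
    by_server = defaultdict(list)
    for role, servers in roles.items():
        for server in servers:
            by_server[server].append(role)
    return dict(by_server)


if __name__ == "__main__":
    for server, held in sorted(roles_by_server(parse_roles(DEV_CONFIG)).items()):
        print(server, "->", ", ".join(held))
```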
We're running this setup in production with all servers configured as
"localhost", i.e. everything installed on one server.
I'm pretty sure I've narrowed down the 25x-longer-response-times to being a
multiple app-server problem because I can take the dev config above
(server.1.ip.addr, server.2.ip.addr, server.3.ip.addr) and comment out
various servers and re-run deploy. This allows me to quickly switch from a
single app server to two, two DBs to one, etc. I see the issue with
multiple app servers. I don't see it with a single app server, regardless
of whether the other services have 1 or 2 servers.
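One way to check whether the slow responses come from one app server or from both is to time requests per server (for example by hitting each app server directly, bypassing the load balancer) and compare the share of slow responses. The sketch below assumes you already have (server, seconds) samples; the sample data is synthetic, the 10x threshold is an arbitrary choice, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import median


def slow_share_by_server(samples, factor=10.0):
    """Return {server: fraction of requests slower than factor * median}.

    `samples` is an iterable of (server, seconds) pairs. The baseline is
    the median over all samples, so a minority of slow requests does not
    shift it much.
    """
    samples = list(samples)
    if not samples:
        return {}
    baseline = median(seconds for _, seconds in samples)
    counts = defaultdict(lambda: [0, 0])  # server -> [slow, total]
    for server, seconds in samples:
        counts[server][1] += 1
        if seconds > factor * baseline:
            counts[server][0] += 1
    return {server: slow / total for server, (slow, total) in counts.items()}


if __name__ == "__main__":
    # Synthetic timings: 2 of 8 requests (25%) are 25x slower than normal.
    synthetic = [
        ("app1", 0.2), ("app1", 0.2), ("app1", 0.2), ("app1", 0.2),
        ("app2", 0.2), ("app2", 5.0), ("app2", 0.2), ("app2", 5.0),
    ]
    print(slow_share_by_server(synthetic))
```

If the slow share is concentrated on one server, that points at something local to it; if it is spread evenly, something shared between the app servers (cache, locks, DB) is the more likely suspect.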
My LocalSettings.php files are at [1] and [2] for dual app servers.
These reference Extensions.php which _shouldn't_ have any impact but can be
found at [3]. The files are written by Ansible and I'm kind of bad at
getting the indenting correct...so, sorry about that if it looks funny. All
of this is created by our project called meza [4]. We weren't really
planning on announcing meza yet, but basically its purpose is to simplify
MediaWiki install with all the bells and whistles for "enterprise"
(whatever that means :) ) use cases. We've been running it on a single
server for about a year, but need to migrate to a high availability setup
to support 24/7 mission critical operations.
Any ideas what may cause two load-balanced app servers to respond slowly
25% of the time?
Thanks!
--James
[1] https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108#file-localsettings-app1-php
[2] https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108#file-localsettings-app2-php
[3] https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108#file-extensions-php
[4] https://github.com/enterprisemediawiki/meza
On Fri, Jun 9, 2017 at 12:57 PM, Subramanya Sastry ssastry@wikimedia.org wrote:
On 06/09/2017 09:57 AM, Gabriel Wicke wrote:
On Fri, Jun 9, 2017 at 12:56 AM, Alexandros Kosiaris <akosiaris@wikimedia.org> wrote:
I also don't think you need RESTBase as long as you are willing to
wait for parsoid to finish parsing and returning the result.
Apart from performance, there is also functionality that is missing without
RESTBase:
- Diffs are going to contain a lot of extra changes (commonly called
  "dirty diffs"), as no original HTML or data-parsoid is available to
  Parsoid's selective serialization algorithm. This might make it
  difficult to review changes.
What Gabriel said there about dirty diffs. So, this depends on whether
wikis are concerned about their wikitext getting normalized to
"Parsoid-determined canonical" formats (wrt choice of whitespaces,
quotes, ... for ex.). For example, this is extremely important for
Wikimedia wikis, but may be less so for some smaller wikis, if they take a
one-time normalization dirty diff and adopt identical norms in source
editing.
- Switching between wikitext and visual editing won't work.
This is because of the dirty-diff requirement. As far as I understand,
even if wikis are okay with dirty diffs, VE's source <-> html switching
functionality requires RESTBase right now.
- Visual editing in general will very likely stop working once we reduce
  the size of HTML by separating out metadata (see
  https://phabricator.wikimedia.org/T78676). We keep pushing this back due
  to a lack of resources, but it is still planned, and might happen within
  the next six months.
There are some unresolved questions about how willing (Parsoid) clients
are to work with this stripped-html format. That and the matter of us being
resource-strapped means we keep kicking this down the road. But, when this
happens, this will break VE-editing unless VE and Parsoid hide the data-mw
stripping behind a config flag.
In short, using Parsoid directly for visual editing is an unsupported
configuration, and is likely to stop working altogether in the foreseeable
future.
Just to be clear, we haven't yet made any formal decision to go down this
route, but Gabriel articulates the reasons why it might make sense to do
this. There are some aspects to consider here:
(a) whether we want to support this combination behind a config flag at
all given that some functionality may not be available (unless Parsoid
clients figure out ways to support some functionality without RESTBase)
(b) the complexity (maintenance, testing, documentation, support) of
supporting multiple combinations.
We don't have fully resolved answers to this yet. I don't know what VE's
take on this is -- so there is also that to consider. But, when we have
firm resolutions on all of this, we will make suitable announcements on
lists, suggest upgrade options, and update wikis.
But, also, what Gabriel said earlier about RESTBase. If you are already
installing Parsoid, adding RESTBase (since it is also node.js) with the
default sqlite backend might not be a whole lot more complexity. So, if
VE-editing wikis that use Parsoid start adopting this, that would also
inform our decisions above.
Subbu.

Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
