It may be a lock issue. IIRC, MediaWiki can invoke Parsoid, which can then re-invoke the MediaWiki API. I remember a corner case with locking that caused this recursive invocation to deadlock in some (not anything like production) setups. If you can get a trace from the "slow" MediaWiki, perhaps you will find it waiting for a lock to time out. --scott
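Below is a minimal sketch of the failure mode Scott describes. It assumes a non-reentrant lock that is acquired with a timeout. This is not MediaWiki's or Parsoid's actual locking code: page_lock, save_page, parsoid_convert and api_parse are hypothetical stand-ins, and the callback happens in-process rather than over HTTP. The outer request holds the lock while it calls the Parsoid step. That step calls back into the API and tries to take the same lock, so the inner call stalls for the full timeout instead of returning promptly.

```python
import threading
import time

# Hypothetical stand-in for a lock held by the app server while handling a request.
# threading.Lock is NOT reentrant: a second acquire blocks even for the same caller.
page_lock = threading.Lock()

LOCK_TIMEOUT_S = 0.5  # hypothetical lock-wait timeout


def api_parse(title: str) -> str:
    """Hypothetical API handler that the Parsoid step calls back into."""
    if not page_lock.acquire(timeout=LOCK_TIMEOUT_S):
        # The outer request still holds the lock, so we waited out the whole timeout.
        return f"{title}: lock wait timed out"
    try:
        return f"{title}: parsed"
    finally:
        page_lock.release()


def parsoid_convert(title: str) -> str:
    """Hypothetical Parsoid step that re-invokes the app's API."""
    return api_parse(title)


def save_page(title: str) -> tuple[str, float]:
    """Hypothetical outer request: holds the lock while calling the Parsoid step."""
    start = time.monotonic()
    with page_lock:
        result = parsoid_convert(title)
    return result, time.monotonic() - start


if __name__ == "__main__":
    result, elapsed = save_page("Main_Page")
    print(result, f"({elapsed:.2f}s)")
```

In this situation, a trace of a slow request would show time spent waiting on the lock rather than doing work. That is the signature Scott suggests looking for.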
On Jun 9, 2017 4:29 PM, "James Montalvo" jamesmontalvo3@gmail.com wrote:
Thanks to everyone for all the responses. I'm learning a lot.
In the short term we need to figure out how to make this work without RESTBase, but I've been convinced by this email chain that in the long term we'll need to incorporate RESTBase into our setup.
At this point I think I've determined that the problem we're having is not actually a Parsoid problem, but is somehow related to MediaWiki core (PHP) response times. Something about my multi-server setup is causing 25% of MW core responses to take 25x longer than normal. I didn't notice this in my dev setup before testing Parsoid, probably because I assumed my laptop was old and underpowered: normal page loads were slower, but I figured that running multiple VMs as full app servers on my laptop was the reason. Parsoid evidently has a default timeout short enough that its MW core API requests were failing, which led me to misinterpret this as a Parsoid issue.
To ensure it was not my underpowered laptop I moved my testing to a machine with 12 CPUs and 64 GB RAM.
Our configuration script allows us to define our setup as follows:
load balancers = list, of, IP, addresses, ...
app servers = list, of, IP, addresses, ...
memcached servers = list, of, IP, addresses, ...
db master = a.single.ip.address
db replicas = list, of, IP, addresses, ...
parsoid servers = list, of, IP, addresses, ...
elasticsearch servers = list, of, IP, addresses, ...
I have not run it with that many servers yet, but it's theoretically possible. A single server is not limited to a single role, so in testing thus far my configs look more like:
load balancers = server.3.ip.addr
app servers = server.1.ip.addr, server.2.ip.addr
memcached servers = server.1.ip.addr, server.2.ip.addr
db master = server.1.ip.addr
db replicas = server.2.ip.addr
parsoid servers = server.1.ip.addr, server.2.ip.addr
elasticsearch servers = server.1.ip.addr, server.2.ip.addr
In short: three servers, one exclusively a load balancer and two with everything else installed, with one acting as DB master and the other as DB replica.
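The role list above maps naturally onto a host-to-roles view. The following minimal Python sketch assumes the simplified `role = host, host, ...` syntax shown in this email; meza's real configuration format may differ, and parse_roles and roles_by_host are hypothetical helpers. Applied to the dev config, it shows that server.1 and server.2 each carry several roles while server.3 is only a load balancer. Lines starting with '#' are skipped, which mirrors commenting out servers to switch between configurations.

```python
from collections import defaultdict


def parse_roles(text: str) -> dict[str, list[str]]:
    """Parse 'role = host, host, ...' lines into {role: [hosts]}.

    Blank lines and lines starting with '#' are skipped, so a role can be
    disabled by commenting it out (assumed syntax, not meza's actual format).
    """
    roles: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        role, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'role = hosts', got {raw!r}")
        roles[role.strip()] = [h.strip() for h in value.split(",") if h.strip()]
    return roles


def roles_by_host(roles: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the mapping to list every role assigned to each host."""
    by_host: dict[str, list[str]] = defaultdict(list)
    for role, hosts in roles.items():
        for host in hosts:
            by_host[host].append(role)
    return dict(by_host)


if __name__ == "__main__":
    config = """\
load balancers = server.3.ip.addr
app servers = server.1.ip.addr, server.2.ip.addr
memcached servers = server.1.ip.addr, server.2.ip.addr
db master = server.1.ip.addr
db replicas = server.2.ip.addr
parsoid servers = server.1.ip.addr, server.2.ip.addr
elasticsearch servers = server.1.ip.addr, server.2.ip.addr
"""
    for host, assigned in roles_by_host(parse_roles(config)).items():
        print(host, "->", ", ".join(assigned))
```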
We're running this setup in production with all servers configured as "localhost", i.e. everything installed on one server.
I'm pretty sure I've narrowed the 25x-longer response times down to a multiple-app-server problem. I can take the dev config above (server.1.ip.addr, server.2.ip.addr, server.3.ip.addr), comment out various servers, and re-run deploy. This lets me quickly switch from a single app server to two, two DBs to one, etc. I see the issue with multiple app servers. I don't see it with a single app server, regardless of whether the other services have one or two servers.
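A small timing script can put numbers on the "25% of requests are 25x slower" pattern for each configuration. The following minimal sketch assumes Python 3 and plain HTTP access. The URL is hypothetical: point it at the load balancer, or directly at each app server, to see whether the slow responses cluster on one host. `api.php?action=query&meta=siteinfo` is a standard lightweight MediaWiki API request, but the `/w/` path depends on the install. Requests are sent sequentially, and a response counts as slow when it exceeds `factor` times the median.

```python
import statistics
import time
import urllib.request


def time_requests(url: str, n: int = 40, timeout: float = 60.0) -> list[float]:
    """Fetch url n times sequentially; return each wall-clock latency in seconds."""
    latencies = []
    for _ in range(n):
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # include the body transfer in the measurement
        latencies.append(time.monotonic() - start)
    return latencies


def slow_fraction(latencies: list[float], factor: float = 5.0) -> float:
    """Fraction of requests slower than `factor` times the median latency."""
    if not latencies:
        return 0.0
    median = statistics.median(latencies)
    slow = [t for t in latencies if t > factor * median]
    return len(slow) / len(latencies)


if __name__ == "__main__":
    # Hypothetical endpoint: adjust host and path to your install.
    url = "http://server.3.ip.addr/w/api.php?action=query&meta=siteinfo&format=json"
    lat = time_requests(url)
    print(f"median {statistics.median(lat):.3f}s, max {max(lat):.3f}s, "
          f"slow fraction {slow_fraction(lat):.0%}")
```

Running this once per commented-out configuration would show whether the slow fraction appears only when both app servers are active.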
My LocalSettings.php files for the dual app servers are at [1] and [2]. These reference Extensions.php, which _shouldn't_ have any impact but can be found at [3]. The files are written by Ansible and I'm kind of bad at getting the indenting correct... so, sorry if it looks funny. All of this is created by our project called meza [4]. We weren't really planning on announcing meza yet, but basically its purpose is to simplify installing MediaWiki with all the bells and whistles for "enterprise" (whatever that means :) ) use cases. We've been running it on a single server for about a year, but need to migrate to a high-availability setup to support 24/7 mission-critical operations.
Any ideas what may cause two load-balanced app servers to respond slowly 25% of the time?
Thanks!
--James
[1] https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108#file-localsettings-app1-php
[2] https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108#file-localsettings-app2-php
[3] https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108#file-extensions-php
[4] https://github.com/enterprisemediawiki/meza
On Fri, Jun 9, 2017 at 12:57 PM, Subramanya Sastry ssastry@wikimedia.org wrote:
On 06/09/2017 09:57 AM, Gabriel Wicke wrote:
On Fri, Jun 9, 2017 at 12:56 AM, Alexandros Kosiaris akosiaris@wikimedia.org wrote:
I also don't think you need RESTBase as long as you are willing to wait for Parsoid to finish parsing and return the result.
Apart from performance, there is also functionality that is missing without RESTBase:
- Diffs are going to contain a lot of extra changes (commonly called "dirty diffs"), as no original HTML or data-parsoid is available to Parsoid's selective serialization algorithm. This might make it difficult to review changes.
What Gabriel said there about dirty diffs. So, this depends on whether wikis are concerned about their wikitext getting normalized to "Parsoid-determined canonical" formats (e.g. choice of whitespace and quotes). For example, this is extremely important for Wikimedia wikis, but may be less so for some smaller wikis, if they accept a one-time normalization dirty diff and adopt identical norms in source editing.
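The following toy Python sketch illustrates why missing original data produces dirty diffs. It is not Parsoid's actual selective serializer: Segment, canonical, serialize and changed_lines are hypothetical names, and "canonical" here only collapses runs of spaces. When the original source is available, unedited segments are emitted verbatim. Without it, every segment passes through the canonical serializer, so text the user never touched also gets normalized.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    content: str             # text as it stands after editing
    original: Optional[str]  # exact source wikitext, or None if unavailable
    edited: bool = False     # did the user touch this segment?


def canonical(text: str) -> str:
    """Toy 'canonical' serializer: collapse runs of spaces, strip trailing spaces."""
    return "\n".join(re.sub(r" {2,}", " ", line).rstrip() for line in text.split("\n"))


def serialize(segments: list[Segment], selective: bool) -> str:
    """Emit wikitext; in selective mode, reuse the original source for untouched segments."""
    out = []
    for seg in segments:
        if selective and not seg.edited and seg.original is not None:
            out.append(seg.original)            # untouched: keep the source as-is
        else:
            out.append(canonical(seg.content))  # edited or no original: normalize
    return "\n".join(out)


def changed_lines(before: str, after: str) -> int:
    """Count differing lines (toy diff; assumes both texts have the same line count)."""
    return sum(a != b for a, b in zip(before.split("\n"), after.split("\n")))


if __name__ == "__main__":
    before = "Intro  text with  double spaces.\nSecond paragraph."
    segments = [
        Segment("Intro  text with  double spaces.", "Intro  text with  double spaces."),
        Segment("Second paragraph, edited.", "Second paragraph.", edited=True),
    ]
    for mode in (True, False):
        after = serialize(segments, selective=mode)
        print(f"selective={mode}: {changed_lines(before, after)} line(s) changed")
```

With the original available, only the edited line changes. Without it, the untouched first line is rewritten as well, which is the dirty diff.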
- Switching between wikitext and visual editing won't work.
This is because of the dirty-diff requirement. As far as I understand, even if wikis are okay with dirty diffs, VE's source <-> HTML switching functionality requires RESTBase right now.
- Visual editing in general will very likely stop working once we reduce the size of HTML by separating out metadata (see https://phabricator.wikimedia.org/T78676). We keep pushing this back due to a lack of resources, but it is still planned, and might happen within the next six months.
There are some unresolved questions about how willing (Parsoid) clients are to work with this stripped-HTML format. That, and our being resource-strapped, is why we keep kicking this down the road. But when this happens, it will break VE editing unless VE and Parsoid hide the data-mw stripping behind a config flag.
In short, using Parsoid directly for visual editing is an unsupported configuration, and is likely to stop working altogether in the foreseeable future.
Just to be clear, we haven't yet made any formal decision to go down this route, but Gabriel articulates the reasons why it might make sense to do this. There are some aspects to consider here: (a) whether we want to support this combination behind a config flag at all given that some functionality may not be available (unless Parsoid clients figure out ways to support some functionality without RESTBase) (b) the complexity (maintenance, testing, documentation, support) of supporting multiple combinations.
We don't have fully resolved answers to this yet. I don't know what VE's take on this is -- so there is also that to consider. But, when we have firm resolutions on all of this, we will make suitable announcements on lists, suggest upgrade options, and update wikis.
But, also, what Gabriel said earlier about RESTBase. If you are already installing Parsoid, adding RESTBase (since it is also node.js) with the default sqlite backend might not be a whole lot more complexity. So, if VE-editing wikis that use Parsoid start adopting this, that would also inform our decisions above.
Subbu.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l