Follow-on to a previous email chain, which was misnamed "Setting up multiple Parsoid servers behind load balancer".
I'm getting much slower response times with multiple app servers behind an HAProxy load balancer than with the same setup using a single app server behind the same load balancer. I've set up profiling per the recommendations from this mailing list. [1] is the call graph of a particularly long request. [2] is a graph of requests over many page loads; the better-performing yellow dots/line are the single app server, and the worst-performing color is with profiling turned on.
This gist [3] has my LocalSettings.php from both app servers and the included Extensions.php.
Can anyone help me figure this out? Anything else I can provide or certain things I should test?
Thanks, James
[1] https://gist.githubusercontent.com/jamesmontalvo3/5adf207623454c9eff98e93152...
[2] https://gist.githubusercontent.com/jamesmontalvo3/5adf207623454c9eff98e93152...
[3] https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108
Some more data, in case anyone can help me figure this out...
When running multiple app servers, the request for `load.php?debug=false&lang=en&modules=startup&only=scripts&skin=vector` was often very long (10-30 seconds). So I made several requests for just this URL (not as part of a page load) on both the single-app-server and double-app-server configurations and found no difference. The first request on each took a long time (presumably while building the cache), but subsequent requests all took an okay time (~1.7 seconds; not great, but consistent across my setups). So the problem doesn't appear to be that particular request, but perhaps something to do with the dynamics of multiple requests occurring at once? Scott Ananian mentioned that it may be a lock issue, but I haven't been able to find any info on this.
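To test the idea that the slowdown comes from multiple requests occurring together rather than from any single request, one option is to time the same URL both sequentially and concurrently on each setup. A minimal sketch, assuming curl is installed; the function names are hypothetical, and `-k` is there only on the assumption that the Vagrant wiki uses a self-signed certificate.

```shell
#!/bin/sh
# Hypothetical helpers for timing repeated requests with curl.
# -s: silent, -k: accept a self-signed certificate (assumed for the Vagrant boxes),
# -o /dev/null: discard the body, -w: print only the total time in seconds.

# time_sequential URL COUNT: one request at a time, one time per line.
time_sequential() (
  url=$1 count=${2:-10} i=1
  while [ "$i" -le "$count" ]; do
    curl -sk -o /dev/null -w '%{time_total}\n' "$url"
    i=$((i + 1))
  done
)

# time_concurrent URL COUNT: start COUNT requests at once, then wait for all of them.
# Times are printed in completion order.
time_concurrent() (
  url=$1 count=${2:-10} i=1
  while [ "$i" -le "$count" ]; do
    curl -sk -o /dev/null -w '%{time_total}\n' "$url" &
    i=$((i + 1))
  done
  wait
)

# summarize: read one number per line on stdin; print count, min, mean and max.
# Prints nothing for empty input.
summarize() {
  awk '{ v = $1 + 0; sum += v
         if (NR == 1 || v < min) min = v
         if (NR == 1 || v > max) max = v }
       END { if (NR > 0) printf "n=%d min=%.3f mean=%.3f max=%.3f\n", NR, min, sum / NR, max }'
}
```

For example, with `URL` set to the full `load.php?debug=false&lang=en&modules=startup&only=scripts&skin=vector` URL of the wiki (the `/demo/load.php` path is an assumption), compare `time_sequential "$URL" 10 | summarize` with `time_concurrent "$URL" 10 | summarize` on each setup. If the concurrent numbers differ between the 1-app and 2-app setups while the sequential ones match, that would be consistent with a contention or locking explanation, though it would not identify the lock by itself.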
Next I loaded just the Main Page in both the 1-app and 2-app setups, pulled the request time, the serving app server (1 or 2), and the requested URL from the Apache logs, and sorted the results alphabetically by URL so that the same requests line up. A graph of the times is at [1]; the raw data is at [2]. It clearly shows that 5 of the 11 requests are significantly slower with 2 app servers.
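The log extraction and sorting step can be sketched in shell. This is a minimal sketch under an assumed log format: it supposes each access_log line ends with the request duration in microseconds (Apache's `%D`), which the message does not state, and the helper names and labels are hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: the Apache LogFormat appends %D (microseconds) as the last field, e.g.
#   LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" timed
# With whitespace splitting, field 7 is then the request path and $NF is %D.

# extract_times LABEL: read an access log on stdin and print
# "PATH<TAB>LABEL<TAB>MILLISECONDS" per request; lines with fewer than 8 fields are skipped.
extract_times() {
  awk -v label="$1" 'NF >= 8 { printf "%s\t%s\t%.1f\n", $7, label, $NF / 1000 }'
}

# merge_logs LABEL FILE [LABEL FILE ...]: extract times from each log under its label,
# then sort by path and then by label so the same request lines up across servers/setups.
merge_logs() (
  tab=$(printf '\t')
  while [ "$#" -ge 2 ]; do
    extract_times "$1" < "$2"
    shift 2
  done | LC_ALL=C sort -t "$tab" -k1,1 -k2,2
)
```

For the 2-app run this would be something like `merge_logs app1 app1_access_log app2 app2_access_log`, where the file names stand for copies of `/var/log/httpd/access_log` from each server; a copy from the 1-app run can be added under its own label to put all three side by side.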
Can anyone help me with what might be going wrong, or how I could troubleshoot this?
Thanks, James
[1] https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108/raw/89e567e238696c3cfaa3fb5ff1d987fda4d9f24c/Comparison-of-Main-Page.png
[2] https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108#file-comparison-of-main-page-md
On Mon, Jun 12, 2017 at 7:40 PM, James Montalvo <jamesmontalvo3@gmail.com> wrote:
[...]
[1] https://gist.githubusercontent.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108/raw/66612b7aac4fc3aee6287a64bfe0566b30dc1e87/call-graph.png
[2] https://gist.githubusercontent.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108/raw/66612b7aac4fc3aee6287a64bfe0566b30dc1e87/graph-of-response-times.png
[3] https://gist.github.com/jamesmontalvo3/5adf207623454c9eff98e93152b43108
I still haven't solved this issue and I'm really not sure where to go from here. To make it easy for anyone to replicate the issue firsthand, I've set up Vagrant to create two VMs and install/configure everything. To do so, perform the following:
```
# Get the repo, using the branch with Vagrant support
git clone -b 2vagrant https://github.com/enterprisemediawiki/meza.git
cd meza

# Copy the config file from the default; edit it to uncomment the "app2" portion.
# Also increase RAM and CPUs as desired.
cp vagrantconf.default.yml vagrantconf.yml
$EDITOR vagrantconf.yml

# Set up the boxes (takes a few minutes)
vagrant up

# SSH into the primary box
vagrant ssh

# Install everything (takes 20-40 minutes)
sudo meza deploy vagrant
```
If anything fails during the deploy (sometimes GitHub hangs up, or another intermittent error occurs), just rerun `sudo meza deploy vagrant`.
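If rerunning by hand gets tedious, a small retry wrapper is one option. A minimal sketch; `retry` is a hypothetical helper, not part of meza, and it caps the number of attempts so that a failure which is not intermittent still stops with an error instead of looping forever.

```shell
#!/bin/sh
# retry MAX CMD [ARGS...]: run CMD until it exits successfully, at most MAX times.
# Hypothetical helper, meant for intermittent failures such as a GitHub hang-up.
# Runs in a subshell so its counters do not leak into the caller.
retry() (
  max=$1
  shift
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "retry: giving up after $n attempts: $*" >&2
      exit 1
    fi
    n=$((n + 1))
    echo "retry: starting attempt $n of $max: $*" >&2
  done
)
```

Usage would be along the lines of `retry 3 sudo meza deploy vagrant`, run inside the primary box.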
Once everything is installed, go to https://192.168.56.56/demo to access a wiki (which is very slow in this multi-app configuration). FYI, the slow response times make VisualEditor non-functional. User:Admin has the password "adminpass". Go to http://192.168.56.56:8088 to access the XHGui profiler UI.
To stop using the second app server (and get better response times), edit the inventory file with `sudo vim /opt/conf-meza/secret/vagrant/hosts`, remove `192.168.56.57` from the `app-servers` section, and then re-run the deploy: `sudo meza deploy vagrant --skip-tags latest`
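The inventory edit can also be scripted. A minimal sketch, assuming the hosts file is a standard INI-style Ansible inventory (an `[app-servers]` header followed by one host per line); `remove_from_group` is a hypothetical helper that prints the edited inventory to stdout and leaves the original file untouched, so the result can be reviewed before it is copied back.

```shell
#!/bin/sh
# remove_from_group GROUP HOST FILE: print FILE with HOST removed from [GROUP] only.
# ASSUMPTION: INI-style Ansible inventory, where a host line starts with the host
# name/IP, optionally followed by variables. Other groups listing HOST are kept.
remove_from_group() {
  awk -v header="[$1]" -v host="$2" '
    /^\[/ { in_group = ($0 == header) }   # entering a new [section]
    !(in_group && $1 == host)             # print every line except HOST inside GROUP
  ' "$3"
}
```

As root, something like `remove_from_group app-servers 192.168.56.57 /opt/conf-meza/secret/vagrant/hosts > /tmp/hosts.new` followed by a `diff` against the original would show the change before the file is replaced and the deploy re-run.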
In case my log and config files are in unfamiliar places, here are their locations:
* Apache: /etc/httpd/conf/httpd.conf, /var/log/httpd/access_log, /var/log/httpd/error_log
* PHP: /etc/php.ini, /opt/data-meza/logs/php_errors.log
* Parsoid: /etc/parsoid/server.js.log, /etc/parsoid/localsettings.js
* HAProxy: /etc/haproxy/haproxy.cfg, /var/log/haproxy.log
* MariaDB: /opt/data-meza/mariadb/, /etc/my.cnf
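While reproducing a slow page load, it can help to watch all of these logs at once. A minimal sketch using the log paths listed above; the helper names are hypothetical, and since each app server presumably keeps its own Apache and PHP logs (and the HAProxy log likely lives only on the load balancer), it skips files that do not exist on the current box and would be run on each one.

```shell
#!/bin/sh
# Log files from the list above (Apache, PHP, Parsoid, HAProxy).
MEZA_LOGS="/var/log/httpd/access_log /var/log/httpd/error_log
/opt/data-meza/logs/php_errors.log /etc/parsoid/server.js.log /var/log/haproxy.log"

# existing_files FILE...: print only the arguments that are regular files on this host.
existing_files() (
  for f in "$@"; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
)

# follow_logs: tail -F every listed log that exists here (run as root to read them all).
follow_logs() {
  # Unquoted expansions are intentional: none of these paths contain whitespace.
  set -- $(existing_files $MEZA_LOGS)
  if [ "$#" -eq 0 ]; then
    echo "follow_logs: none of the listed logs exist on this host" >&2
    return 1
  fi
  tail -F "$@"
}
```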
Any help pointing me in the right direction would be truly appreciated. I'll try to be on #wikimedia-tech throughout the day today.
Thanks in advance! --James
On Tue, Jun 13, 2017 at 3:46 PM, James Montalvo <jamesmontalvo3@gmail.com> wrote:
[...]
wikitech-l@lists.wikimedia.org