Hi.
There are over a dozen reports of general slowness with the English Wikipedia: https://en.wikipedia.org/w/index.php?oldid=483631204#is_it_me_or_is_wiki_very_slow.3F
As I browsed and edited the site today, I experienced similar issues (painful slowness on intermittent page loads, loading half a page and then stalling, etc.).
Is anyone from ops or engineering looking into this? I've filed a bug about this as well: https://bugzilla.wikimedia.org/show_bug.cgi?id=35448.
It may be that this is a known issue (due to schema changes or whatever else), but I don't think anyone from ops or engineering has commented on the village pump (or elsewhere?). Can someone please take a look at what's going on?
MZMcBride
It looks like the problem started on 3/21 (see attached image). I'm about to head to the airport for ~20 hours of travel, so I can't investigate further, but it may be related to the parser changes made that day, which line up with the second, bigger spike.
- 05:23 logmsgbot: tstarling synchronized php-1.19/includes/parser/CoreParserFunctions.php
- 05:23 logmsgbot: tstarling synchronized php-1.19/includes/parser/Parser.php
- 05:23 logmsgbot: tstarling synchronized php-1.19/includes/parser/StripState.php
- 05:22 logmsgbot: tstarling synchronized php-1.19/tests/parser/parserTests.txt
There was also the release of a heavily changed MobileFrontend earlier that day / the prior night, which lines up with the first spike. The MF rewrite doesn't perform well: MobileFrontend::DOMParse average time went from 15ms to 150ms (~500ms at the 99th percentile), and it wouldn't be impossible for it to also impact non-mobile performance.
DB, ES, and memcache latency look steady, so I suspect it's all application side.
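For reference, the average and the 99th percentile can diverge sharply when a minority of requests are slow. A minimal Python sketch, using made-up timings (not the actual DOMParse data) and the nearest-rank percentile definition (one of several common definitions):

```python
import math

def mean(samples):
    """Arithmetic mean of a list of timings (ms)."""
    return sum(samples) / len(samples)

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical timings in ms: mostly ~140ms with a slow tail.
timings = [140] * 98 + [500, 520]
print(f"avg={mean(timings):.1f}ms p99={percentile(timings, 99)}ms")
```

Here two slow requests barely move the average but dominate the 99th percentile, which is why both numbers are worth watching.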
-Asher
On Fri, Mar 23, 2012 at 10:00 PM, MZMcBride z@mzmcbride.com wrote:
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 24.03.2012, 20:16 Asher wrote:
> There was also the release of a heavily changed MobileFrontend earlier that day / the prior night, which lines up with the first spike. The MF rewrite doesn't perform well: MobileFrontend::DOMParse average time went from 15ms to 150ms (~500ms at the 99th percentile), and it wouldn't be impossible for it to also impact non-mobile performance.
> DB, ES, and memcache latency look steady, so I suspect it's all application side.
Thanks for the heads up, Asher. I've committed thorough profiling in https://gerrit.wikimedia.org/r/3696 and would appreciate it if someone deployed it.
On Sat, Mar 24, 2012 at 12:15 PM, Max Semenik maxsem.wiki@gmail.com wrote:
> Thanks for the heads up, Asher. I've committed thorough profiling in https://gerrit.wikimedia.org/r/3696 and would appreciate it if someone deployed it.
I've reviewed and merged your profiling code. We need to create a patch against the deployed version at: https://svn.wikimedia.org/viewvc/mediawiki/branches/wmf/1.19wmf1/extensions/...
Once that patch is created, I'll apply it and push it live to production. This should give us better insight into why this performance slowdown is occurring.
-- Best regards, Max Semenik ([[User:MaxSem]])
On Sat, Mar 24, 2012 at 12:33 PM, Patrick Reilly preilly@wikimedia.org wrote:
> I've reviewed and merged your profiling code. We need to create a patch against the deployed version at: https://svn.wikimedia.org/viewvc/mediawiki/branches/wmf/1.19wmf1/extensions/...
> Once that patch is created, I'll apply it and push it live to production. This should give us better insight into why this performance slowdown is occurring.
Okay, this revision (https://www.mediawiki.org/wiki/Special:Code/MediaWiki/114477) was pushed live at 12:46 PM PDT.
So we should be able to see a clear graph in about 30 minutes.
On 24.03.2012, 23:51 Patrick wrote:
> Okay, this revision (https://www.mediawiki.org/wiki/Special:Code/MediaWiki/114477) was pushed live at 12:46 PM PDT.
> So we should be able to see a clear graph in about 30 minutes.
And these graphs don't make any sense (https://graphite.wikimedia.org/dashboard/MobileFrontend-DOMParse): the whole function's execution time is much larger than the sum of its pieces. What's going on?
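A gap between a function's total time and the sum of its profiled sub-sections is, by construction, time spent outside those sub-sections. The following Python sketch illustrates this with a hypothetical section profiler (not MediaWiki's actual profiling code) and time.sleep() standing in for real work; the section names are made up:

```python
import time
from contextlib import contextmanager

# Hypothetical section profiler: accumulates wall-clock time per named section.
timings = {}

@contextmanager
def section(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start

def dom_parse():
    with section("DOMParse"):
        with section("DOMParse-load"):
            time.sleep(0.01)   # profiled piece
        time.sleep(0.05)       # unprofiled work between the pieces
        with section("DOMParse-save"):
            time.sleep(0.01)   # profiled piece

dom_parse()
children = sum(v for k, v in timings.items() if k != "DOMParse")
unaccounted = timings["DOMParse"] - children
print(f"total={timings['DOMParse'] * 1000:.0f}ms "
      f"pieces={children * 1000:.0f}ms unaccounted={unaccounted * 1000:.0f}ms")
```

In this sketch the unaccounted time is roughly the 50ms of unprofiled sleep; in a real profile it points at whatever code runs between or outside the instrumented sections.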
On 24/03/12 21:36, Max Semenik wrote:
> And these graphs don't make any sense (https://graphite.wikimedia.org/dashboard/MobileFrontend-DOMParse): the whole function's execution time is much larger than the sum of its pieces. What's going on?
Maybe it's a recursive function? I would have suspected that the parser fixes affected some common extension, making it slower. Ganglia output shows that CPU load decreased, though: from a steady load of over 10k, it became much more erratic, dropping to 9k (due to full operations being slower?).
Is the revision hashing still running? If there are processes fetching many old revisions from external storage, they might be hurting the caches (for example, by evicting "fresh" parsed pages from memcached).
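The cache-pollution concern can be illustrated with a minimal LRU cache sketch. It assumes plain LRU eviction with a capacity counted in items, which is a simplification (memcached evicts per slab class and by memory size), and the keys are hypothetical. A burst of one-off reads of old revisions pushes out entries that normal traffic used recently:

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=4)
for page in ["Main_Page", "Foo", "Bar"]:  # hypothetical "hot" parsed pages
    cache.set(f"parsed:{page}", "<html>...</html>")

# Hypothetical batch job reading many old revisions, each touched once.
for rev_id in range(1000, 1004):
    cache.set(f"oldrev:{rev_id}", "old wikitext")

print(cache.get("parsed:Main_Page"))  # None: the hot entry was evicted
```

Whether this matters in practice depends on how much free space the cache has; with plenty of headroom, nothing hot gets evicted.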
Another idea is related to the db24 failure: some hosts may have failed to sync db.php and still be pointing to it.
On 25.03.2012, 2:28 Platonides wrote:
> Maybe it's a recursive function?
No, it's not, unless there's some crazy PHP quirk causing the output handler to be called several times.
> I would have suspected that the parser fixes affected some common extension, making it slower. Ganglia output shows that CPU load decreased, though: from a steady load of over 10k, it became much more erratic, dropping to 9k (due to full operations being slower?).
> Is the revision hashing still running? If there are processes fetching many old revisions from external storage, they might be hurting the caches (for example, by evicting "fresh" parsed pages from memcached).
The parser cache is in MySQL; you can't displace anything out of it by putting stuff into memcached.
> Another idea is related to the db24 failure: some hosts may have failed to sync db.php and still be pointing to it.
This appears to have been due to a network link issue. One of the uplinks from our row A switch in eqiad was spewing out input framing errors, causing some packets to be dropped.
This link has been taken down for troubleshooting, and the issue appears to be gone. Please let us know ASAP if you are still seeing any slowness.
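As a back-of-the-envelope illustration of why a lossy uplink shows up as intermittent rather than uniform slowness: if each packet is independently dropped with probability p, a response spanning n packets sees at least one drop with probability 1 - (1 - p)^n. A dropped packet typically stalls the transfer until TCP retransmits it, which fits the "loading half a page and then stalling" reports. The loss rate and packet counts below are made-up numbers, not measurements from the eqiad switch, and real losses are rarely independent:

```python
def p_at_least_one_drop(loss_rate, packets):
    """Probability that at least one of `packets` independent packets is lost."""
    return 1 - (1 - loss_rate) ** packets

# Hypothetical 0.5% per-packet loss rate on the bad uplink.
for packets in (1, 10, 100):
    print(packets, round(p_at_least_one_drop(0.005, packets), 3))
```

Small responses almost always get through untouched, while larger pages hit a drop far more often, so users see some loads stall and others complete normally.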
Thanks,
Leslie
On Sat, Mar 24, 2012 at 4:22 PM, Max Semenik maxsem.wiki@gmail.com wrote:
> No, it's not, unless there's some crazy PHP quirk causing the output handler to be called several times.
> The parser cache is in MySQL; you can't displace anything out of it by putting stuff into memcached.
Max Semenik wrote:
> The parser cache is in MySQL; you can't displace anything out of it by putting stuff into memcached.
You're wrong here, although there's usually plenty of space, so this isn't something to worry about.
(I would link to a graph of the memcached servers, but I'm unable to find a Ganglia entry for them.)