Another note if that doesn't happen to work for you: We discovered the source of our issue by enabling profiling/debugging on the wiki (on a non-public server/install so you can control the page loads and profiling outputs). You can see pretty quickly what areas/code are taking the longest and begin to dig down. Eventually I added custom profile sections to further narrow down the issue to a single "open" call.
On 18 November 2015 at 15:57, Justin Lloyd jlloyd.wiki@gmail.com wrote:
Intriguing! I'll definitely investigate this and report back. Thanks! :)
Justin
On Wed, Nov 18, 2015 at 12:55 PM, Dave Humphrey dave@uesp.net wrote:
That actually sounds very close to an issue we had after upgrading to 1.22 earlier this year. Pages with a lot of images/thumbnails took a long time to render (100s of images took over a minute). We eventually tracked it down to having the default $wgTmpDirectory pointing to the upload/images directory, which was on an NFS share. Each file creation (or access?) on an NFS share takes a fixed 50ms, so you multiply that by multiple accesses and you get the delay.
We fixed it by simply changing $wgTmpDirectory to point to a path on the local fixed drive. Since your setup sounds similar to ours it may be worth trying it out. If this is indeed your issue you can force a "slow" page load by purging a page with a lot of images on it. Test it before and after the change.
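One way to check whether a directory shows this kind of fixed per-file cost is to time a batch of small file creations in it and compare the result against a local path. The following is a minimal sketch, not something from the original setup: the function names, the sample count, and the "4 accesses per thumbnail" figure are all assumptions for illustration.

```python
import tempfile
import time


def mean_create_latency_ms(directory, samples=50):
    """Average wall-clock time to create, write, and delete one small file."""
    start = time.perf_counter()
    for _ in range(samples):
        # NamedTemporaryFile creates the file in `directory` and removes it on close.
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"x")
            handle.flush()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / samples


def estimated_page_delay_s(per_access_ms, thumbnails, accesses_per_thumbnail=1):
    """Fixed per-access cost multiplied over every thumbnail on the page."""
    return per_access_ms * thumbnails * accesses_per_thumbnail / 1000.0


if __name__ == "__main__":
    # Stand-in for a local-disk path; point this at the NFS-backed
    # directory as well to compare the two numbers.
    local_dir = tempfile.gettempdir()
    print(f"{local_dir}: {mean_create_latency_ms(local_dir):.2f} ms per file")
    # 50 ms per access and 300 thumbnails; 4 accesses each is a guess.
    print(estimated_page_delay_s(50, 300, 4), "seconds")  # 60.0 seconds
```

If the NFS-backed directory reports tens of milliseconds per file while the local one reports a fraction of that, the multiplication in the second function shows how a page with a few hundred thumbnails can reach a minute.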
On 18 November 2015 at 15:42, Justin Lloyd jlloyd.wiki@gmail.com wrote:
My speculation is that it's image-heavy pages, not one specific PHP page. This is for the Guild Wars 2 wikis, specifically the English wiki at wiki.guildwars2.com. The Game Updates page used to be problematic, causing a massive backlog whenever a game update or hotfix was released and people hammered that page to see the list of changes. Our main editors changed how the page works, primarily breaking it up into subpages, the most recent of which DPL integrates into the main page, but also changing the templates that were used for displaying trait and skill icons.
Further analysis of the Apache logs, after adding the %D field to the log format, showed a lot of pages sometimes taking minutes to complete, which ultimately results in 502s. The ones that appear to take the longest are those with a lot of these thumbnail images, which is why I think it's still a template issue, but it would be really nice to be able to back up that hypothesis with actual data from process diagnostics, stack traces, etc.
(I really miss DTrace on Solaris. I know it exists for Linux but I'm wary of trying it, especially on production systems. Anyone here have experience with it?)
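For the %D analysis described above, a short script can rank request paths by their worst response time. This is a sketch under assumptions: it supposes the standard combined log format with %D (time taken to serve the request, in microseconds) appended as the last field, which may not match the actual LogFormat in use, and the function name and sample log lines are made up for illustration.

```python
import re
from collections import defaultdict

# Assumes the combined log format with %D appended as the final field.
LINE_RE = re.compile(
    r'"(?:[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) .* (?P<micros>\d+)$'
)


def slowest_paths(lines, top=10):
    """Return (path, worst_seconds, hits) tuples, slowest first."""
    worst = defaultdict(float)
    hits = defaultdict(int)
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        seconds = int(match.group("micros")) / 1_000_000  # %D is in microseconds
        path = match.group("path")
        worst[path] = max(worst[path], seconds)
        hits[path] += 1
    ranked = sorted(worst, key=worst.get, reverse=True)
    return [(path, worst[path], hits[path]) for path in ranked[:top]]


if __name__ == "__main__":
    # Hypothetical log lines, not taken from the real servers.
    sample = [
        '203.0.113.5 - - [18/Nov/2015:12:00:00 -0800] "GET /wiki/Game_updates HTTP/1.1" 502 312 "-" "Mozilla/5.0" 95000000',
        '203.0.113.6 - - [18/Nov/2015:12:00:05 -0800] "GET /wiki/Main_Page HTTP/1.1" 200 5120 "-" "Mozilla/5.0" 250000',
    ]
    for path, seconds, count in slowest_paths(sample):
        print(f"{seconds:8.2f}s  {count:5d} hits  {path}")
```

Feeding it the real access log (for example via `slowest_paths(open(path_to_log))`) would show whether the slowest entries really are the thumbnail-heavy pages.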
On Wed, Nov 18, 2015 at 12:25 PM, Dave Humphrey dave@uesp.net wrote:
My usual strategy is to check server-status and, if I need more detail, go with debugging tools (gdb etc.; see http://serverfault.com/questions/487530/find-out-what-high-cpu-usage-apache-... ). It seems you have done this, however, and I'm wondering why you haven't at least been able to narrow down the issue? You should at least be able to know which PHP file is locking up/crashing, or the rough area/cause?
Once you know roughly where it is you can add temporary PHP logging commands in the code to help narrow down the issue further. If you also know roughly where/how the lockups occur you can try testing/replicating the behavior to get a bit more control over it.
On 18 November 2015 at 14:59, Justin Lloyd jlloyd.wiki@gmail.com wrote:
Hey everyone,
Yesterday I posted this to /r/mediawiki (https://redd.it/3t2apu) and cross-posted to /r/apache as well, but unfortunately I've still not received any feedback other than the one request here for clarification and a couple of suggestions on reddit that I'd already covered in the post.
It's possible no one has any suggestions for me regarding this issue (it is a somewhat complex application stack that could be requiring configuration and/or tuning in multiple places, for example), but given how severe of a problem this is for my production sites, I wanted to bump it once in hopes of possibly getting at least some pointers of things to consider that I may not have already, especially with respect to diagnostics I could perform on the live web servers beyond just server-status and the collectd apache plugin (which is basically the same thing), for example.
On Thu, Nov 12, 2015 at 8:02 AM, Justin Lloyd jlloyd.wiki@gmail.com wrote:
Marcin,
It's the biggest and most heavily trafficked of our wikis because it's the English-language version of the wiki. We also have German, French, and Spanish, but the English-speaking community is by far the largest and most active. There are some tiny configuration differences between the wikis (e.g. the value of $wgJobRunRate, the specific extensions loaded) but nothing very significant, I believe.
I should also add that all four of these wikis (we have a 5th, for 7 total, not 6 as I'd originally said) also use Semantic MediaWiki extensively. I believe the other three wikis would run into the same problem if they had the same amount of traffic as the English one. However, since they all are vhosts within the same Apache instances, the English one's problems affect all of them.
Justin
On Thu, Nov 12, 2015 at 1:42 AM, Marcin Cieslak saper@saper.info wrote:
> On 2015-11-12, Justin Lloyd jlloyd.wiki@gmail.com wrote:
> > * Six wikis are configured as Vhosts in Apache, load balanced by a separate
> > set of front-end servers, where two of the wikis are for private internal
> > use and the other four are public, though the traffic to one of the public
> > wikis dwarfs the rest and it's the wiki giving me problems.
> > (...)
> > I'm mainly looking right now for how to troubleshoot the stuck processes,
> > but any advice regarding this architecture is also welcome, as I feel it
> > could use some improvement but I'm not sure how just yet.
>
> The question that immediately comes to my mind before I start digging
> any further - how is the wiki making problems special? Is it just getting
> most of the traffic (it is the "most interesting" one) or is its
> configuration slightly different?
>
> Marcin Cieślak
> https://www.mediawiki.org/wiki/User:Saper
>
> _______________________________________________
> MediaWiki-l mailing list
> To unsubscribe, go to:
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
-- Dave Humphrey -- dave@uesp.net
Founder/Server Admin of the Unofficial Elder Scrolls Pages -- www.uesp.net
www.viud.net - Building the world's toughest USB drive