So I confirmed that $wgTmpDirectory is defaulting to /tmp on my systems, so
that's not the problem. As for profiling, I enabled it on one of the four
web servers and that appears to actually trigger the problem, or a very
similar one. The Apache processes quickly climbed to their MaxClients limit
of 100 and just stayed there, forcing me to restart Apache after first
commenting out the profiling settings in LocalSettings.php, where
$wgProfileLimit was set to 2.
On Wed, Nov 18, 2015 at 1:00 PM, Dave Humphrey <dave(a)uesp.net> wrote:
Another note if that doesn't happen to work for you: We discovered the
source of our issue by enabling profiling/debugging on the wiki (on a
non-public server/install so you can control the page loads and profiling
outputs). You can see pretty quickly what areas/code are taking the longest
and begin to dig down. Eventually I added custom profile sections to
further narrow down the issue to a single "open" call.
On 18 November 2015 at 15:57, Justin Lloyd <jlloyd.wiki(a)gmail.com> wrote:
Intriguing! I'll definitely investigate this and report back. Thanks! :)
Justin
On Wed, Nov 18, 2015 at 12:55 PM, Dave Humphrey <dave(a)uesp.net> wrote:
That actually sounds very close to an issue we had after upgrading to 1.22
earlier this year. Pages with a lot of images/thumbnails took a long time
to render (100s of images took over a minute). We eventually tracked it
down to having the default $wgTmpDirectory pointing to the upload/images
directory, which was on an NFS share. Each file creation (or access?) on an
NFS share takes a fixed 50ms, so you multiply that by multiple accesses and
you get the delay.

We fixed it by simply changing $wgTmpDirectory to point to a path on the
local fixed drive. Since your setup sounds similar to ours it may be worth
trying it out. If this is indeed your issue you can force a "slow" page
load by purging a page with a lot of images on it. Test it before and
after the change.
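
One way to check whether per-file latency in the temp directory is the culprit is to time file creation there directly and compare it with a directory on a local disk. The following is only a rough sketch, not anything from MediaWiki itself: the function names are made up for illustration, and the directory passed in is whatever $wgTmpDirectory resolves to on the server being tested.

```python
import os
import tempfile
import time


def time_file_creation(directory, count=50):
    """Return the mean seconds taken to create, write, and delete one small file."""
    start = time.perf_counter()
    for _ in range(count):
        # Create a uniquely named file, write one byte, then remove it again.
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            os.write(fd, b"x")
        finally:
            os.close(fd)
            os.remove(path)
    return (time.perf_counter() - start) / count


def estimated_delay(per_access_seconds, accesses):
    """Total delay if every file access costs the same fixed amount of time."""
    return per_access_seconds * accesses


if __name__ == "__main__":
    # Assumption: replace this with the NFS-backed directory and a local one
    # to compare them; the system temp directory is only a placeholder.
    for candidate in (tempfile.gettempdir(),):
        mean = time_file_creation(candidate)
        print(f"{candidate}: {mean * 1000:.2f} ms per file")
```

With the fixed 50ms cost mentioned above, `estimated_delay(0.05, 1200)` comes to about 60 seconds; the figure of 1200 accesses is purely illustrative, not a measurement from either site.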
On 18 November 2015 at 15:42, Justin Lloyd <jlloyd.wiki(a)gmail.com> wrote:
> My speculation is that it's image heavy pages, not one specific php page.
> This is for the Guild Wars 2 wikis, specifically the English wiki at
> wiki.guildwars2.com. The Game Updates page used to be problematic, causing
> a massive backlog because a game update or hotfix was released and people
> hammered that page to see the list of changes. Our main editors changed how
> the page works, primarily breaking it up into subpages, the most recent of
> which DPL integrates into the main page, but also changing the
> templates that were used for displaying trait and skill icons.
>
> Further analysis of the Apache logs, after adding the %D field to the log
> format, showed a lot of pages taking sometimes minutes to complete, which
> ultimately result in 502s. The ones that appear to take the longest are
> those with a lot of these thumbnail images, which is why I think it's still
> a template issue, but it would be really nice to be able to back up that
> hypothesis with actual data from process diagnostics, stack traces, etc.
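
The log analysis described above can be roughly automated. In Apache's log format, %D is the time taken to serve the request in microseconds. The sketch below assumes %D was appended as the last field of each log line and that the request line is the first double-quoted field; both are assumptions about the log format, and the function name and sample lines are invented for illustration.

```python
def slowest_requests(lines, top=10):
    """Return (seconds, request_path) tuples for the slowest requests."""
    results = []
    for line in lines:
        quoted = line.split('"')
        fields = line.split()
        # Skip lines that do not match the assumed layout.
        if len(quoted) < 3 or not fields or not fields[-1].isdigit():
            continue
        request = quoted[1].split()  # e.g. ['GET', '/wiki/Foo', 'HTTP/1.1']
        if len(request) < 2:
            continue
        micros = int(fields[-1])  # assumed: %D is the final field
        results.append((micros / 1_000_000, request[1]))
    results.sort(reverse=True)
    return results[:top]


if __name__ == "__main__":
    # Hypothetical log lines, not taken from any real server.
    sample = [
        '10.0.0.1 - - [18/Nov/2015:12:00:00 -0700] "GET /wiki/Game_updates HTTP/1.1" 502 512 95000000',
        '10.0.0.2 - - [18/Nov/2015:12:00:01 -0700] "GET /wiki/Main_Page HTTP/1.1" 200 2048 250000',
    ]
    for seconds, path in slowest_requests(sample):
        print(f"{seconds:8.2f}s  {path}")
```

Sorting by duration rather than by status code keeps slow requests visible even when they eventually succeed instead of ending in a 502.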
>
> (I really miss DTrace on Solaris. I know it exists for Linux but I'm wary
> of trying it, especially on production systems. Anyone here have experience
> with it?)
>
> On Wed, Nov 18, 2015 at 12:25 PM, Dave Humphrey <dave(a)uesp.net> wrote:
>
> > My usual strategy is to check server-status and if I need more detail go
> > with debugging tools (gdb etc..., see
> > http://serverfault.com/questions/487530/find-out-what-high-cpu-usage-apache…
> > ).
> > It seems you have done this, however, and I'm wondering why you haven't at
> > least been able to narrow down the issue? You should at least be able to
> > know which PHP file is locking up/crashing or the rough area/cause?
> >
> > Once you know roughly where it is you can add temporary PHP logging
> > commands in the code to help narrow down the issue further. If you also
> > know roughly where/how the lockups are you can try testing/replicating the
> > behavior to get a bit more control on it.
> >
> > On 18 November 2015 at 14:59, Justin Lloyd <jlloyd.wiki(a)gmail.com> wrote:
> >
> > > Hey everyone,
> > >
> > > Yesterday I posted this to /r/mediawiki (https://redd.it/3t2apu) and
> > > cross-posted to /r/apache as well, but unfortunately I've still not
> > > received any feedback other than the one request here for clarification and
> > > a couple of suggestions on reddit that I'd already covered in the post.
> > >
> > > It's possible no one has any suggestions for me regarding this issue (it is
> > > a somewhat complex application stack that could be requiring configuration
> > > and/or tuning in multiple places, for example), but given how severe of a
> > > problem this is for my production sites, I wanted to bump it once in hopes
> > > of possibly getting at least some pointers of things to consider that I may
> > > not have already, especially with respect to diagnostics I could perform on
> > > the live web servers beyond just server-status and the collectd apache
> > > plugin (which is basically the same thing), for example.
> > >
> > > On Thu, Nov 12, 2015 at 8:02 AM, Justin Lloyd <jlloyd.wiki(a)gmail.com> wrote:
> > >
> > > > Marcin,
> > > >
> > > > It's the biggest and most heavily trafficked of our wikis because it's the
> > > > English-language version of the wiki. We also have German, French, and
> > > > Spanish, but the English-speaking community is by far the largest and most
> > > > active. There are some tiny configuration differences between the wikis
> > > > (e.g. the value of $wgJobRunRate, the specific extensions loaded) but
> > > > nothing very significant I don't believe.
> > > >
> > > > I should also add that all four of these wikis (we have a 5th, for 7
> > > > total, not 6 as I'd originally said) also use Semantic MediaWiki
> > > > extensively. I believe the other three wikis would run into the same
> > > > problem if they had the same amount of traffic as the English one. However,
> > > > since they all are vhosts within the same Apache instances, the English
> > > > one's problems affect all of them.
> > > >
> > > > Justin
> > > >
> > > > On Thu, Nov 12, 2015 at 1:42 AM, Marcin Cieslak <saper(a)saper.info> wrote:
> > > >
> > > > > On 2015-11-12, Justin Lloyd <jlloyd.wiki(a)gmail.com> wrote:
> > > > > > * Six wikis are configured as Vhosts in Apache, load balanced by a separate
> > > > > > set of front-end servers, where two of the wikis are for private internal
> > > > > > use and the other four are public, though the traffic to one of the public
> > > > > > wikis dwarfs the rest and it's the wiki giving me problems.
> > > > >
> > > > > (...)
> > > > >
> > > > > > I'm mainly looking right now for how to troubleshoot the stuck processes,
> > > > > > but any advice regarding this architecture is also welcome, as I feel it
> > > > > > could use some improvement but I'm not sure how just yet.
> > > > >
> > > > > The question that immediately comes to my mind before I start digging
> > > > > any further - how is the wiki making problems special? Is it just getting
> > > > > most of the traffic (it is the "most interesting" one) or is its
> > > > > configuration slightly different?
> > > > >
> > > > > Marcin Cieślak
> > > > > https://www.mediawiki.org/wiki/User:Saper
> > > > >
> > > > > _______________________________________________
> > > > > MediaWiki-l mailing list
> > > > > To unsubscribe, go to:
> > > > > https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
> > --
> > Dave Humphrey -- dave(a)uesp.net
> > Founder/Server Admin of the Unofficial Elder Scrolls Pages -- www.uesp.net
> > www.viud.net - Building the world's toughest USB drive