Hi again,
_in short_:
* did an installation of MediaWiki 1.9.x a few weeks ago;
* MediaWiki was working fine, as expected and as used, for several weeks;
* MediaWiki suddenly stopped working partly and became mostly inaccessible a few days ago;
* currently it has started to interfere with other services running on the same host (requesting a page from MediaWiki kills apache2);
* can't find any helpful information in ../syslog, ../messages, or apache2's error.log or access.log; enabled php.log (in php.ini) and debug.log (in LocalSettings.php): the former logs nothing helpful, and the latter is not being created at all.
_in full_:
A few months ago, I installed MediaWiki 1.9, created a few hundred pages, and everything appeared to work like a charm: creating, deleting, moving, and editing articles, all reasonably fast and reliable.
Now, after several weeks of operation, the installation has partially ceased to work: entering an existing term in the search box has no effect - the browser claims to be requesting the page, but nothing happens (nothing = no page showing up and no timeout appearing, even after several minutes or hours). Sometimes - but not reproducibly - the requested page appears when I enter the page's full URL directly in the browser's location bar. Creating new pages seems not to work at all any more. I haven't changed the configuration in LocalSettings.php for weeks, and I'm not aware of having changed anything else on the server (running Debian "Etch" since it was released).
However, when I do a "tail -f" on the webserver's logfiles, requests from spiders and external browsers rush by at a pretty high frequency (several files requested per second). What appears *really* strange to me is that requests for pages from my own browser (Opera 9.2.1/Win) don't appear in access.log as long as I use Opera, where I'm logged in to the wiki; when I switch to Firefox (2.0.0.4/Win, not logged in to the wiki), those requests start to appear in access.log.
Sticking with Firefox, entering the name of an existing article results in the article being delivered and the request showing up in access.log; however, when I enter a non-existing article name (in order to create it), the request doesn't appear in access.log and MediaWiki delivers no page (search result etc.), just as with Opera. The result is mostly the same whether I enter a new term in the search box or go directly via the URL (.../w/index.php?title=New_Article&action=edit). Directly after rebooting the server, creating new articles by entering the new article's URL sometimes works again, but only once.
Another LAMP application on the same host (Drupal) is still working (more or less - it's very slow, but it does work), so Apache, MySQL, and PHP basically seem to be fine.
The server hosts three domains, each with its own MediaWiki installation; two of them were set up completely and checked for correct operation a few weeks ago, but were not yet in use. These installations have also stopped working and show behaviour similar to that described below.
A few days ago, requests to MediaWiki somehow started to interfere with the operation of the other services; e.g., when I request a page from MediaWiki (e.g. the front page) and concurrently try to save an article in Drupal, the server neither delivers the front page from MediaWiki nor saves the article in Drupal (I wait for approx. 10 minutes and then usually cancel the operation by pressing "Esc" in the browser). At that moment, all apache2 processes disappear from top/htop and no further requests to either Drupal or MediaWiki are served at all. As it seems, MediaWiki has, for an unknown reason, started to kill apache2 in the setup I'm running.
Drupal starts responding again as soon as I do an "/etc/init.d/apache2 restart". Sometimes even this fails:
# /etc/init.d/apache2 restart
Forcing reload of web server (apache2)...(98)Address already in use: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
Unable to open logs
Apache2 seems to die completely:
# /etc/init.d/apache2 stop
Stopping web server (apache2)...httpd (no pid file) not running
It can then be restarted:
# /etc/init.d/apache2 start
Starting web server (apache2)....
After this, normal operation continues as long as I don't request any page from MediaWiki. If I request the front page of any of those three setups, the whole thing repeats all over again.
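(For reference: when the restart reports "Address already in use", some leftover apache2 processes are usually still bound to port 80. A rough way to check and clear that state, assuming Debian's standard net-tools - the PID to kill is whatever the first command actually lists:)

# netstat -tlnp | grep ':80 '
# kill <pid of the leftover apache2 process>    (or: killall apache2)
# /etc/init.d/apache2 start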
I tried to check the MediaWiki databases with phpMyAdmin (2.9.1.1-Debian-3); even opening them takes over two minutes. Checking single small tables like "de_watchlist" results in an "OK", but checking large tables like "de_text", or all tables at once, does not finish even after several hours (and phpMyAdmin gives no timeout either). During these "check" operations, CPU load according to "htop" fluctuates between 16 and 97% (most of the time it's around or below 50%).
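(The same checks can also be run from a shell, which at least takes phpMyAdmin out of the picture; a sketch, assuming the database is called "wikidb" here and that MySQL credentials are at hand:)

# mysqlcheck --check -u root -p wikidb de_watchlist de_text

or, from the mysql client, "CHECK TABLE de_text;" - if that hangs just as long, that would suggest the problem is in MySQL itself rather than in phpMyAdmin.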
This is a pretty bizarre scenario: in the background of my desktop is a shell where requests logged to /var/log/apache2/access.log are continuously floating by, while at the same time I'm unable to load or even edit a single page on my own wiki... ;-/
/var/log/syslog and ../messages don't show anything unusual.
In LocalSettings.php, I enabled:
$wgShowExceptionDetails = true;
$wgDebugLogFile = "debug.log";
However, a "debug.log" is not being created in the wiki directory (or according to "updatedb; locate" anywhere else on the server).
Also, the php error log is pretty quiet; after doing an "apache2 restart", it reports once:
-- snip --
# tail -f php.log
[29-Jun-2007 01:39:12] PHP Warning:  Module 'gd' already loaded in Unknown on line 0
[29-Jun-2007 01:39:12] PHP Warning:  Module 'mysql' already loaded in Unknown on line 0
-- snip --
When I enter some expression in the search box, php.log logs nothing further (at least not for several minutes). While I wait for something to happen, requests from external hosts keep rushing by, without causing any PHP errors either.
/etc/php5/apache2/php.ini (I hope that's the right one) is set to:
...
error_reporting = E_ALL & ~E_NOTICE
display_errors = On
display_startup_errors = Off
log_errors = On
ignore_repeated_source = Off
report_memleaks = On
...
Any hints on where and how I could find out what's going wrong?
Thanks & regards, -asb
Agon S. Buchholz wrote:
A few days ago, requests to MediaWiki somehow started to interfere with the operation of the other services; e.g., when I request a page from MediaWiki (e.g. the front page) and concurrently try to save an article in Drupal, the server neither delivers the front page from MediaWiki nor saves the article in Drupal (I wait for approx. 10 minutes and then usually cancel the operation by pressing "Esc" in the browser). At that moment, all apache2 processes disappear from top/htop and no further requests to either Drupal or MediaWiki are served at all. As it seems, MediaWiki has, for an unknown reason, started to kill apache2 in the setup I'm running.
It sounds like you're running a threaded MPM and PHP is segfaulting. To avoid having MediaWiki interfere with other apps, use the prefork MPM whenever you are using PHP. This is strongly recommended by the PHP manual.
I believe I have written a post or two in the past about debugging and fixing segfaults. I'll repeat some basic principles here in brief.
The first thing to try is the voodoo magic method: change versions of things randomly until it starts working. Disable suspicious PHP extensions. Upgrade or downgrade PHP. Upgrade MediaWiki. Because obvious, reproducible segfaults are usually fixed in PHP or worked around in MediaWiki, it's the rare ones that people usually see, which arise from special combinations of buggy software.
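(On Debian, for instance, the "Module ... already loaded" warnings quoted earlier in this thread usually mean an extension is listed both in php.ini and in conf.d; a sketch of where to look, with paths assumed from the stock Etch PHP5 packages:)

# grep -n '^extension=' /etc/php5/apache2/php.ini
# ls /etc/php5/apache2/conf.d/

Comment out the duplicate extension= line in one of the two places and restart apache2.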
If you're a programmer, then tracing down MediaWiki's call tree to find the segfaulting function, or examining the backtrace in gdb, may produce results.
-- Tim Starling
Tim Starling wrote:
It sounds like you're running a threaded MPM and PHP is segfaulting. To avoid having MediaWiki interfere with other apps, use the prefork MPM whenever you are using PHP. This is strongly recommended by the PHP manual.
I'm running Apache2 with the prefork MPM [1]; on Debian, that is done automagically when installing PHP. At least it is supposed to be, according to the documentation and to my package database.
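For what it's worth, the MPM can also be checked against the binary that is actually running rather than the package list; a quick sketch (output wording varies between Apache versions):

# apache2 -V 2>/dev/null | grep -i mpm
# dpkg -l 'apache2-mpm-*' | grep '^ii'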
I believe I have written a post or two in the past about debugging and fixing segfaults. I'll repeat some basic principles here in brief.
Is there any way to determine whether httpd (or whatever) really *is* segfaulting?
At least in "mytop" I can see that sql statements seem to be processed; somehow they have to get there.
If I enter a nonexisting article name in the search box, in "mytop" something like this appears:
Query SELECT page_id, page_namespace, page_title FROM `page`,`searchindex` WHERE page_id=si_page AND [...]
I can't read the rest of the SQL statement (mytop doesn't wrap lines). However, this looks like the beginning of a valid statement, even though it simply stays there "forever".
The first thing to try is the voodoo magic method: change versions of things randomly until it starts working. Disable suspicious PHP extensions. Upgrade or downgrade PHP. Upgrade MediaWiki.
Sorry, most of this is not an option; I'm running Debian "Stable" for some reason and can't break everything else just to play around unsystematically. I'm currently trying to upgrade MediaWiki, but this will take at least several days, even if it succeeds.
What I did try in the last few days is upgrading the hardware; MediaWiki now runs on an Opteron 1212 HE with 2 GB RAM (before, it was a Celeron with 512 MB RAM). Even though the configuration is mostly new, the error started to pop up again after a few days.
Should I file a bug about this?
Greetings, -asb
[1] http://packages.debian.org/stable/net/apache2-mpm-prefork, http://packages.debian.org/stable/net/libapache2-mod-php5, http://packages.debian.org/stable/web/php5
Agon S. Buchholz wrote:
Tim Starling wrote:
It sounds like you're running a threaded MPM and PHP is segfaulting. To avoid having MediaWiki interfere with other apps, use the prefork MPM whenever you are using PHP. This is strongly recommended by the PHP manual.
I'm running Apache2 with the prefork MPM [1]; on Debian, that is done automagically when installing PHP. At least it is supposed to be, according to the documentation and to my package database.
I believe I have written a post or two in the past about debugging and fixing segfaults. I'll repeat some basic principles here in brief.
Is there any way to determine whether httpd (or whatever) really *is* segfaulting?
Run it under a debugger (most likely gdb). It will catch the segfault and tell you where it happened.
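A minimal sketch of doing that on this kind of Debian setup (paths and flags are assumptions about the install, not something verified here; -X runs a single non-forking Apache child, which makes the crash easy to catch; if the init script normally exports environment variables, source /etc/apache2/envvars first, where it exists):

# /etc/init.d/apache2 stop
# gdb /usr/sbin/apache2
(gdb) run -X
[ request the failing MediaWiki page in a browser; if a segfault occurs: ]
(gdb) bt

Apache children that die this way also usually leave "exit signal Segmentation fault (11)" lines in error.log, so grepping for "Segmentation" there is another quick test.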
At least in "mytop" I can see that sql statements seem to be processed; somehow they have to get there.
If I enter a nonexisting article name in the search box, in "mytop" something like this appears:
Query SELECT page_id, page_namespace, page_title FROM `page`,`searchindex` WHERE page_id=si_page AND [...]
I can't read the rest of the SQL statement (mytop doesn't wrap lines). However, this looks like the beginning of a valid statement, even though it simply stays there "forever".
Read it in the page source.
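Another way to get the whole statement, independent of mytop's line width, is MySQL's own process list; a one-line sketch (credentials assumed):

# mysql -u root -p -e 'SHOW FULL PROCESSLIST\G'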
Agon S. Buchholz wrote:
[ Strange phenomena like: creating new articles and searching for existing ones causes "deadlocks" without browser timeouts or error messages; new articles can only be created, if at all, by entering the article's name directly in the browser's location bar; etc. ]
To finally close this thread, just one closing remark: I'm pretty sure I've found out what was causing this strange behaviour - it seems to have been an overflowing job queue. When I checked [[Special:Statistics]], it showed numbers in the range of several hundred thousand jobs, on one site even close to 1.5 million jobs.
Even though I don't fully understand what these jobs are, they are related to templates; I make heavy use of templates - every article uses at least one - and so changing even just a couple of templates embedded in a few hundred pages seems to grow the job queue considerably. At least that's what [1] suggests: "Why the job queue exists: Updating links tables when a template changes".
Thus I ran ./maintenance/runJobs.php for a few days (getting rid of these jobs definitely takes a lot of time); now the job queue is down to 1, and the problems have disappeared completely.
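(For anyone hitting the same thing, a sketch of working the queue down from the shell; the path is only an example, and showJobs.php prints the remaining count only if your MediaWiki version ships it:)

# cd /var/www/wiki/maintenance
# php runJobs.php
# php showJobs.php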
Just for the record: if your MediaWiki doesn't answer as quickly as you are used to, or creating new articles seems to "deadlock", you might want to have a look at the size of your job queue, especially if you're using lots of templates.
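(A related knob, to keep the queue from piling up that far in the first place - the value below is only an example; the default is 1 job per page request, and 0 disables in-request job running entirely, in which case runJobs.php would have to be run from cron instead:)

$wgJobRunRate = 5;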
Regards, -asb

[1] http://meta.wikimedia.org/wiki/Help:Job_queue
Could someone clarify something? I think I read in the Wikipedia Signpost's BRION column that a bug had been fixed in 1.11alpha which meant that, even if there were no jobs, the job queue still showed 1. One of my wikis shows a job queue of 1 in Special:Statistics. Is this that bug (I'm running 1.10.0), or do I need to do something about it? Thanks.
On 29/07/07, Agon S. Buchholz asb@kefk.net wrote:
Agon S. Buchholz wrote:
[ Strange phenomena like: creating new articles and searching for existing ones causes "deadlocks" without browser timeouts or error messages; new articles can only be created, if at all, by entering the article's name directly in the browser's location bar; etc. ]
To finally close this thread, just one closing remark: I'm pretty sure I've found out what was causing this strange behaviour - it seems to have been an overflowing job queue. When I checked [[Special:Statistics]], it showed numbers in the range of several hundred thousand jobs, on one site even close to 1.5 million jobs.
Even though I don't fully understand what these jobs are, they are related to templates; I make heavy use of templates - every article uses at least one - and so changing even just a couple of templates embedded in a few hundred pages seems to grow the job queue considerably. At least that's what [1] suggests: "Why the job queue exists: Updating links tables when a template changes".
Thus I ran ./maintenance/runJobs.php for a few days (getting rid of these jobs definitely takes a lot of time); now the job queue is down to 1, and the problems have disappeared completely.
Just for the record: if your MediaWiki doesn't answer as quickly as you are used to, or creating new articles seems to "deadlock", you might want to have a look at the size of your job queue, especially if you're using lots of templates.
Regards, -asb
[1] http://meta.wikimedia.org/wiki/Help:Job_queue
On 29/07/07, Gary Kirk gary.kirk@gmail.com wrote:
Could someone clarify something? I think I read in the Wikipedia Signpost's BRION column that a bug had been fixed in 1.11alpha which meant that, even if there were no jobs, the job queue still showed 1. One of my wikis shows a job queue of 1 in Special:Statistics. Is this that bug (I'm running 1.10.0), or do I need to do something about it?
This is bug 10228, which was fixed in r23531. The fix hasn't made it into a release yet.
Rob Church