Samuel, the MySQL master server for the s3 cluster, encountered some sort
of error a bit after 14:50 UTC today, leading to about a half hour of
intermittently broken access.
The machine is still on the network, but refuses MySQL connections and
drops SSH connections. Before completely losing connectivity to samuel,
I saw many threads in 'opening tables' state in the process list. A disk
error is possible; further diagnosis awaits Rob's next trip into the
data center to fix up samuel and db3.
My first step on discovering there was something quite awry was to put
s3/s3a/default into read-only mode and remove samuel from the server
list, so read-only access could continue during further recovery efforts.
After confirming that the remaining s3 slaves were consistent, I
switched masters to db1 and restored read/write mode.
I encountered a couple of snags during this process. Many apache
processes seemed to be hanging, leading to 'resource unavailable' errors
reported by the squid proxies.
Unfortunately I wasn't able to fully diagnose this while in the middle
of switching masters, but I suspect timing-out connections to
samuel and/or adler (which has been down for some time, but was still
listed in the s3a group as the next available server) and/or bogus
wait-for-slave delays.
Graceful apache restarts didn't seem to help much, but a forced restart
(killing old processes) seemed to do the job once I'd resolved the
databases themselves.
The s3 databases are currently humming along happily, though with adler
still out we are down to just one slave in the general s3/default pool
(db5). If we lose one more, we'll lose our redundancy and would have to
take the group into read-only to clone another slave when a server
becomes available.
So it would be nice if we can get another slave back online before
losing one more. :)
The s3a subgroup has one additional slave available (webster).
Software issues:
With the setproctitle extension either disabled or undocumented it's
harder now to tell where the stuck processes are stuck. We should have
an equivalent debugging tool available if possible.
There may be an issue with too-long timeouts either on MySQL connections
or slave waits. Should double-check on this.
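As a starting point for that check, here is a minimal Python sketch of
walking a server list with a short connect timeout, so that a dead host
costs a couple of seconds rather than a hung apache process. It is not
how MediaWiki's load balancer actually works; the function name, the
two-second default, and port 3306 in the usage note are assumptions for
illustration.

```python
import socket


def first_reachable(servers, timeout=2.0):
    """Return the first (host, port) that accepts a TCP connection, else None.

    `servers` is an iterable of (host, port) pairs, tried in order.
    `timeout` (seconds) bounds how long one dead host can stall us.
    """
    for host, port in servers:
        try:
            # create_connection raises OSError on refusal, timeout,
            # unreachable network, or a host name that does not resolve.
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Skip this candidate and move on to the next one.
            continue
    return None
```

For example, `first_reachable([("adler", 3306), ("db5", 3306)])` would
give up on adler after at most two seconds and fall through to db5,
assuming those names resolve from the calling host.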
-- brion vibber (brion @ wikimedia.org)
Hi,
It's been some time since we've expanded our server configuration at
wikiHow. We currently have 5 servers (1 Squid, 3 Apache and 1 DB) and
I have some questions about planning for future growth. We do make use
of memcached and eaccelerator, Squid occasionally reaches 300 requests
per second, and we get about 5.5 million unique visitors a month
growing at about 10% per month. All servers are dual Opteron 64bit,
4GB of RAM, with 3x73GB SCSI drives.
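To put the growth figure in perspective, here is a small Python sketch
of the compounding arithmetic. It assumes the 10% monthly rate stays
constant, which is a simplification; the function names are mine. At
that rate 5.5 million visitors becomes roughly 17 million after twelve
months, and traffic doubles in a bit over seven months.

```python
import math


def projected_visitors(current, monthly_growth, months):
    """Compound a constant monthly growth rate over a number of months."""
    return current * (1.0 + monthly_growth) ** months


def doubling_time_months(monthly_growth):
    """Months needed for traffic to double at a constant monthly growth rate."""
    return math.log(2.0) / math.log(1.0 + monthly_growth)
```

Calling `projected_visitors(5.5e6, 0.10, 12)` gives the one-year
projection, and `doubling_time_months(0.10)` gives the doubling time.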
Does anyone know how it's possible to estimate how much available
cushion there is for a given Apache server? Is there an upper limit on
the number of requests per second or server load? Basically, at what
point can you be sure that you need to add more Apache servers?
The ratio of Squid to Apache servers should be about 1:10, correct?
Once you add a 2nd Squid, what's the best way to load balance between
the 2: DNS round robin or a hardware load balancer?
When should a 2nd database server be added? Are there any other
optimizations we could be benefiting from?
If anyone has any input, it would be much appreciated.
Travis
When importing a WikiDump through maintenance/importDump.php, is there
a good reason for -not- refreshing links on a page by page basis but
running maintenance/refreshLinks.php once everything is done?
If there isn't, how do I get the id of the page/revision the import
process has just created so that I can pass it to
fixLinksFromArticle($id)? Or should I modify include/SpecialImport.php
so that it fixes the links right away?
Ciao!
Manu
--
Emanuele D'Arrigo
vfx free electron
On Sat, 2007-08-11 at 08:37 -0400, Anthony wrote:
> Then running daily a simple recursive download which checks timestamps
> to avoid downloading the same file over will be nearly as bandwidth
> efficient as rsync - and probably much more CPU efficient, as turning
> on indexing isn't going to "quickly overload the backend".
I don't think that's the case. rsync doesn't have to hit every single
file on both ends when it compares the MD4 checksums of the remote end.
If you rely on various thousands of http clients to implement the header
checking correctly, they WILL hit every single local and remote file at
least once, twice if they need to fetch it (HEAD then GET).
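To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. The function name and the file counts are made up for
illustration. It models only the HEAD-then-GET pattern described above
and says nothing about rsync's own cost.

```python
def http_mirror_requests(total_files, changed_files):
    """Requests one timestamp-checking HTTP client makes in a single pass.

    One HEAD per file to compare timestamps, plus one GET for each file
    that turned out to be stale.
    """
    if not 0 <= changed_files <= total_files:
        raise ValueError("changed_files must be between 0 and total_files")
    head_requests = total_files
    get_requests = changed_files
    return head_requests + get_requests
```

So a client mirroring 1000 files of which 10 changed still sends 1010
requests, and that is per client, per day.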
You can limit how much cpu/bandwidth/etc. rsync/rsyncd takes up on the
server side, and throttle connections back if you're worried about
overloading it. You can also prohibit using the -z option on the server
side, so clients don't abuse the CPU to compress data which is already
compressed.
But now we're back to this point again... why not just use zsync and get
the benefits of both worlds?
Unless your suggestion was to open up indexing for a -known- set of IPs,
and not to the world-at-large..
--
David A. Desrosiers
desrod(a)gnu-designs.com
setuid(a)gmail.com
http://projects.plkr.org/
Skype...: 860-967-3820
Dear Friends in spirit,
www.homeowiki.de - Wikisite for homeopathic physicians
It has been running for about 3 years without problems, on version 1.5.7 until yesterday.
Sometime during the last few weeks (I don't know exactly when, because I was on holiday) the UserLogin page started coming up empty. As my default Localuser.php demands a login even to read pages, the wiki was basically shut down.
http://www.homeopathy.at/wiki/index.php?title=Spezial:Anmelden&returnto=Hau…
I upgraded today to version 1.10.1 - no change.
I installed a fresh copy of 1.10.1 with a completely new database at http://www.homeopathy.at/wiki2
and get the same problem at http://www.homeopathy.at/wiki2/index.php?title=Spezial:Anmelden&returnto=Sp… - an empty login page; the remaining pages seem to work.
So I have to assume that my provider has changed something, but the rest of the wiki seems to work, just not the login page.
$wgShowExceptionDetails = true does not deliver any additional information.
Any ideas?
Thanks a lot!
(As I am now using a freshly generated LocalSettings.php, the wiki is unlocked and readable by anyone, but unsafe.)
Kind regards
Heli Retzek
=======================
Dr.med. Helmut B Retzek
General practitioner - classical homeopathy
Oberbleichfleck 2 - A-4840 Vöcklabruck
07672-23700 (priv -11, fax -12)
www.homeopathy.at
I want to find out the length of a bunch of articles. I have
earlier done this for the Swedish Wikipedia by importing the
page.sql dump into a local MySQL instance, which works just fine.
But now that I try it for the English Wikipedia, the database
import (of 10 million rows, averaging 94 bytes) appears to take
somewhere between 24 and 48 hours (with keys disabled, I'm
importing some 4500 rows per minute). This seems a bit
unnecessary for just finding out the length of some 1000 articles.
Especially if I want to do it again when the next dump becomes
available. Is there some API on the toolserver, that I can use
instead? Or should I consider retrieving the action=raw from the
live server and just count the bytes? Where do I start?
I could even write a Perl script that parses the insert statements
in page.sql and extracts the information I need, all in one pass.
But this is not really why a MySQL dump is created.
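A rough sketch of that one-pass approach, in Python rather than Perl.
The column positions (namespace second, title third, length eleventh)
are my assumption about the page table layout and should be checked
against the CREATE TABLE statement at the top of the dump; the function
names are made up. The parser assumes mysqldump-style backslash escapes
and does not decode control-character escapes such as \n, which should
not occur in titles.

```python
NS_COL = 1      # assumed position of page_namespace
TITLE_COL = 2   # assumed position of page_title
LEN_COL = 10    # assumed position of page_len


def iter_rows(line):
    """Yield each (...) tuple of one INSERT line as a list of raw strings."""
    start = line.find(" VALUES ")
    if not line.startswith("INSERT INTO") or start < 0:
        return
    i, n = start + len(" VALUES "), len(line)
    row, field, in_row, in_str = [], [], False, False
    while i < n:
        ch = line[i]
        if in_str:
            if ch == "\\" and i + 1 < n:
                field.append(line[i + 1])   # keep the escaped character
                i += 1
            elif ch == "'":
                in_str = False              # closing quote
            else:
                field.append(ch)
        elif ch == "'":
            in_str = True                   # opening quote
        elif ch == "(" and not in_row:
            in_row, row, field = True, [], []
        elif ch == "," and in_row:
            row.append("".join(field))      # end of one field
            field = []
        elif ch == ")" and in_row:
            row.append("".join(field))      # end of the tuple
            yield row
            in_row = False
        elif in_row:
            field.append(ch)                # unquoted value (a number)
        i += 1


def page_lengths(lines, wanted):
    """Map each wanted main-namespace title to its length, in one pass."""
    found = {}
    for line in lines:
        for row in iter_rows(line):
            if row[NS_COL] == "0" and row[TITLE_COL] in wanted:
                found[row[TITLE_COL]] = int(row[LEN_COL])
    return found
```

With the dump opened as a text file, passing the file object and a set
of titles (with underscores instead of spaces, as stored in the dump)
to `page_lengths` returns a dict of title to length.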
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
On 10/08/07, bugzilla-daemon(a)mail.wikimedia.org
<bugzilla-daemon(a)mail.wikimedia.org> wrote:
> Rob Church <robchur(a)gmail.com> has used the 'sudo' feature to access
> Bugzilla using your account.
>
> Rob Church <robchur(a)gmail.com> provided the following reason for doing this:
> Tweaking preferences for mailing list, etc.
Was checking email preferences with a view to suppressing mailings
when the CC field was changed.
Looking in the wrong place. :P
Rob Church
On 10/08/07, erik(a)svn.wikimedia.org <erik(a)svn.wikimedia.org> wrote:
> Revision: 24718
> Author: erik
> Date: 2007-08-10 09:07:16 +0000 (Fri, 10 Aug 2007)
>
> Log Message:
> -----------
> add a few relevant planet links, remove GNOME, Apache, Sun links,
Why remove them? Aren't GNOME and Apache part of that free software
thing? Don't Sun provide us with toolserver hardware? Or do these
organisations spread the wrong kind of propaganda?
Rob Church
My wiki (which is just a bunch of help pages authored by me for
advanced users of algebra.com) was protected from spammers by
only allowing registered users to make edits (and not allowing new
registrations).
It worked great.
Then I upgraded it and forgot to reapply the same permission changes. A year
later I looked at the site and it was totally infested by spammers.
I cleaned up quite a few pages, but there are some obscure ones that I
may have missed.
Is there some SQL command or some such to find the spammed pages?
i
Hello
The MathTran site http://www.mathtran.org now has a MathTran enabled
wiki http://www.mathtran.org/wiki (powered by MediaWiki).
It provides, I think, improved display and editing of mathematical
formulas. I'd be delighted if you visited it and let me know what you
think. (The editing works better with Firefox. If any IE JavaScript
expert would like to help me, I'd be grateful.)
Best regards
Jonathan