Tony Sidaway wrote:
Brion Vibber said:
Brion Vibber wrote:
Update logs are still replaying, but we're up to 42 minutes prior to
the crash on one machine and still going. I don't expect problems.
With two servers fully recovered we've got the wikis up for read-write
access; editing is open. Total time from crash to restoring edit
service was about 24 hours, 10 minutes. Sigh.
Some special pages (including contribs and watchlist) are off for the
moment to reduce server load until we have more machines up. Some
things remain a little wonky.
Interesting discussion on Slashdot about the relative recoverability of
PostgreSQL. If we stay with an open source DBMS, perhaps at least some of
the database servers should be running alternative software.
At this moment the current colo is a single point of failure. The French
squids will not work if the database is not available. If you want
100% uptime, you need the complete stack of software elsewhere to allow
for failover. This is something we do not do yet. When we have full
redundancy, we will still have problems. We will have different problems.
So maybe PostgreSQL is better at recovery. It is not the whole solution;
at best it would solve one problem, and it would create others.
Thanks,
GerardM