Tony Sidaway wrote:
Brion Vibber said:
Brion Vibber wrote:
Update logs are still replaying, but we're up to 42 minutes prior to the crash on one machine and still going. I don't expect problems.
With two servers fully recovered we've got the wikis up for read-write access; editing is open. Total time from crash to restoring edit service was about 24 hours, 10 minutes. Sigh.
Some special pages (including contribs and watchlist) are off for the moment to reduce server load until we have more machines up. Some things remain a little wonky.
Interesting discussion on Slashdot about the relative recoverability of PostgreSQL. If we stay with an open source DBMS, perhaps at least some of the database servers should be running alternative software.
At this moment the current colo is a single point of failure. The French squids will not work if the database is not available. If you want 100% uptime, you need the complete software stack elsewhere to allow for failover. This is something we do not do yet. When we have full redundancy, we will still have problems. We will have different problems.
So maybe PostgreSQL is better at recovery. It is not the whole solution; at best it would solve one problem, and it would create others.
Thanks, GerardM