Today, for those of us paying attention from the outside, the downtime
became impossible to monitor. One problem is that the TTLs on the DNS
server records and on the maintenance records themselves are set much
too short! Maybe you never thought you'd be inaccessible for an hour?
Any chance the master records could be moved to an anycast DNS provider?
The SOA is pretty reasonable:
wikimedia.org. 86400 IN SOA ns0.wikimedia.org. hostmaster.wikimedia.org. 2011052410 43200 7200 1209600 600
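
For anyone who doesn't read SOA records every day, here is a rough Python sketch that splits the record above into its named fields. The field names follow RFC 1035; parse_soa is just a helper name made up for this example, and it assumes the record is on one whitespace-separated line.

```python
# Field order of the five SOA timers, as defined in RFC 1035.
SOA_FIELDS = ("serial", "refresh", "retry", "expire", "minimum")


def parse_soa(line):
    """Split a one-line SOA record into a dict of its parts (illustrative helper)."""
    parts = line.split()
    name, ttl, rclass, rtype, mname, rname = parts[:6]
    if rtype != "SOA":
        raise ValueError("not an SOA record: %r" % line)
    timers = dict(zip(SOA_FIELDS, (int(p) for p in parts[6:11])))
    return {"name": name, "ttl": int(ttl), "mname": mname, "rname": rname, **timers}


# Record text copied from the message above.
soa = parse_soa(
    "wikimedia.org. 86400 IN SOA ns0.wikimedia.org. "
    "hostmaster.wikimedia.org. 2011052410 43200 7200 1209600 600"
)

# Print every time value in seconds and in hours (serial is not a duration).
for key in ("ttl",) + SOA_FIELDS[1:]:
    print("%-8s %8d s  (%.1f h)" % (key, soa[key], soa[key] / 3600))
```

Note that refresh, retry and expire only govern how secondary name servers keep their copy of the zone, and minimum is the negative-caching TTL. None of them controls how long a resolver keeps an ordinary A or CNAME record; that comes from the TTL on each record.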
But the actual records all have a 1 hour TTL, so they disappeared from
resolver caches during the downtime. And ganglia, while it looks OK:
ns0.wikimedia.org. 3600 IN A 208.80.152.130
ns1.wikimedia.org. 3600 IN A 208.80.152.142
ns2.wikimedia.org. 3600 IN A 91.198.174.4
secure.wikimedia.org. 3600 IN A 208.80.152.134
ganglia.wikimedia.org. 3600 IN CNAME spence.wikimedia.org.
spence.wikimedia.org. 3600 IN A 208.80.152.161
had completely timed out in both my local cache and the Google servers
while I was looking at it, and I wasn't able to contact an NS anywhere
to refresh it. Here's the last time I saw it:
ganglia.wikimedia.org. 473 IN CNAME spence.wikimedia.org.
spence.wikimedia.org. 473 IN A 208.80.152.161
;; Query time: 2 msec
;; SERVER: 10.0.1.1#53(10.0.1.1)
;; WHEN: Tue May 24 10:06:13 2011
ganglia.wikimedia.org. 287 IN CNAME spence.wikimedia.org.
spence.wikimedia.org. 287 IN A 208.80.152.161
;; Query time: 68 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue May 24 10:09:19 2011
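
To put numbers on that, here is a small Python sketch that turns the two observations above into an expiry time: the WHEN timestamp plus the remaining TTL. It also estimates when each cache fetched the record, which assumes the resolver stored it with the full 3600 second TTL and counted down from there (not guaranteed, especially for a large public resolver). cache_window is a made-up helper name, and the timestamps are treated as naive local times from my machine.

```python
from datetime import datetime, timedelta

# (resolver, remaining TTL in seconds, dig's WHEN line), copied from the output above.
OBSERVATIONS = [
    ("10.0.1.1", 473, "Tue May 24 10:06:13 2011"),
    ("8.8.8.8", 287, "Tue May 24 10:09:19 2011"),
]

# TTL published for these records. Assumption: the caches stored this full value.
RECORD_TTL = 3600


def cache_window(remaining, when, full_ttl=RECORD_TTL):
    """Return (estimated fetch time, expiry time) for one cached answer."""
    seen = datetime.strptime(when, "%a %b %d %H:%M:%S %Y")
    expires = seen + timedelta(seconds=remaining)
    fetched = expires - timedelta(seconds=full_ttl)
    return fetched, expires


for resolver, remaining, when in OBSERVATIONS:
    fetched, expires = cache_window(remaining, when)
    print("%-9s fetched about %s, expires %s" % (resolver, fetched.time(), expires.time()))
```

Both answers work out to the same expiry, 10:14:06, so by this arithmetic the two caches lost the record at the same moment. With a longer TTL on the record itself, the name would have stayed resolvable for correspondingly longer.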