Today, for those of us paying attention from the outside, the downtime became impossible to monitor. One problem is that the TTLs on the name server records, and on the maintenance records themselves, are set much too short!
Maybe you never thought you'd be inaccessible for an hour?
Any chance master records could be moved to an anycast DNS provider?
The SOA is pretty reasonable:
wikimedia.org. 86400 IN SOA ns0.wikimedia.org. hostmaster.wikimedia.org. 2011052410 43200 7200 1209600 600
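For readers less familiar with SOA records, the numbers after the serial are timers in seconds, in the RFC 1035 order REFRESH, RETRY, EXPIRE, MINIMUM (RFC 2308 redefines MINIMUM as the negative-caching TTL). Below is a minimal Python sketch that parses the one-line record above and prints its timers in friendlier units. It assumes the whitespace-separated, single-line form shown here and would not handle a zone-file record split across parentheses; `parse_soa` and `human` are hypothetical helper names.

```python
from typing import NamedTuple


class SOA(NamedTuple):
    owner: str
    ttl: int      # how long resolvers may cache the SOA record itself
    mname: str    # primary (master) name server
    rname: str    # responsible mailbox; the first "." stands for "@"
    serial: int
    refresh: int  # how often secondaries check the master for a new serial
    retry: int    # wait before a secondary retries a failed refresh
    expire: int   # how long a secondary keeps answering without reaching the master
    minimum: int  # negative-caching TTL (RFC 2308)


def parse_soa(line: str) -> SOA:
    """Parse a single-line SOA record in presentation format (no parentheses)."""
    f = line.split()
    if len(f) != 11 or f[2] != "IN" or f[3] != "SOA":
        raise ValueError(f"not a single-line SOA record: {line!r}")
    return SOA(f[0], int(f[1]), f[4], f[5], *map(int, f[6:]))


def human(seconds: int) -> str:
    """Render a duration in the largest whole unit that divides it evenly."""
    for unit, size in (("d", 86400), ("h", 3600), ("min", 60)):
        if seconds and seconds % size == 0:
            return f"{seconds // size} {unit}"
    return f"{seconds} s"


soa = parse_soa("wikimedia.org. 86400 IN SOA ns0.wikimedia.org. "
                "hostmaster.wikimedia.org. 2011052410 43200 7200 1209600 600")
for field in ("ttl", "refresh", "retry", "expire", "minimum"):
    print(f"{field:8} {human(getattr(soa, field))}")
```

This works out to a 1-day SOA TTL, 12 h refresh, 2 h retry, 14-day expire and a 10-minute negative-caching TTL. These timers govern how secondaries sync with the master and how long "no such name" answers are cached; how long resolvers keep the individual A and CNAME records is set by each record's own TTL.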
But the actual records all have a 1-hour TTL, so they expired from caches during the downtime. And ganglia, while it looks OK:
ns0.wikimedia.org. 3600 IN A 208.80.152.130
ns1.wikimedia.org. 3600 IN A 208.80.152.142
ns2.wikimedia.org. 3600 IN A 91.198.174.4
secure.wikimedia.org. 3600 IN A 208.80.152.134
ganglia.wikimedia.org. 3600 IN CNAME spence.wikimedia.org.
spence.wikimedia.org. 3600 IN A 208.80.152.161
Had completely timed out in both my local cache and Google's public DNS servers while I was looking at it, and I wasn't able to reach a name server anywhere to refresh it. Here's the last time I saw it:
ganglia.wikimedia.org. 473 IN CNAME spence.wikimedia.org.
spence.wikimedia.org. 473 IN A 208.80.152.161
;; Query time: 2 msec
;; SERVER: 10.0.1.1#53(10.0.1.1)
;; WHEN: Tue May 24 10:06:13 2011

ganglia.wikimedia.org. 287 IN CNAME spence.wikimedia.org.
spence.wikimedia.org. 287 IN A 208.80.152.161
;; Query time: 68 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Tue May 24 10:09:19 2011
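The two dig snapshots can be checked with a little arithmetic: adding the remaining TTL to each WHEN timestamp gives the moment that cached answer runs out. A minimal Python sketch, assuming both caches honor TTLs exactly and originally stored the record with its full 3600 s TTL (the dig output shows no time zone, so naive datetimes are used); `cache_expiry` and `fetched_at` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def cache_expiry(observed_at: datetime, remaining_ttl: int) -> datetime:
    """When a cached answer runs out, given the remaining TTL dig reported."""
    return observed_at + timedelta(seconds=remaining_ttl)


def fetched_at(observed_at: datetime, remaining_ttl: int,
               zone_ttl: int = 3600) -> datetime:
    """When the cache last fetched the record, assuming it stored the full zone TTL."""
    return cache_expiry(observed_at, remaining_ttl) - timedelta(seconds=zone_ttl)


# (WHEN timestamp, remaining TTL) for each snapshot quoted above.
local = (datetime(2011, 5, 24, 10, 6, 13), 473)   # 10.0.1.1, local cache
google = (datetime(2011, 5, 24, 10, 9, 19), 287)  # 8.8.8.8

for name, (when, ttl) in (("10.0.1.1", local), ("8.8.8.8", google)):
    print(name, "expires", cache_expiry(when, ttl).time(),
          "fetched", fetched_at(when, ttl).time())
```

Both answers run out at 10:14:06 (and, under the full-TTL assumption, were fetched at 09:14:06); after that, with no name server reachable, lookups for ganglia.wikimedia.org fail. The identical expiry suggests, but does not prove, that the local resolver was relaying the same upstream cached copy.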
Frankly, the European DNS server had problems we didn't really notice until the actual maintenance. DNS should have been fully functional, since it is served from multiple datacenters.
Domas
On 25/05/11 00:31, Domas Mituzas wrote:
Frankly, the European DNS server had problems we didn't really notice until the actual maintenance. DNS should have been fully functional, since it is served from multiple datacenters.
What was the problem exactly? I don't see anything about it in the server admin log.
-- Tim Starling
Turning down expiration times (TTLs) before maintenance isn't unusual. It lets a site redirect everyone to an alternate location for a very short period.
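The timing behind lowering TTLs for maintenance can be sketched in a few lines. A minimal Python illustration, assuming resolvers honor TTLs exactly (some clamp them to a minimum), an old TTL of 3600 s, and a hypothetical 300 s maintenance TTL and 10:00 cutover; `lower_ttl_by` and `everyone_moved_by` are hypothetical helper names.

```python
from datetime import datetime, timedelta


def lower_ttl_by(cutover: datetime, old_ttl_s: int) -> datetime:
    """Latest time to publish the shortened TTL.

    A cache that fetched the record just before the change may keep the old
    answer for up to old_ttl_s, so the change must go out that long ahead.
    """
    return cutover - timedelta(seconds=old_ttl_s)


def everyone_moved_by(switch: datetime, new_ttl_s: int) -> datetime:
    """Worst case for the last well-behaved cache to see a record change."""
    return switch + timedelta(seconds=new_ttl_s)


cutover = datetime(2011, 5, 24, 10, 0)         # hypothetical maintenance start
print(lower_ttl_by(cutover, 3600).time())      # 09:00:00
print(everyone_moved_by(cutover, 300).time())  # 10:05:00
```

The same short TTL is the trade-off raised at the top of the thread: if every name server becomes unreachable, cached answers also disappear within that short window.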
As for anycast, I doubt it would be cost-effective. After all, the Foundation gets upwards of 400 million unique visitors a month and (I've heard) roughly 12 billion hits. DnsMadeEasy is one of the bigger anycast providers, and their biggest listed plan [1] is 50 million queries a month for $125, plus as little as $1.60 for each additional million. That would come to something like $20,000 a month. Give or take: thanks to caching, not every hit would require a new query, but that's the worst case. That isn't a huge number, but the Foundation will soon have 3 separate data centers to answer DNS from, and running its own DNS only costs a few servers every few years.
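The $20,000 figure can be reproduced from the plan terms quoted above. A minimal Python sketch, assuming a flat $1.60 per additional million (the message says "as low as", so real tiers may charge more at lower volumes) and treating every hit as one query in the worst case; the 10% query rate in the second line is a hypothetical value for illustration, and `anycast_monthly_cost` is a hypothetical helper name.

```python
def anycast_monthly_cost(queries_millions: float,
                         base_fee: float = 125.0,
                         included_millions: float = 50.0,
                         per_extra_million: float = 1.60) -> float:
    """Monthly bill for a plan with a flat fee plus per-million overage."""
    extra = max(0.0, queries_millions - included_millions)
    return base_fee + extra * per_extra_million


# Worst case: each of ~12 billion monthly hits triggers a query.
print(f"${anycast_monthly_cost(12_000):,.0f}")  # $19,245
# Hypothetical: caching absorbs 90% of hits, leaving 1.2 billion queries.
print(f"${anycast_monthly_cost(1_200):,.0f}")   # $1,965
```

The cost scales almost linearly with the cache miss rate, which is why the estimate is only a worst-case bound.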
-Jon
On Tue, May 24, 2011 at 07:27, William Allen Simpson <william.allen.simpson@gmail.com> wrote:
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l