Hi!
Well, we only noticed the problem because of this email! Take a look at https://phabricator.wikimedia.org/T119915
Yes, we need to look into it. The problem is that the service has two failure modes:
1. Completely dead, rejecting all queries. Icinga would catch this and alert.
2. Crawling slowly: still partially alive, but performing very badly. For this one, we do not have an adequate alerting system. This failure mode is rare, but we've seen it happen, both when somebody sends a torrent of heavy queries and in some bug scenarios. Icinga does not catch it because it only checks very basic queries, and those still complete within the timeout.
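
To make the second failure mode concrete, here is a rough sketch of what a latency-aware check could look like. It is not our actual Icinga configuration: the function names and the thresholds are made up for illustration, and the probe query is left as a callable you would supply (ideally something heavier than the basic queries checked today). The only part taken from Icinga/Nagios itself is the plugin exit code convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
import time
from typing import Callable, Optional, Tuple

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(elapsed: Optional[float], warn_s: float, crit_s: float) -> Tuple[int, str]:
    """Map a measured latency (None means the probe failed) to a plugin status.

    warn_s and crit_s are hypothetical thresholds in seconds.
    """
    if elapsed is None:
        return CRITICAL, "CRITICAL: probe query failed"
    if elapsed >= crit_s:
        return CRITICAL, f"CRITICAL: probe query took {elapsed:.2f}s"
    if elapsed >= warn_s:
        return WARNING, f"WARNING: probe query took {elapsed:.2f}s"
    return OK, f"OK: probe query took {elapsed:.2f}s"


def timed(run_query: Callable[[], object]) -> Optional[float]:
    """Run the probe and return its wall-clock duration, or None on any error."""
    start = time.monotonic()
    try:
        run_query()
    except Exception:
        return None
    return time.monotonic() - start
```

A real plugin would print the message and exit with the returned code, e.g. `code, msg = classify(timed(probe), 2.0, 10.0)` followed by `print(msg)` and `sys.exit(code)`. The point is that a service that is slow but still answering would trip WARNING or CRITICAL instead of passing as OK.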