Is there any *public* list of which exceptions/errors they are?
Well, the Ganglia graphs distinguish different types of errors (out-of-memory fatals, time limit fatals, miscellaneous fatals, exceptions, catchable fatals, and query errors). At present there is nothing that is more granular than that, private or public. The error log we consult is an undifferentiated stream of text.
However, this is an area of our code where contributions from the community would be very welcome. Hashar enabled error logging for the beta cluster, so labs is now a viable development environment for a generic error-processing solution.
Relevant code exists in two locations:
https://git.wikimedia.org/blob/operations%2Fpuppet.git/9792c164d10f9f9f20922... (this is the script that is emitting stats to Ganglia)
and
https://git.wikimedia.org/tree/mediawiki%2Ftools%2Ffluoride.git (set of regexps to parse the data even further; not currently used anywhere.)
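To make the idea concrete, here is a minimal sketch of how an undifferentiated stream of log text could be bucketed into the same six categories the Ganglia graphs show. The patterns, category names, and function names are my own assumptions for illustration, loosely based on standard PHP error message wording. They are not taken from the puppet script or from fluoride, and the real log format may well differ.

```python
import re
from collections import Counter

# Hypothetical patterns, checked in order: the more specific fatals come
# first so that the generic "Fatal error" rule only catches what is left.
PATTERNS = [
    ("oom_fatal", re.compile(r"Allowed memory size of \d+ bytes exhausted")),
    ("timeout_fatal", re.compile(r"Maximum execution time of \d+ seconds? exceeded")),
    ("catchable_fatal", re.compile(r"Catchable fatal error")),
    ("query_error", re.compile(r"DBQueryError|database query error", re.IGNORECASE)),
    ("exception", re.compile(r"[Ee]xception")),
    ("misc_fatal", re.compile(r"Fatal error")),
]


def classify(line):
    """Return the first matching category name, or 'unclassified'."""
    for name, pattern in PATTERNS:
        if pattern.search(line):
            return name
    return "unclassified"


def count_errors(lines):
    """Tally categories over any iterable of log lines."""
    return Counter(classify(line) for line in lines)


if __name__ == "__main__":
    import sys

    # Read log lines from stdin and print one "category count" pair per line.
    for category, count in count_errors(sys.stdin).most_common():
        print(category, count)
```

Something like this could be run against the beta cluster log by piping the file into the script. A more granular breakdown would mostly be a matter of adding more specific patterns ahead of the catch-all ones.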
I've been working on this in my spare time, but I'd be happy to provide mentorship, code review & deployment to interested contributors. If someone competent (a category which explicitly includes you, Brian!) wants to take over and "own" this problem, that's cool with me too.
There's a lot we could do in this area. It should be possible to probabilistically trace an error to the commit(s) that introduced it.