Is there a catalog of all data that could possibly be available (for
instance, the mw.session cookie), along with where it is logged, for
how long, and where in various toolchains it gets stripped out?

Not to my knowledge. For /readers/, we deliberately collect very little. The possible vectors I'm aware of:

- The mw.session cookie. This is stripped out of varnishlog before it even gets to the analytics machines, so presumably it doesn't make it past udp2log.
- EventLogging data, for example data to test how our caching or module storage is working. We've got some of this for the time period I analysed, and I'm planning on using the module storage data to test the algorithm, since it contains a unique identifier independent of IP/UA. This sort of information is gathered for specific tasks rather than by default, which I'm kind of happy with: if the existing algorithm is valid, I don't really want to see more PII in our logs. If it isn't, we'll assess how important session data is outside of academia.
- The UA/IP/lang data.
- ...that's it.

Obviously these are "vectors I'm aware of" - I am fully open to being corrected by someone more informed than myself.
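
To make the "test the algorithm against the module storage identifier" idea concrete, here is a rough sketch of the kind of check I mean. It is not the algorithm itself: the SHA-256 fingerprint over IP/UA/lang is a stand-in, and the field names (ip, ua, lang, token) are hypothetical, not the real log or EventLogging schema. It only counts how often a fingerprint lumps several identifiers together, and how often one identifier is spread over several fingerprints.

```python
import hashlib
from collections import defaultdict


def fingerprint(ip, user_agent, lang):
    """Stand-in fingerprint: a hash of the IP/UA/lang triple (an assumption)."""
    raw = "|".join((ip, user_agent, lang))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def compare_to_tokens(records):
    """Compare fingerprints against an independent unique identifier.

    `records` is an iterable of dicts with the hypothetical keys
    "ip", "ua", "lang" and "token" (the independent identifier).
    """
    tokens_per_fp = defaultdict(set)
    fps_per_token = defaultdict(set)
    for rec in records:
        fp = fingerprint(rec["ip"], rec["ua"], rec["lang"])
        tokens_per_fp[fp].add(rec["token"])
        fps_per_token[rec["token"]].add(fp)
    return {
        "fingerprints": len(tokens_per_fp),
        "tokens": len(fps_per_token),
        # one fingerprint covering several identifiers (distinct users merged)
        "merged": sum(1 for t in tokens_per_fp.values() if len(t) > 1),
        # one identifier seen under several fingerprints (one user split up)
        "split": sum(1 for f in fps_per_token.values() if len(f) > 1),
    }
```

If "merged" and "split" are both small relative to the totals, the IP/UA/lang approach is probably good enough; if not, that's the point at which the question of gathering anything more would come up.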
 
Related lists could be useful for planning:
* Limitations our privacy policies place on data gathering (handy when
reviewing those policies)
Indeed; the analytics team is working out how we address data retention as we speak.
* Studies that are easy and hard given the types of data we gather
* Wishlists (from external researchers, and from internal staff) of
data-sets that would be useful but aren't currently available, along
with a sense of priority, complexity, and cost.
Yep, these thought experiments are being factored into our data retention discussion.