tl;dr. Great question! It's gonna be complicated and/or impossible with the data we currently collect.


I'm looking for a method to determine the parameters of the distribution of page views per visit. I would also love to know the distribution for the length of time between visits. Does anyone know of any studies already done on this topic? Google is not my friend today -- I haven't yet found anything.

So, I can tell you my understanding of how Google Analytics does this, which might be helpful for rigging something together in the future, but I'll tell you up front that I'm pretty sure we don't presently have the data to answer your questions.

For views per visit and time between visits, you need a few pieces of machinery in place.

First, you need to tag users uniquely. In practice, this means setting a cookie. 

Second, you need to tag sessions. This can be done in post-processing, or at the time of the request by setting an mtime (last-touch-time) cookie. Essentially, a session is a group of views bounded by some length of inactivity. So let's say our bound is 15m. If I see hits from a single user at 12:00p, 12:10p, 12:11p, 12:20p, 12:40p, 12:50p, and 1:01p, I have two sessions (12:00p - 12:20p, and 12:40p - 1:01p). The hit at 12:20p doesn't have another touch until 12:40p, which is outside our touch bound.
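
Here's a rough sketch of the post-processing version of that rule in Python, assuming you already have one user's hits as timestamps. The function name `sessionize`, the arbitrary date, and the choice to treat a gap of exactly 15m as still inside the session are all my own assumptions for illustration, not anything we run today.

```python
from datetime import datetime, timedelta

def sessionize(hits, bound=timedelta(minutes=15)):
    """Group one user's hit timestamps into sessions.

    A new session starts whenever the gap since the previous hit
    is longer than `bound` (assumption: a gap equal to `bound`
    still counts as the same session).
    """
    sessions = []
    for ts in sorted(hits):
        if sessions and ts - sessions[-1][-1] <= bound:
            sessions[-1].append(ts)   # still within the touch bound
        else:
            sessions.append([ts])     # inactivity too long: new session
    return sessions

# The hits from the example above, on an arbitrary (made-up) date.
day = datetime(2000, 1, 1)
times = [(12, 0), (12, 10), (12, 11), (12, 20), (12, 40), (12, 50), (13, 1)]
hits = [day.replace(hour=h, minute=m) for h, m in times]

sessions = sessionize(hits)
print([(s[0].strftime("%H:%M"), s[-1].strftime("%H:%M"), len(s)) for s in sessions])
# [('12:00', '12:20', 4), ('12:40', '13:01', 3)]
```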

This is enough to do views per visit and time between visits. Time on page would require an additional heartbeat beacon from the page post-load to confirm the user isn't idle, and the page isn't in the background. (If you watch the web console on a page with GA enabled, you'll see these go out periodically.)
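
To make that concrete, here's a minimal sketch of pulling both numbers out of one user's sessions once they're grouped. The name `visit_stats` is hypothetical, and I'm assuming "time between visits" means the gap from the last hit of one visit to the first hit of the next (start-to-start would be an equally defensible definition).

```python
from datetime import datetime

def visit_stats(sessions):
    """Return (views per visit, minutes between consecutive visits).

    `sessions` is a time-ordered list of sessions for ONE user,
    each session being a time-ordered list of hit timestamps.
    """
    views_per_visit = [len(s) for s in sessions]
    # Assumption: gap = first hit of the next visit minus last hit of this one.
    gaps_minutes = [
        (nxt[0] - prev[-1]).total_seconds() / 60
        for prev, nxt in zip(sessions, sessions[1:])
    ]
    return views_per_visit, gaps_minutes

# Two sessions on an arbitrary (made-up) date.
d = datetime(2000, 1, 1)
example = [
    [d.replace(hour=12, minute=m) for m in (0, 10, 11, 20)],
    [d.replace(hour=12, minute=40), d.replace(hour=12, minute=50),
     d.replace(hour=13, minute=1)],
]
views, gaps = visit_stats(example)
print(views, gaps)
```

Pooling `views` and `gaps` across many users is what would give you the empirical distributions to fit parameters to.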
 
Our problem is that we don't have ID tokens, meaning we can't do anything involving uniques. We can approximate them using a hash of UA, IP, and some fiddling, but we sample 1:1000 on *views*, so the proportion of views that belong to uniques is unknown, and I'd consider estimates to be highly unreliable.
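
For reference, the hash heuristic would look something like the sketch below. This is only an illustration: the function name `approx_user_id`, the optional `salt`, and the choice of SHA-256 are my assumptions, and the "fiddling" here is nothing more than trimming whitespace. Users behind a shared IP with the same browser will collide, and one user on two networks will split in two.

```python
import hashlib

def approx_user_id(ip, user_agent, salt=""):
    """Approximate a unique-user token from IP and User-Agent.

    Hypothetical helper: not a real ID, just a stable fingerprint.
    """
    raw = "|".join([salt, ip.strip(), user_agent.strip()])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

# Made-up request data for illustration.
a = approx_user_id("192.0.2.10", "Mozilla/5.0 (X11; Linux x86_64)")
b = approx_user_id("192.0.2.10", "Mozilla/5.0 (X11; Linux x86_64) ")
c = approx_user_id("192.0.2.10", "Opera/9.80 (Windows NT 6.1)")
print(a == b, a == c)
```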

If the data doesn't exist, the best I think I can have is the average number of page views per visit. I have a problem though: the comScore numbers available at http://reportcard.wmflabs.org/ are broken out by region, not by site. Using this data I'll only be able to get the average for all our properties worldwide -- which is a little bit rough. Does anyone have access to the raw data? If so -- does it tell us the number of uniques per site, or is it really only by region?

Several things about comScore data. First, it's always in aggregate. This means that even if we had per-site numbers, you'd only get averages (as you note). Second, we don't have most of the breakouts. (I'll stop by your desk and we can explore what's there, but sadly getting access is a laborious process, otherwise I'd happily just give you creds.) Third, I don't think they actually offer much in the way of visits. But we can look.
 
Does anyone have any better ideas?

No. Not good ideas, anyway. If you have unsampled raw logs for the subset of views you're interested in, a heuristic for uniques is totally reasonable. We only have sampled data (afaik), so that route is closed.
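
To put a number on why sampling closes that route, here's a back-of-envelope sketch. It assumes each view is kept independently with probability 1/1000, which is a simplification (our sampler may well be systematic, e.g. every 1000th request), and `p_at_least` is just a name I made up.

```python
from math import comb

def p_at_least(n_views, k, rate=1 / 1000):
    """Probability that at least k of a visit's n_views survive sampling.

    Assumes independent per-view sampling at `rate` (binomial model).
    """
    p_fewer = sum(
        comb(n_views, i) * rate**i * (1 - rate) ** (n_views - i)
        for i in range(k)
    )
    return 1 - p_fewer

# Chance of seeing >= 1 view, and >= 2 views, from a visit of n views.
for n in (1, 5, 20):
    print(n, p_at_least(n, 1), p_at_least(n, 2))
```

Under that assumption, the chance of catching even two views from the same 5-view visit is on the order of 1 in 100,000, so nearly every sampled visit looks like a single view and there is nothing to build sessions from.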


-- Context --
I'm trying to model some fundraising data to solve the optimal banner distribution problem (effectively what's the best way to show people banners). Our data on the 'number of banner impressions till donation' indicates that people are far more likely to donate on the first banner impression. However, this decays over time. My hypothesis is that there is no difference between showing a user only one banner per visit over multiple visits and showing multiple banners 100% of the time.

If this hypothesis is true, it will lead to fundraising developing a banner display function that will solve the following problem statement: "show P percent of all unique visitors, under time T, N banners with M banners displayed per session".
 
--
David Schoonover
dsc@wikimedia.org