I suppose you could get more granular data by conducting an opt-in study of some kind, but you would need to be careful that users who haven't opted in are not accidentally included and do not have their privacy indirectly affected. I agree that collection at intervals shorter than an hour would raise a lot of privacy concerns for users who have not opted in.


On Thu, Sep 18, 2014 at 12:03 PM, Benj. Mako Hill <mako@atdot.cc> wrote:
<quote who="Valerio Schiavoni" date="Wed, Sep 17, 2014 at 04:14:04PM +0200">
> Unfortunately, no. Those logs only provide page counts but without the
> associated timestamps ("when" those pages have been accessed). If such logs
> exist, they would perfectly do..

The pagecount data /has/ timing data but they are "binned" by the

I don't think more comprehensive data (all pages, all languages,
nearly all viewers) over a long period of time exists anywhere and I
don't think any similarly comprehensive data exists before 2007 at

You might find more granular data for short periods of time (like the
WikiBench data or maybe stuff that's been collected more recently by
WMF but isn't published) or much more detailed data from longer
periods of time for a subset of users on a particular network (perhaps
like the Indiana data, or toolbar data like the Yahoo data that some
WP researchers have used).

I would /love/ to hear that I am wrong about this and that there's
some wonderful, granular, broad, long-term dataset of pageviews I just
don't know about. :)


Benjamin Mako Hill

Creativity can be a social contribution, but only in so far
as society is free to use the results. --GNU Manifesto

Wiki-research-l mailing list