We're happy to announce a few improvements to Analytics data releases on
dumps.wikimedia.org:
* We are releasing a new dataset, an estimate of Unique Devices accessing
our projects [1]
* We are officially making available a better Pageviews dataset [2]
* We are deprecating two older pageview statistics datasets
* We moved Analytics data from /other to /analytics [3]
Details follow:
*Unique Devices:* From 2009, the Wikimedia Foundation used comScore to
report data about unique web visitors. In January 2016, however, we
decided to stop reporting comScore numbers [4] because of limitations in
their methodology that led to misreported mobile usage. We are now ready
to replace comScore numbers with the Unique Devices dataset [5][1]. While
the number of unique devices is not the same as the number of unique
visitors, it is a good proxy for that metric: a major increase in the
number of unique devices is likely to come from an increase in distinct
users. We understand that counting uniques raises significant privacy
concerns, so we count unique devices in a privacy-conscious way that does
not rely on any cookie by which your browsing history could be tracked [6].
We invite you to explore this new dataset and hope it’s helpful for the
Wikimedia community in better understanding our projects. This data can
help measure the reach of Wikimedia projects on the web.
*Pageviews:* This dataset [2] is the best-quality data available for
counting the number of pageviews our projects receive at the article and
project level. We've upgraded from pagecounts-raw to pagecounts-all-sites,
and now to pageviews, in order to filter out more spider traffic and
measure something closer to what we think of as a real user viewing
content. A short history might be useful:
* pagecounts-raw: was originally maintained by Domas Mituzas and later
taken over by the Analytics team. It was and still is the most used
dataset, though it has some major problems. It does not count access to
the mobile site, it does not filter out spider or bot traffic, and it
suffers from an unknown amount of data loss due to logging infrastructure
limitations.
* pagecounts-all-sites: uses the same pageview definition as
pagecounts-raw, and so also does not filter out spider or bot traffic. But
it does include access to mobile and zero sites, and is built on a more
reliable logging infrastructure.
* pagecounts-ez: is derived from the best data available at the time.
So until December 2015 it was based on pagecounts-raw and
pagecounts-all-sites, and now it's based on pageviews. This dataset is
valuable because it compresses very large files without losing any
information while still providing hourly page- and project-level
statistics.
The new dataset, pageviews, is what's behind our pageview API, and it is
now available as static files for bulk download back to May 2015. Because
having multiple ways to download pageview data is confusing for consumers,
we're keeping only pageviews and pagecounts-ez and deprecating the other
two. If you'd like to read more about the current pageview definition,
details are on the research page [7].
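The article- and project-level counting described above can be sketched in a few lines of Python. This is a minimal, hypothetical example, not the Analytics team's code. It assumes that each line of an hourly pageviews file has four space-separated fields: domain code, page title, view count and response size. Please verify that layout against the files in [2] before relying on it. The helper name `aggregate_pageviews` and the sample lines are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def aggregate_pageviews(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Sum views per (project, title) and per project.

    Hypothetical helper: assumes each line looks like
    "domain_code page_title view_count response_size".
    """
    per_article: Counter = Counter()
    per_project: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        # Skip blank or malformed lines instead of guessing their meaning.
        if len(fields) != 4 or not fields[2].isdigit():
            continue
        project, title, views, _size = fields
        count = int(views)
        per_article[(project, title)] += count
        per_project[project] += count
    return per_article, per_project


# Usage with a few made-up sample lines (not real data):
sample = [
    "en Main_Page 42 0",
    "en Earth 7 0",
    "de Hauptseite 10 0",
    "en Main_Page 3 0",
]
articles, projects = aggregate_pageviews(sample)
print(articles[("en", "Main_Page")])  # 45
print(projects["en"])  # 52
```

If the downloaded files are gzip-compressed, you could pass the function the lines from `gzip.open(path, "rt", encoding="utf-8")`, where `path` is a hypothetical local copy of one file. Using `Counter` keeps both the per-article and per-project sums in a single pass without pre-declaring keys.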
*Deprecating:* We are deprecating the pagecounts-raw and
pagecounts-all-sites datasets in May 2016 (discussion here:
https://phabricator.wikimedia.org/T130656 ). These datasets suffer from
many artifacts, a lack of mobile data, and/or infrastructure problems, so
they are not comparable to the new way we track pageviews. They will
remain available because the historical data may be useful, but they will
not be maintained or updated beyond May 2016.
*Clean-up:* Analytics data on dumps was crammed into /other with unrelated
datasets. We made a new page to receive current and future datasets [3]
and linked to it from /other and /. Please let us know if anything there
looks confusing or opaque and I'll be happy to clarify.
[1] http://dumps.wikimedia.org/other/unique_devices
[2] http://dumps.wikimedia.org/other/pageviews
[3] http://dumps.wikimedia.org/analytics/
[4] https://meta.wikimedia.org/wiki/ComScore/Announcement
[5] https://meta.wikimedia.org/wiki/Research:Unique_Devices
[6] https://meta.wikimedia.org/wiki/Research:Unique_Devices#How_do_we_count_uni…
[7] https://meta.wikimedia.org/wiki/Research:Page_view