Re: [Analytics] [Pageviews] [Technical] Simplifying the available static dumps of pageview data

24 Dec 2015

Apologies! I realized it was Christmas Eve, but I by no means meant to rush
this conversation. Take as long as you like to reply to the thread, and
enjoy your holidays, everyone :) I'll poke the thread again after the New
Year. Happy Holidays!

On Thu, Dec 24, 2015 at 9:21 AM, Erik Zachte <ezachte(a)wikimedia.org> wrote:

...
 Dan, thanks for raising the issue (a bit less for raising it on X-mas eve
 ;-) (just kidding, mostly)

 Frankly, I don't see much use for the earlier releases at all. The newest
 version has been kept very much downward compatible, so migration of clients
 should be a no-brainer (mostly switching the download URL). Upgrading those
 same clients to also use the new additional counts is a bit more work, as the
 coding scheme is tedious (as a result of that downward compatibility). But
 that upgrading could be done later.

 I propose to deprecate both earlier sets, and set an end date for updating
 those, e.g. July 1, and publish that widely, and offer support with
 migration. If people feel otherwise please chime in. Keeping the existing
 files is another matter, we should do so of course.

 About my aggregation datasets, it's just that: an aggregation of hourly
 files into daily and monthly aggregates, with extreme compression while
 retaining hourly precision, and adjusting for missing data (by
 extrapolation). These files are ideal for batch processes and lean
 downloads, and archiving for the longer haul.
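
 As a rough illustration of the aggregation step described above, here is a
 minimal sketch, not the actual pagecounts-ez code. The function name
 `daily_totals`, the input layout (a dict of hour to per-page counts), and the
 simple linear scaling for missing hours are all assumptions made for this
 example; the compact hourly encoding used in the real files is not covered.

```python
from collections import defaultdict


def daily_totals(hourly_counts, hours_per_day=24):
    """Sum hourly view counts per page and scale up for missing hours.

    hourly_counts: hypothetical dict mapping hour (0-23) to a dict of
    {page_title: views}; hours with no data are simply absent.
    """
    observed = len(hourly_counts)
    if observed == 0:
        return {}  # nothing to extrapolate from

    totals = defaultdict(int)
    for counts in hourly_counts.values():
        for page, views in counts.items():
            totals[page] += views

    # Assumed linear extrapolation: missing hours are treated as if they
    # matched the average of the hours that were observed.
    scale = hours_per_day / observed
    return {page: round(total * scale) for page, total in totals.items()}
```

 With only 12 of 24 hours present and 10 views per hour for a page, this
 sketch would report 240 views for the day.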

 Reworking the datasets, in whatever way, with categories as part of the
 scheme sounds like a major overhaul, not like cleaning up old stuff.
 Exciting, but best to be done under a separate flag.

 Cheers,

 Erik

 *From:* Analytics [mailto:analytics-bounces@lists.wikimedia.org] *On
 Behalf Of *Maurice Vergeer
 *Sent:* Thursday, December 24, 2015 15:12
 *To:* A mailing list for the Analytics Team at WMF and everybody who has
 an interest in Wikipedia and analytics.
 *Subject:* Re: [Analytics] [Pageviews] [Technical] Simplifying the
 available static dumps of pageview data

 Dear all,

 As I just mentioned to Dan in a private email conversation, keeping
 datasets even with imperfect measurements is important, particularly for
 longitudinal analysis.

 Also, from what I understand (me being a newbie here), the data
 are stored in separate files. Dan suggested reordering the page into
 categories. Maybe another option is to create more extensive datasets with
 more measurements in a single data file. On the other hand, the
 files would become even bigger. Not an issue for me, but for users
 in the field, accessibility (download bandwidth) could become an issue.

 my two cents

 Maurice

 On Thu, Dec 24, 2015 at 2:58 PM, Alex Druk <alex.druk(a)gmail.com> wrote:

 Nothing against this approach!

 On Thu, Dec 24, 2015 at 2:55 PM, Dan Andreescu <dandreescu(a)wikimedia.org>
 wrote:

 On Thu, Dec 24, 2015 at 8:48 AM, Alex Druk <alex.druk(a)gmail.com> wrote:

 Hi Dan,

 Happy holidays!

 Good idea to combine these datasets! However, we have one more dataset, by
 Erik Zachte: http://dumps.wikimedia.org/other/pagecounts-ez/

 And that's an important one!  But I was thinking we could re-organize the
 page into categories.  Erik's dataset could go into a "processed data"
 category or something like that.  The three I wanted to talk about on this
 thread are just the raw data.

 _______________________________________________
 Analytics mailing list
 Analytics(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/analytics

 --

 Thank you.

 Alex Druk
 alex.druk(a)gmail.com
 (775) 237-8550 Google voice

 --

 ________________________________________________
 Maurice Vergeer
 To contact me, see http://mauricevergeer.nl/node/5
 To see my publications, see http://mauricevergeer.nl/node/1
 ________________________________________________

