Happy Holidays indeed, everyone!

Let's celebrate an eventful year with lots of progress on the Analytics front, but also with open issues waiting to be addressed as soon as possible next year.

My personal priority is to get the geographical reports back up and running, now that Dan implemented a new geo data feed using Hive data earlier this month. Thanks again, Dan!

 

From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Dan Andreescu
Sent: Thursday, December 24, 2015 15:25
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] [Pageviews] [Technical] Simplifying the available static dumps of pageview data

 

Apologies! I realized it was Christmas Eve, but I by no means meant to rush this conversation. Take as long as you like to answer on the thread, and enjoy your holidays, everyone :) I'll poke the thread again after the New Year. Happy Holidays!

 

On Thu, Dec 24, 2015 at 9:21 AM, Erik Zachte <ezachte@wikimedia.org> wrote:

Dan, thanks for raising the issue (a bit less for raising it on Christmas Eve ;-) just kidding, mostly).

 

Frankly, I don't see much use for the earlier releases at all. The newest version has been kept very much downward compatible, so migrating clients should be a no-brainer (mostly switching the download URL). Upgrading those same clients to also use the new additional counts is a bit more work, as the coding scheme is tedious (a result of that downward compatibility). But that upgrade could be done later.

 

I propose to deprecate both earlier sets, set an end date for updating them (e.g. July 1), publish that date widely, and offer support with migration. If people feel otherwise, please chime in. Keeping the existing files is another matter; we should of course do so.

 

As for my aggregated datasets, they are just that: hourly files aggregated into daily and monthly files, heavily compressed while retaining hourly precision, and adjusted for missing data by extrapolation. These files are ideal for batch processing, lean downloads, and long-term archiving.
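To illustrate only the "adjusting for missing data" step, here is a minimal Python sketch. It assumes a simple proportional rule: the observed hourly counts are scaled up to a full 24-hour day. This rule and the function name daily_total are hypothetical stand-ins, not necessarily how the pagecounts-ez files actually extrapolate. The sketch also does not attempt the compression scheme.

```python
from typing import Optional, Sequence

HOURS_PER_DAY = 24


def daily_total(hourly: Sequence[Optional[int]]) -> Optional[int]:
    """Aggregate 24 hourly pageview counts into one daily total.

    None marks an hour whose data is missing. Missing hours are filled
    by scaling the observed total up proportionally (a hypothetical
    extrapolation rule, assumed for illustration only).
    """
    if len(hourly) != HOURS_PER_DAY:
        raise ValueError("expected exactly 24 hourly values")
    observed = [count for count in hourly if count is not None]
    if not observed:
        return None  # no data at all for this day; nothing to extrapolate from
    total = sum(observed)
    # Scale the partial total up to a full day's worth of hours.
    return round(total * HOURS_PER_DAY / len(observed))
```

For example, a day with 12 observed hours of 10 views each would be reported as 240, the same as a complete day at that rate. A day with no data at all is returned as None rather than 0, so that it is not mistaken for a day with zero views.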

 

Reworking the datasets in any way, with categories as part of the scheme, sounds like a major overhaul rather than cleaning up old stuff. Exciting, but best done under a separate flag.

 

Cheers,

Erik  

 

 

 

From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Maurice Vergeer
Sent: Thursday, December 24, 2015 15:12
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] [Pageviews] [Technical] Simplifying the available static dumps of pageview data

 

Dear all,

As I just mentioned to Dan in a private email conversation, keeping datasets, even with imperfect measurements, is important, particularly for longitudinal analysis.

Also, from what I understand (being a newbie here), the data are stored in separate files. Dan suggested reorganizing the page into categories. Another option might be to create more extensive datasets, with more measurements in a single data file. On the other hand, the files would become even bigger. That is not an issue for me, but for users in the field, accessibility (download bandwidth) could become an issue.

My two cents.

Maurice

 

 

On Thu, Dec 24, 2015 at 2:58 PM, Alex Druk <alex.druk@gmail.com> wrote:

Nothing against this approach!

 

On Thu, Dec 24, 2015 at 2:55 PM, Dan Andreescu <dandreescu@wikimedia.org> wrote:

 

 

On Thu, Dec 24, 2015 at 8:48 AM, Alex Druk <alex.druk@gmail.com> wrote:

Hi Dan,

Happy holidays!

Good idea to combine these datasets! However, there is one more dataset by Erik Zachte: http://dumps.wikimedia.org/other/pagecounts-ez/

 

And that's an important one! But I was thinking we could reorganize the page into categories. Erik's dataset could go into a "processed data" category or something like that. The three I wanted to discuss on this thread are just the raw data.

 

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics



 

--

Thank you.

Alex Druk
alex.druk@gmail.com
(775) 237-8550 Google voice






--

________________________________________________
Maurice Vergeer
To contact me, see http://mauricevergeer.nl/node/5
To see my publications, see http://mauricevergeer.nl/node/1
________________________________________________

