or maybe dumps.wikimedia.org/traffic?

 

I hope someday we will (again) have edit stats similar to the views stats we now have (geo breakdown etc).

 

Erik

 

From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Aaron Halfaker
Sent: Tuesday, February 16, 2016 18:11
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] [Pageviews] [Technical] Simplifying the available static dumps of pageview data

 

dumps.wikimedia.org/analytics

 

Does "analytics" mean anything in this context?  Why not aim for something like dumps.wikimedia.org/views?  

-Aaron

 

On Thu, Feb 11, 2016 at 9:39 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:

It's also the International Day of Women and Girls in Science!

Sounds like a good summary.


On 11 February 2016 at 07:31, Dan Andreescu <dandreescu@wikimedia.org> wrote:
> I almost revived this thread on Mardi Gras, but I didn't want to be known as
> The Holiday Crusher so I waited.  Today is relatively safe [1] :)
>
> Ok, there are three main points being made:
>
> 1. deprecating the old datasets
> 2. liberating ourselves from the old format
> 3. reorganizing the dumps page
>
> My thoughts on each:
>
> 1. I agree with Dario and Erik's points.  Let's keep the old files around,
> but stop generating new files in May 2016.  To explain this, we'll make a
> new section called "Deprecated" and put links to the pagecounts-* datasets
> there.
>
> 2. I wasn't expecting to talk about format, but it makes sense because, for
> example, Erik's dataset is just a pivoted format.  So, we could have a
> section for the Pageview datasets, with links for each format we already
> have: Domasz archive format, Erik Z compressed format.  We could then add a
> new format that's easier to understand and could even include some of the
> data we expose via the pageview API.  But from an organizational point of
> view, treating "format" as a separate concept from "dataset" will be an
> improvement.
>
> 3. I think it's time we had our own page instead of just being under
> dumps.wikimedia.org/other.  Let's have dumps.wikimedia.org/analytics and
> link to it from both the main dumps page and /other.  The separation will
> make it easier to reference other places where we have static data file
> dumps, like datasets.wikimedia.org.  And it'll also make it easier to add
> links and references to how this work is being done and where people can
> interact with us or help us.
>
>
> I hope I captured what everyone was saying.  If there aren't any objections,
> I'll send a list of next steps needed to accomplish this, and get to work :)
>
>
>
> [1] Today is Be Electrific Day, Get Out Your Guitar Day, Grandmother
> Achievement Day, National Don't Cry Over Spilled Milk Day, National
> Inventors' Day, National Make a Friend Day, National Peppermint Patty Day,
> National Shut-in Visitation Day, Pro Sports Wives Day, Promise Day,
> Satisfied Staying Single Day, White Shirt Day
>
>
> On Wed, Jan 6, 2016 at 7:13 PM, Dario Taraborelli
> <dtaraborelli@wikimedia.org> wrote:
>>
>> Erik's proposal sounds very reasonable.
>>
>> There might be some confusion about what we mean by "keeping the old
>> datasets for longitudinal analysis". No one is planning to remove the old
>> static dumps, just stop generating them/maintaining them going forward.
>>
>> I also want to echo Nuria regarding the human cost of maintaining multiple
>> definitions. I just finished preparing a response to a reporter who was
>> asking about project-level mobile PV data, and I was not immediately able to
>> tell whether a specific data source I wanted to cite was using the old or
>> new definition (until I talked to Dan and we looked up a Gerrit patch
>> together).
>>
>> How do people feel about turning off the generation of old dumps by May
>> 2016, i.e. one year after having the two series of data available in
>> parallel?
>>
>>
>>
>> On Wed, Jan 6, 2016 at 10:17 AM, Nuria Ruiz <nuria@wikimedia.org> wrote:
>>>
>>> >As I just mentioned to Dan in a private email conversation, keeping
>>> > datasets even with imperfect measurements is important. Particularly for
>>> > longitudinal analysis.
>>> Keep in mind that maintaining these old dumps is not "free": having
>>> several pageview definitions around causes a lot of confusion and
>>> maintenance cost. We get a lot of questions about the spikiness of the old
>>> definition, and we need to maintain the software that generates the old
>>> files. Thus, we think it is reasonable to ask our users to transition to
>>> the new definition and eventually (in a period of months) turn off the old
>>> dumps.
>>>
>>> On Thu, Dec 24, 2015 at 6:12 AM, Maurice Vergeer <m.vergeer@maw.ru.nl>
>>> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> As I just mentioned to Dan in a private email conversation, keeping
>>>> datasets even with imperfect measurements is important. Particularly for
>>>> longitudinal analysis.
>>>>
>>>> Also, from what I understand (being a newbie here), the data are stored
>>>> in separate files. Dan suggested reordering the page into categories.
>>>> Another option may be to create more extensive datasets with more
>>>> measurements in a single data file. On the other hand, the files would
>>>> become even bigger. That is not an issue for me, but for users in the
>>>> field, accessibility (download bandwidth) could become an issue.
>>>>
>>>> my two cents
>>>> Maurice
>>>>
>>>>
>>>> On Thu, Dec 24, 2015 at 2:58 PM, Alex Druk <alex.druk@gmail.com> wrote:
>>>>>
>>>>> Nothing against this approach!
>>>>>
>>>>> On Thu, Dec 24, 2015 at 2:55 PM, Dan Andreescu
>>>>> <dandreescu@wikimedia.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 24, 2015 at 8:48 AM, Alex Druk <alex.druk@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Dan,
>>>>>>> Happy holidays!
>>>>>>> Good idea to combine these datasets! However we have one more dataset
>>>>>>> by Erik Zachte : http://dumps.wikimedia.org/other/pagecounts-ez/
>>>>>>
>>>>>>
>>>>>> And that's an important one!  But I was thinking we could re-organize
>>>>>> the page into categories.  Erik's dataset could go into a "processed data"
>>>>>> category or something like that.  The three I wanted to talk about on this
>>>>>> thread are just the raw data.
>>>>>>
>>>>>> _______________________________________________
>>>>>> Analytics mailing list
>>>>>> Analytics@lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thank you.
>>>>>
>>>>> Alex Druk
>>>>> alex.druk@gmail.com
>>>>> (775) 237-8550 Google voice
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ________________________________________________
>>>> Maurice Vergeer
>>>> To contact me, see http://mauricevergeer.nl/node/5
>>>> To see my publications, see http://mauricevergeer.nl/node/1
>>>> ________________________________________________
>>>>
>>>
>>>
>>
>>
>>
>> --
>>
>>
>> Dario Taraborelli  Head of Research, Wikimedia Foundation
>> wikimediafoundation.org • nitens.org • @readermeter
>>
>>
>
>



--

Oliver Keyes
Count Logula
Wikimedia Foundation
