Here's a task that captures some of the things to consider for server-side enrichment of X-Analytics (in this case, I think, the Mobile Content Service would be doing the work).

https://phabricator.wikimedia.org/T92875
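Below is a minimal Python sketch of what that server-side enrichment could look like. It assumes the X-Analytics header is a list of key=value pairs separated by semicolons, and it uses the page_id / page_title keys discussed further down the thread. `enrich_x_analytics` is a hypothetical name. It is not code from the Mobile Content Service or any other existing service.

```python
from typing import Optional
from urllib.parse import quote


def enrich_x_analytics(
    existing: Optional[str],
    page_id: Optional[int] = None,
    page_title: Optional[str] = None,
) -> str:
    """Merge page_id and/or page_title into an X-Analytics header value.

    Hypothetical helper. Assumes the header is a ';'-separated list of
    key=value pairs; keys already present in `existing` are overwritten.
    """
    if page_id is None and page_title is None:
        raise ValueError("need page_id and/or page_title to tag a pageview")
    pairs = {}
    for item in (existing or "").split(";"):
        key, sep, value = item.partition("=")
        if sep and key.strip():
            pairs[key.strip()] = value.strip()
    if page_id is not None:
        pairs["page_id"] = str(page_id)
    if page_title is not None:
        # Percent-encode so a ';' or '=' inside a title cannot break the format.
        pairs["page_title"] = quote(page_title, safe="")
    return ";".join(f"{key}={value}" for key, value in pairs.items())
```

Percent-encoding the title is one possible way to handle titles that contain ';' or '='. Whatever encoding is chosen would need to be agreed with whoever consumes the header.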

Here are the quarterly goals. The idea of counting pageviews in a more efficient way only came up a little later in the quarter, sorry about that (and thanks for helping us figure out the short- and mid-term approach).

https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q1_Goals#Reading

-Adam

On Wed, Aug 19, 2015 at 9:27 AM, Andrew Otto <aotto@wikimedia.org> wrote:
Ya, we can probably tweak the pageview definition to use page_id / page_title if they exist, and fall back to the rest of the logic only if they don’t.
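A rough Python sketch of that fallback follows. It assumes the same semicolon-separated key=value header format. `legacy_is_pageview` is a hypothetical stand-in for the existing pattern-based definition, which is passed in here rather than reimplemented.

```python
from typing import Callable, Dict, Optional


def parse_x_analytics(header: Optional[str]) -> Dict[str, str]:
    """Split an assumed ';'-separated key=value header into a dict."""
    tags: Dict[str, str] = {}
    for item in (header or "").split(";"):
        key, sep, value = item.partition("=")
        if sep and key.strip():
            tags[key.strip()] = value.strip()
    return tags


def is_pageview(
    x_analytics: Optional[str],
    uri: str,
    legacy_is_pageview: Callable[[str], bool],
) -> bool:
    """Trust explicit page_id/page_title tags; otherwise use the old rules."""
    tags = parse_x_analytics(x_analytics)
    if "page_id" in tags or "page_title" in tags:
        return True
    return legacy_is_pageview(uri)
```

With this design, pattern matching applies only to untagged traffic. It can then shrink as more clients and services tag their requests.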


> On Aug 19, 2015, at 12:24, Oliver Keyes <okeyes@wikimedia.org> wrote:
>
> It'll need to be; some requests don't know the page ID in advance, which I
> think was the reason Apps initially didn't implement this.
>
> On 19 August 2015 at 12:19, Andrew Otto <aotto@wikimedia.org> wrote:
>>
>> page_id is the current key, so let’s keep that.  page_title would be good to
>> have too.  Let’s make it an and/or.
>>
>>
>> On Aug 19, 2015, at 12:17, Bernd Sitzmann <bernd@wikimedia.org> wrote:
>>
>> Ideally the page ID would be the way to go. From a client's perspective, I
>> prefer the page title, since clients don't always know the page ID ahead of
>> time. (We could put that header into the second request when loading the
>> page, but I cannot guarantee that we will always have a second request in
>> the future.)
>>
>> --Cheers,
>> Bernd
>>
>> On Wed, Aug 19, 2015 at 8:53 AM, Dan Andreescu <dandreescu@wikimedia.org>
>> wrote:
>>>
>>> This (making pageviews proactive) is a great idea, and we should follow
>>> through.  Here's a simple start:
>>>
>>> If your app/site/etc. is creating a request that it wants to count as a
>>> pageview, add an X-Analytics header with pageview_id=<page_id> or
>>> pageview_title=<page_title>
>>>
>>> If we can make this change uniformly, I think we'd be in a very good
>>> place.
>>>
>>> On Wed, Aug 19, 2015 at 10:23 AM, Oliver Keyes <okeyes@wikimedia.org>
>>> wrote:
>>>>
>>>> On 19 August 2015 at 10:19, Andrew Otto <aotto@wikimedia.org> wrote:
>>>>>> If we /do/ include RESTBase requests we will not only have to
>>>>>> rewrite the pageview definition for the apps to recognise the new URL
>>>>>> scheme
>>>>>
>>>>> I really think that apps and APIs should do something proactive to tag
>>>>> or log a pageview.  With more ways of viewing content, it is going to get
>>>>> harder and harder to maintain a pattern-based definition.  A pageview should
>>>>> be an event that is logged, not something that is pattern matched out of a
>>>>> very noisy stream of data.
>>>>>
>>>>> Most MediaWiki requests do this now, via the page_id field in the
>>>>> X-Analytics header, but we can’t use this for all pageviews because APIs
>>>>> are more complicated (e.g. more than one page can be served in a single
>>>>> request, etc.).  In the long term, there should be a pageview event stream
>>>>> just like rcstream! :)
>>>>
>>>> This is an excellent point. IIRC we'd been asking Apps to do this for
>>>> kind of a while, so...
>>>>
>>>>>
>>>>> -Ao
>>>>>
>>>>>
>>>>>
>>>>>> On Aug 18, 2015, at 19:58, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>>>>>
>>>>>> On 18 August 2015 at 19:11, Bernd Sitzmann <bernd@wikimedia.org>
>>>>>> wrote:
>>>>>>> This discussion is about the necessary updates to the definition and
>>>>>>> Analytics implementation of the mobile apps' page view metrics. There is
>>>>>>> also an associated Phab task[4]. Please add the proper Analytics project
>>>>>>> there.
>>>>>>>
>>>>>>> Background / Changes
>>>>>>>
>>>>>>> As you probably remember, the Android app splits a page view into two
>>>>>>> requests: one for the lead section and metadata, plus another one for
>>>>>>> the remainder.
>>>>>>>
>>>>>>> The mobile apps are going to change the way they load pages in two
>>>>>>> ways:
>>>>>>>
>>>>>>> We'll add a link preview when someone clicks on a link from a page.
>>>>>>> We're planning on switching over to using RESTBase for loading pages
>>>>>>> and also the link preview (initially just the Android beta, later more).
>>>>>>>
>>>>>>
>>>>>> Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful
>>>>>> service API?
>>>>>>
>>>>>> Last time I checked that wasn't even consumed by HDFS. Is it now being
>>>>>> consumed by HDFS?
>>>>>>
>>>>>> More importantly the actual URLs are going to look /totally/
>>>>>> different. If we do not include RESTBase requests, we will miss the
>>>>>> apps. If we /do/ include RESTBase requests we will not only have to
>>>>>> rewrite the pageview definition for the apps to recognise the new URL
>>>>>> scheme, we will also potentially have to rewrite every /other/ bit of
>>>>>> the definition to /not/ incorporate those requests.
>>>>>>
>>>>>> (I use "we" in a collective sense. This isn't my baby any more,
>>>>>> although if Joseph et al want help with the refactor here I'm happy to
>>>>>> spend my volunteer time on it).
>>>>>>
>>>>>> But basically every other bit of your email is important but now
>>>>>> secondary: this is a potentially massive change, all on its own, even
>>>>>> without the link preview, even if the substance of the requests going
>>>>>> to RESTBase were identical.
>>>>>>
>>>>>>> This will have implications for the pageview definition and how we
>>>>>>> count user engagement.
>>>>>>>
>>>>>>> The big question is:
>>>>>>>
>>>>>>> Should we count link previews as page views, since they are an
>>>>>>> indication of user engagement? Or should there be a separate metric for
>>>>>>> link previews?
>>>>>>>
>>>>>>> Counting page views
>>>>>>>
>>>>>>> IIRC we currently count api.php requests with the query parameters
>>>>>>> action=mobileview&sections=0 as page views. When we publish link
>>>>>>> previews for all Android app users, we would want to either also count
>>>>>>> the calls to action=query&prop=extracts as page views or add them to a
>>>>>>> separate metric.
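The following simplified Python sketch labels a request URL according to the two api.php patterns. It encodes only the two rules mentioned in this email: action=mobileview&sections=0 for page views and action=query&prop=extracts for link previews. The real pageview definition checks more than the query string, and the label names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def classify_api_request(url: str) -> str:
    """Label an api.php request as 'pageview', 'link_preview' or 'other'."""
    parts = urlsplit(url)
    if not parts.path.endswith("/api.php"):
        return "other"
    params = parse_qs(parts.query)
    action = params.get("action", [""])[0]
    # Current rule: only the lead-section mobileview call counts as a view.
    if action == "mobileview" and params.get("sections", [""])[0] == "0":
        return "pageview"
    # prop may list several modules separated by '|', e.g. extracts|pageimages.
    props = params.get("prop", [""])[0].split("|")
    if action == "query" and "extracts" in props:
        return "link_preview"
    return "other"
```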
>>>>>>>
>>>>>>> Once the apps use RESTBase, the HTTPS requests will be very different:
>>>>>>>
>>>>>>> Page view: Instead of the PHP API call action=mobileview&sections=0
>>>>>>> mentioned above, the app would call the RESTBase endpoint for the lead
>>>>>>> request[1]. Then it would call [2].
>>>>>>> Link preview: Instead of action=query&prop=extracts, it would also call
>>>>>>> the lead request[1], since there is a lot of overlap. At least that's our
>>>>>>> current plan. The advantage is that the client doesn't need to execute
>>>>>>> the lead request a second time if the user clicks on the link preview
>>>>>>> (either through caching or app logic).
>>>>>>>
>>>>>>> So, in the RESTBase case, we want to count either the
>>>>>>> mobile-html-sections-lead requests or the mobile-html-sections-remaining
>>>>>>> requests, depending on what our definition of page views actually is. We
>>>>>>> could also add a query parameter or an extra HTTP header to one of the
>>>>>>> mobile-html-sections-lead requests if we need to distinguish between
>>>>>>> previews and page views.
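This Python sketch covers one of the options above. It counts the mobile-html-sections-lead request as the page view. It labels the remaining request separately so a page is not counted twice, and it marks previews with a query parameter. The `preview=1` parameter is a hypothetical name; the email leaves open whether it would be a query parameter or an HTTP header. The path pattern is taken from the example URLs [1] and [2].

```python
import re
from typing import Optional, Tuple
from urllib.parse import parse_qs, unquote, urlsplit

# Matches the two RESTBase endpoints from the example URLs [1] and [2].
_RESTBASE = re.compile(
    r"^/api/rest_v1/page/mobile-html-sections-(lead|remaining)/(.+)$"
)


def classify_restbase(url: str) -> Optional[Tuple[str, str]]:
    """Return (label, title) for a mobile-html-sections request, else None.

    The lead request counts as the page view unless the hypothetical
    `preview=1` query parameter marks it as a link preview. The remaining
    request is labelled separately so it is never double counted.
    """
    parts = urlsplit(url)
    match = _RESTBASE.match(parts.path)
    if match is None:
        return None
    section, title = match.group(1), unquote(match.group(2))
    if section == "remaining":
        return ("remaining", title)
    flag = parse_qs(parts.query).get("preview", ["0"])[0]
    return ("link_preview" if flag == "1" else "pageview", title)
```

Counting the lead request rather than the remaining one records a page view even if a second request never happens. That addresses the caveat that the apps may not always make a second request.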
>>>>>>>
>>>>>>> Both the current PHP API and the RESTBase-based metrics would need to be
>>>>>>> compatible and be collected in parallel, since we cannot control when
>>>>>>> users update their apps.
>>>>>>>
>>>>>>> [1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
>>>>>>> [2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dilbert
>>>>>>> [3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps
>>>>>>> [4] https://phabricator.wikimedia.org/T109383
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Bernd
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Analytics mailing list
>>>>>>> Analytics@lists.wikimedia.org
>>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Oliver Keyes
>>>>>> Count Logula
>>>>>> Wikimedia Foundation