This discussion also reminds me of the idea of tracking time spent on site. Arguably, that's a more relevant measurement for how much of our content people actually consume, and it also neatly side-steps issues like the categorization of link previews. I realize that measuring that accurately can be challenging, but I think it'll become more and more important as we venture into more dynamic content experiences.


On Thu, Sep 17, 2015 at 8:17 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Thanks!

On 17 September 2015 at 11:15, Nuria Ruiz <nuria@wikimedia.org> wrote:
> Right! Thanks for pointing that out.
>
> I think I have updated all docs now:
> https://meta.wikimedia.org/wiki/Research:Page_view#Change_log
>
> https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
>
> On Thu, Sep 17, 2015 at 7:36 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>
>> Have those changes been noted on the main pageview definition page and
>> associated changelog?
>>
>> On 17 September 2015 at 09:58, Nuria Ruiz <nuria@wikimedia.org> wrote:
>> >>With more ways of viewing content, it is going to get harder and harder
>> >> to
>> >> maintain a pattern based definition.
>> > Indeed, we want to move away from a pattern-based definition as much as
>> > possible.
>> >
>> > This is an FYI to everyone: with our latest changes (which we are in the
>> > process of deploying today), a request that comes "tagged" with "preview"
>> > in the X-Analytics header will not be counted towards pageviews. The
>> > Android app should make the corresponding change and add the "preview"
>> > tag to its preview requests.
>> >
>> > The X-Analytics header is documented here:
>> > https://wikitech.wikimedia.org/wiki/X-Analytics
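To make the tagging rule concrete, here is a minimal Python sketch of how a consumer could skip preview-tagged requests. It assumes the header value is a semicolon-separated list of key=value pairs, as described on the X-Analytics page linked above. The function names and the example value `preview=1` are illustrative assumptions, not the deployed pageview code.

```python
def parse_x_analytics(header: str) -> dict[str, str]:
    """Split an X-Analytics value such as 'page_id=42;preview=1' into a dict."""
    fields = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments and trailing semicolons
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def is_preview(header: str) -> bool:
    """True when the request carries the 'preview' tag (any value is accepted)."""
    return "preview" in parse_x_analytics(header)


def counts_as_pageview(matches_definition: bool, x_analytics: str) -> bool:
    """A request counts only if it matches the definition and is not a preview."""
    return matches_definition and not is_preview(x_analytics)
```

Checking only for the presence of the key keeps the sketch independent of whatever value the apps end up sending with the tag.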
>> >
>> > On Wed, Aug 19, 2015 at 7:19 AM, Andrew Otto <aotto@wikimedia.org>
>> > wrote:
>> >>
>> >> >  If we /do/ include RESTBase requests we will not only have to
>> >> > rewrite the pageview definition for the apps to recognise the new URL
>> >> > scheme
>> >>
>> >> I really think that apps and APIs should do something proactive to tag or
>> >> log a pageview.  With more ways of viewing content, it is going to get
>> >> harder and harder to maintain a pattern-based definition.  A pageview
>> >> should be an event that is logged, not something that is pattern matched
>> >> out of a very noisy stream of data.
>> >>
>> >> Most MediaWiki requests do this now, via the page_id field in the
>> >> X-Analytics header, but we can’t use this for all pageviews because APIs
>> >> are more complicated (e.g. more than one page can be served in a single
>> >> request, etc.).  In the long term, there should be a pageview event
>> >> stream just like rcstream! :)
>> >>
>> >> -Ao
>> >>
>> >>
>> >>
>> >> > On Aug 18, 2015, at 19:58, Oliver Keyes <okeyes@wikimedia.org> wrote:
>> >> >
>> >> > On 18 August 2015 at 19:11, Bernd Sitzmann <bernd@wikimedia.org>
>> >> > wrote:
>> >> >> This discussion is about needed updates to the definition and
>> >> >> Analytics implementation for mobile app page view metrics. There is
>> >> >> also an associated Phab task[4]. Please add the proper Analytics
>> >> >> project there.
>> >> >>
>> >> >> Background / Changes
>> >> >>
>> >> >> As you probably remember, the Android app splits a page view into two
>> >> >> requests: one for the lead section and metadata, plus another one for
>> >> >> the remainder.
>> >> >>
>> >> >> The mobile apps are going to change the way they load pages in two
>> >> >> different ways:
>> >> >>
>> >> >> 1. We'll add a link preview when someone clicks on a link from a page.
>> >> >> 2. We're planning on switching over to using RESTBase for loading
>> >> >>    pages and also the link preview (initially just the Android beta,
>> >> >>    later more).
>> >> >>
>> >> >
>> >> > Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful
>> >> > service API?
>> >> >
>> >> > Last time I checked that wasn't even consumed by HDFS. Is it now
>> >> > being
>> >> > consumed by HDFS?
>> >> >
>> >> > More importantly the actual URLs are going to look /totally/
>> >> > different. If we do not include RESTBase requests, we will miss the
>> >> > apps. If we /do/ include RESTBase requests we will not only have to
>> >> > rewrite the pageview definition for the apps to recognise the new URL
>> >> > scheme, we will also potentially have to rewrite every /other/ bit of
>> >> > the definition to /not/ incorporate those requests.
>> >> >
>> >> > (I use "we" in a collective sense. This isn't my baby any more,
>> >> > although if Joseph et al want help with the refactor here I'm happy
>> >> > to
>> >> > spend my volunteer time on it).
>> >> >
>> >> > But basically every other bit of your email is important but now
>> >> > secondary: this is a potentially massive change, all on its own, even
>> >> > without the link preview, even if the substance of the requests going
>> >> > to RESTBase were identical.
>> >> >
>> >> >> This will have implications for the pageviews definition and how we
>> >> >> count user engagement.
>> >> >>
>> >> >> The big question is
>> >> >>
>> >> >> Should we count link previews as page views, since they are an
>> >> >> indication of user engagement? Or should there be a separate metric
>> >> >> for link previews?
>> >> >>
>> >> >> Counting page views
>> >> >>
>> >> >> IIRC we currently count api.php requests with the query parameters
>> >> >> action=mobileview&sections=0 as a page view. When we publish link
>> >> >> previews for all Android app users, we would either want to also count
>> >> >> the calls to action=query&prop=extracts as a page view or add them to
>> >> >> another metric.
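As a rough illustration of the pattern-based rule described above, the following Python sketch sorts api.php requests into page views and link previews by their query parameters. The `/w/api.php` path, the `page` and `titles` parameters in the usage note, and the label strings are assumptions made for the example. It is not the actual pageview definition.

```python
from urllib.parse import parse_qs, urlsplit


def classify_api_request(url: str) -> str:
    """Return 'pageview', 'link_preview', or 'other' for a request URL."""
    parts = urlsplit(url)
    if not parts.path.endswith("/api.php"):
        return "other"
    query = parse_qs(parts.query)
    action = query.get("action", [""])[0]
    # Lead-section request of the app: action=mobileview&sections=0
    if action == "mobileview" and query.get("sections", [""])[0] == "0":
        return "pageview"
    # Link preview: action=query&prop=extracts (prop may list several modules)
    if action == "query" and "extracts" in query.get("prop", [""])[0].split("|"):
        return "link_preview"
    return "other"
```

For example, `classify_api_request("https://en.wikipedia.org/w/api.php?action=mobileview&sections=0&page=Dilbert")` yields `"pageview"`, while the same host with `action=query&prop=extracts&titles=Dilbert` yields `"link_preview"`. Keeping the two labels separate leaves open whether previews are folded into page views or reported as their own metric.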
>> >> >>
>> >> >> Once the apps use RESTBase, the HTTPS requests will be very different:
>> >> >>
>> >> >> Page view: Instead of calling the PHP API mentioned above with
>> >> >> action=mobileview&sections=0, the app would call the RESTBase
>> >> >> endpoint for the lead request[1]. Then it would call [2].
>> >> >> Link preview: Instead of action=query&prop=extracts, it would call
>> >> >> the lead request[1], too, since there is a lot of overlap. At least
>> >> >> that's our current plan. The advantage is that the client doesn't
>> >> >> need to execute the lead request a second time if the user clicks on
>> >> >> the link preview (either through caching or app logic).
>> >> >>
>> >> >> So, in the RESTBase case we either want to count the
>> >> >> mobile-html-sections-lead requests or the
>> >> >> mobile-html-sections-remaining requests, depending on what our
>> >> >> definition for page views actually is. We could also add a query
>> >> >> parameter or extra HTTP header to one of the mobile-html-sections-lead
>> >> >> requests if we need to distinguish between previews and page views.
>> >> >>
>> >> >> Both the current PHP API and the RESTBase-based metrics would need to
>> >> >> be compatible and be collected in parallel, since we cannot control
>> >> >> when users update their apps.
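A hedged Python sketch of how the RESTBase side could be recognised, using the URL shapes from [1] and [2]. The `is_preview` argument stands in for whatever query parameter or HTTP header ends up marking previews. That flag, the label strings, and the function name are hypothetical.

```python
from urllib.parse import unquote, urlsplit

REST_PREFIX = "/api/rest_v1/page/"


def classify_restbase_request(url: str, is_preview: bool = False):
    """Return (kind, title) for the two app endpoints, or None for anything else."""
    path = urlsplit(url).path
    if not path.startswith(REST_PREFIX):
        return None
    endpoint, _, title = path[len(REST_PREFIX):].partition("/")
    if endpoint == "mobile-html-sections-lead":
        # The lead request serves both page loads and link previews, so the
        # URL alone cannot tell them apart; an extra marker is needed.
        kind = "link_preview" if is_preview else "lead"
    elif endpoint == "mobile-html-sections-remaining":
        kind = "remaining"
    else:
        return None
    return kind, unquote(title)
```

Which of "lead" or "remaining" is counted as the page view is left to the caller, since that depends on the definition we settle on. Running this alongside a classifier for the PHP API would cover users on both old and new app versions.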
>> >> >>
>> >> >> [1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
>> >> >> [2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dilbert
>> >> >> [3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps
>> >> >> [4] https://phabricator.wikimedia.org/T109383
>> >> >>
>> >> >>
>> >> >> Cheers,
>> >> >>
>> >> >> Bernd
>> >> >>
>> >> >>
>> >> >> _______________________________________________
>> >> >> Analytics mailing list
>> >> >> Analytics@lists.wikimedia.org
>> >> >> https://lists.wikimedia.org/mailman/listinfo/analytics
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Oliver Keyes
>> >> > Count Logula
>> >> > Wikimedia Foundation
>>
>>
>>
>> --
>> Oliver Keyes
>> Count Logula
>> Wikimedia Foundation



--
Oliver Keyes
Count Logula
Wikimedia Foundation




--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation