I think if we do this right, we should prefer page_id, but use page_title if it is
provided.
However, at the moment we don’t have a good way of actually getting page_title in Hadoop
from the MW DBs even if given a page_id. We’d still have to infer the title from the URI.
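
To make "infer the title from the URI" concrete, here is a rough sketch in Python. It is not the actual refinery code: the function name `title_from_uri` is made up, and the path prefixes are assumptions based on the standard /wiki/ article path and the two RESTBase URLs quoted further down this thread.

```python
from urllib.parse import unquote, urlsplit

# Assumed URL shapes: the standard article path plus the two RESTBase
# endpoints linked in this thread. A real definition would need more.
KNOWN_PREFIXES = (
    "/wiki/",
    "/api/rest_v1/page/mobile-html-sections-lead/",
    "/api/rest_v1/page/mobile-html-sections-remaining/",
)


def title_from_uri(uri):
    """Return the page title encoded in the URI path, or None if no known shape matches."""
    path = urlsplit(uri).path
    for prefix in KNOWN_PREFIXES:
        # Require at least one character after the prefix so an empty title is rejected.
        if path.startswith(prefix) and len(path) > len(prefix):
            return unquote(path[len(prefix):])
    return None
```

Anything that does not match one of those prefixes (api.php calls, for example) comes back as None, which is exactly the case where a client-supplied page_title would help.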
I’d prefer if page_id was the canonical way of identifying a page view, but currently
page_title is used in all pageview statistics. Using the page_title as the generator of
the request sees it might be even more correct than inferring it from the URI. Or, maybe
it would be better (for the moment) to use the existence of page_id or page_title to
indicate to the pageview definition logic that this request is definitely already a
pageview, and then use the same page title from URI logic on all requests no matter what.
page_id or page_title would just allow the pageview definition pattern matching logic to
be skipped, as we would know right up front that a request is a pageview.
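
Roughly what I mean, as a Python sketch rather than real code: `parse_x_analytics`, `matches_pageview_patterns` and `is_pageview` are hypothetical names, the `key=value;key=value` header layout is an assumption, and the pattern check is a trivial stand-in for the existing definition.

```python
def parse_x_analytics(header):
    """Parse 'k1=v1;k2=v2' into a dict. The ';' separator is an assumption."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key:
            fields[key] = value
    return fields


def matches_pageview_patterns(uri):
    """Hypothetical, heavily simplified stand-in for the pattern-based definition."""
    return "/wiki/" in uri


def is_pageview(uri, x_analytics):
    """Decide whether a request counts as a pageview."""
    fields = parse_x_analytics(x_analytics)
    # A request tagged with page_id or page_title is known up front to be a
    # pageview, so the pattern matching is skipped entirely.
    if "page_id" in fields or "page_title" in fields:
        return True
    return matches_pageview_patterns(uri)
```

The title itself would still come from the URI in every case; the header only answers the yes/no question.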

> Are you saying the apps have the option to skip providing one of page_title or page_id?

So uhhh, yes! I think, although I am not the authority on this. I defer to other analytics
engineers who will actually have to implement and maintain this change :)

> On Aug 19, 2015, at 12:29, Bernd Sitzmann <bernd(a)wikimedia.org> wrote:
>
> Andrew,
>
> Are you saying the apps have the option to skip providing one of page_title or page_id?
> I hope this is the case since I just came up with a scheme where we could avoid the
> second request when a page has only a single section, which we already get through the
> first (lead) request.
>
> Yes to what Oliver said: The apps don't always know the page_id ahead of time (only
> sometimes). The best example where we don't know the page_id ahead of time is when
> someone searches for a term on Google search on an Android device, and gets directed to
> our Android app. The app only gets the URL of the page, which we then take to derive the
> wiki and page_title from.
>
> Bernd
>
> On Wed, Aug 19, 2015 at 10:24 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
> It'll need to be, some requests don't know pageID in advance, which I
> think was the reason Apps initially didn't implement this.
>
> On 19 August 2015 at 12:19, Andrew Otto <aotto(a)wikimedia.org> wrote:
> > If your app/site/etc. is creating a request that it wants to count as a
> > pageview, add an X-Analytics header with pageview_id=<page_id> or
> > pageview_title=<page_title>
> >
> >
> > page_id is the current key, so let’s keep that. page_title would be good to
> > have too. Let’s make it an and/or.
> >
> >
> > On Aug 19, 2015, at 12:17, Bernd Sitzmann <bernd(a)wikimedia.org> wrote:
> >
> >> If your app/site/etc. is creating a request that it wants to count as a
> >> pageview, add an X-Analytics header with pageview_id=<page_id> or
> >> pageview_title=<page_title>
> >
> >
> > Ideally the page id would be the way to go. From a client's perspective I
> > prefer the page title since clients don't always know the page id ahead of
> > time. (We could put that header into the second request of loading the page
> > but I cannot guarantee that we will always have a second request in the
> > future.)
> >
> > --Cheers,
> > Bernd
> >
> > On Wed, Aug 19, 2015 at 8:53 AM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
> >>
> >> This (making pageviews proactive) is a great idea, and we should follow
> >> through. Here's a simple start:
> >>
> >> If your app/site/etc. is creating a request that it wants to count as a
> >> pageview, add an X-Analytics header with pageview_id=<page_id> or
> >> pageview_title=<page_title>
> >>
> >> If we can make this change uniformly, I think we'd be in a very good
> >> place.
> >>
> >> On Wed, Aug 19, 2015 at 10:23 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
> >>>
> >>> On 19 August 2015 at 10:19, Andrew Otto <aotto(a)wikimedia.org> wrote:
> >>> >> If we /do/ include RESTBase requests we will not only have to
> >>> >> rewrite the pageview definition for the apps to recognise the new URL
> >>> >> scheme
> >>> >
> >>> > I really think that apps and APIs should do something proactive to tag
> >>> > or log a pageview. With more ways of viewing content, it is going to get
> >>> > harder and harder to maintain a pattern based definition. A pageview should
> >>> > be an event that is logged, not something that is pattern matched out of a
> >>> > very noisy stream of data.
> >>> >
> >>> > Most mediawiki requests do this now, via the page_id field in the
> >>> > X-Analytics header, but we can’t use this for all pageviews because APIs
> >>> > are more complicated (e.g. more than one page can be served in a single
> >>> > request, etc.). In the longterm, there should be a pageview event stream
> >>> > just like rcstream! :)
> >>>
> >>> This is an excellent point. IIRC we'd been asking Apps to do this for
> >>> kind of a while, so...
> >>>
> >>> >
> >>> > -Ao
> >>> >
> >>> >
> >>> >
> >>> >> On Aug 18, 2015, at 19:58, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
> >>> >>
> >>> >> On 18 August 2015 at 19:11, Bernd Sitzmann <bernd(a)wikimedia.org> wrote:
> >>> >>> This discussion is about needed updates of the definition and
> >>> >>> Analytics implementation for mobile apps page view metrics. There is
> >>> >>> also an associated Phab task[4]. Please add the proper Analytics
> >>> >>> project there.
> >>> >>>
> >>> >>> Background / Changes
> >>> >>>
> >>> >>> As you probably remember, the Android app splits a page view into two
> >>> >>> requests: one for the lead section and metadata, plus another one for
> >>> >>> the remainder.
> >>> >>>
> >>> >>> The mobile apps are going to change the way they load pages in two
> >>> >>> different ways:
> >>> >>>
> >>> >>> - We'll add a link preview when someone clicks on a link from a page.
> >>> >>> - We're planning on switching over to using RESTBase for loading pages
> >>> >>>   and also the link preview (initially just the Android beta, later more)
> >>> >>>
> >>> >>
> >>> >> Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful
> >>> >> service API?
> >>> >>
> >>> >> Last time I checked that wasn't even consumed by HDFS. Is it now being
> >>> >> consumed by HDFS?
> >>> >>
> >>> >> More importantly the actual URLs are going to look /totally/
> >>> >> different. If we do not include RESTBase requests, we will miss the
> >>> >> apps. If we /do/ include RESTBase requests we will not only have to
> >>> >> rewrite the pageview definition for the apps to recognise the new URL
> >>> >> scheme, we will also potentially have to rewrite every /other/ bit of
> >>> >> the definition to /not/ incorporate those requests.
> >>> >>
> >>> >> (I use "we" in a collective sense. This isn't my baby any more,
> >>> >> although if Joseph et al want help with the refactor here I'm happy to
> >>> >> spend my volunteer time on it).
> >>> >>
> >>> >> But basically every other bit of your email is important but now
> >>> >> secondary: this is a potentially massive change, all on its own, even
> >>> >> without the link preview, even if the substance of the requests going
> >>> >> to RESTBase were identical.
> >>> >>
> >>> >>> This will have implications for the pageviews definition and how we
> >>> >>> count user engagement.
> >>> >>>
> >>> >>> The big question is
> >>> >>>
> >>> >>> Should we count link previews as a page view since it's an indication
> >>> >>> of user engagement? Or should there be a separate metric for link
> >>> >>> previews?
> >>> >>>
> >>> >>> Counting page views
> >>> >>>
> >>> >>> IIRC we currently count action=mobileview&sections=0 query parameters
> >>> >>> of api.php as a page view. When we publish link previews for all Android
> >>> >>> app users then we would either want to count also the calls to
> >>> >>> action=query&prop=extracts as a page view or add them to another
> >>> >>> metric.
> >>> >>>
> >>> >>> Once the apps use RESTBase the HTTPS requests will be very different:
> >>> >>>
> >>> >>> - Page view: Instead of action=mobileview&sections=0 on the PHP API
> >>> >>>   mentioned above, the app would call the RESTBase endpoint for the lead
> >>> >>>   request[1]. Then it would call [2].
> >>> >>> - Link preview: Instead of action=query&prop=extracts it would call the
> >>> >>>   lead request[1], too, since there is a lot of overlap. At least that's
> >>> >>>   our current plan. The advantage of that is that the client doesn't need
> >>> >>>   to execute the lead request a second time if the user clicks on the link
> >>> >>>   preview (either through caching or app logic).
> >>> >>>
> >>> >>> So, in the RESTBase case we either want to count the
> >>> >>> mobile-html-sections-lead requests or the mobile-html-sections-remaining
> >>> >>> requests depending on what our definition for page views actually is. We
> >>> >>> could also add a query parameter or extra HTTP header to one of the
> >>> >>> mobile-html-sections-lead requests if we need to distinguish between
> >>> >>> previews and page views.
> >>> >>>
> >>> >>> Both the current PHP API and the RESTBase based metrics would need to
> >>> >>> be compatible and be collected in parallel since we cannot control when
> >>> >>> users update their apps.
> >>> >>>
> >>> >>> [1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
> >>> >>> [2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dilbert
> >>> >>> [3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps
> >>> >>> [4] https://phabricator.wikimedia.org/T109383
> >>> >>>
> >>> >>>
> >>> >>> Cheers,
> >>> >>>
> >>> >>> Bernd
> >>> >>>
> >>> >>>
> >>> >>> _______________________________________________
> >>> >>> Analytics mailing list
> >>> >>> Analytics(a)lists.wikimedia.org
> >>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>> >>>
> >>> >>
> >>> >>
> >>> >>
> >>> >> --
> >>> >> Oliver Keyes
> >>> >> Count Logula
> >>> >> Wikimedia Foundation
> >>> >>
> >>> >
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Oliver Keyes
> >>> Count Logula
> >>> Wikimedia Foundation
> >>>
> >>
> >>
> >>
> >
>
>
>
> --
> Oliver Keyes
> Count Logula
> Wikimedia Foundation
>