In some cases we have page_id; in other cases (like API requests) we have
nothing.
On Wed, Aug 19, 2015 at 2:13 PM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
Aren't we currently just storing pageID?
On 19 August 2015 at 14:11, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
Oliver, the problem with "page_title OR page_id" instead of "always
page_title and page_id if you have it" is what Andrew was addressing
above. It means we have to query for page_title by id, and that means
we need to keep an up-to-date copy of all mediawiki databases. And we
have to be able to query that copy tens of thousands of times per
second, which is basically not going to happen.
We just chatted in scrum of scrums about this; it looks like Adam's going
to set up a meeting so we can talk more there. I agree with Adam that we
need a short-term solution for counting the new kinds of requests, a
medium-term solution so that we don't all go insane, and something to
shoot for in the long term.
On Wed, Aug 19, 2015 at 1:48 PM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
>
> In the absence of all clients doing it, "if it has this x_analytics
> entry, don't bother with the complex regular expressions, if it
> doesn't, do" still works.
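A minimal sketch of that fallback rule, assuming the X-Analytics value is a semicolon-separated list of key=value pairs and using the pageview_id / pageview_title keys Dan proposes in this thread. LEGACY_PATTERN is a hypothetical toy stand-in for the real pattern-based definition, which has far more rules than one regex.

```python
import re

# Hypothetical toy stand-in for the pattern-based pageview definition.
LEGACY_PATTERN = re.compile(r"^/wiki/[^?]+$|action=mobileview&sections=0")


def parse_x_analytics(value):
    """Split an X-Analytics value such as 'ns=0;page_id=123' into a dict."""
    fields = {}
    for pair in value.split(";"):
        key, sep, val = pair.strip().partition("=")
        if sep:  # skip empty or malformed entries
            fields[key] = val
    return fields


def is_pageview(uri, x_analytics=""):
    """Return True if the request should be counted as a pageview."""
    fields = parse_x_analytics(x_analytics)
    # Fast path: the client tagged the request itself, so skip the regexes.
    if "pageview_id" in fields or "pageview_title" in fields:
        return True
    # Fallback for clients that don't tag yet: pattern-match the URI.
    return LEGACY_PATTERN.search(uri) is not None
```

The point of the ordering is that the expensive, fragile matching only runs for untagged traffic, so it can shrink as more clients adopt the tag.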
>
> On 19 August 2015 at 13:34, Gabriel Wicke <gwicke(a)wikimedia.org> wrote:
> > Yeah, doing this on the client could work, but would require *all*
> > clients to actually do it. We also have metrics per entry point in
> > RESTBase, but those are behind Varnishes and will only count Varnish
> > cache misses. Without Varnish caching, this would be a solved problem ;)
> >
> > On Wed, Aug 19, 2015 at 7:53 AM, Dan Andreescu <dandreescu(a)wikimedia.org> wrote:
> >>
> >> This (making pageviews proactive) is a great idea, and we should
> >> follow through. Here's a simple start:
> >>
> >> If your app/site/etc. is creating a request that it wants to count as
> >> a pageview, add an X-Analytics header with pageview_id=<page_id> or
> >> pageview_title=<page_title>.
> >>
> >> If we can make this change uniformly, I think we'd be in a very good
> >> place.
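A minimal client-side sketch of this proposal. It follows Dan's later refinement at the top of this thread ("always page_title and page_id if you have it") rather than the either/or form, and it assumes X-Analytics values are semicolon-separated key=value pairs. The function name tag_pageview is hypothetical.

```python
from typing import Optional
from urllib.parse import quote


def tag_pageview(existing: str, page_title: str, page_id: Optional[int] = None) -> str:
    """Append pageview keys to an X-Analytics header value.

    Sends the title always and the id only when the client knows it.
    """
    if not page_title:
        raise ValueError("page_title is required")
    # Keep whatever key=value pairs are already present.
    pairs = [p for p in existing.split(";") if p]
    # Percent-encode the title so ';' or '=' inside it can't break the layout.
    pairs.append("pageview_title=" + quote(page_title, safe=""))
    if page_id is not None:
        pairs.append(f"pageview_id={int(page_id)}")
    return ";".join(pairs)
```

Usage: `headers = {"X-Analytics": tag_pageview("", "Dilbert")}` on the one request per page view that the app wants counted. Always sending the title means the server never has to look it up by id.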
> >>
> >> On Wed, Aug 19, 2015 at 10:23 AM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
> >>>
> >>> On 19 August 2015 at 10:19, Andrew Otto <aotto(a)wikimedia.org> wrote:
> >>> >> If we /do/ include RESTBase requests we will not only have to
> >>> >> rewrite the pageview definition for the apps to recognise the new
> >>> >> URL scheme
> >>> >
> >>> > I really think that apps and APIs should do something proactive to
> >>> > tag or log a pageview. With more ways of viewing content, it is going
> >>> > to get harder and harder to maintain a pattern-based definition. A
> >>> > pageview should be an event that is logged, not something that is
> >>> > pattern-matched out of a very noisy stream of data.
> >>> >
> >>> > Most mediawiki requests do this now, via the page_id field in the
> >>> > X-Analytics header, but we can’t use this for all pageviews because
> >>> > APIs are more complicated (e.g. more than one page can be served in a
> >>> > single request, etc.). In the long term, there should be a pageview
> >>> > event stream just like rcstream! :)
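A minimal sketch of "a pageview should be an event that is logged", including the multi-page case Andrew raises. The event schema (PageviewEvent and its fields) is entirely hypothetical; no such stream or schema exists in the thread.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class PageviewEvent:
    # Hypothetical schema: no pageview event stream or schema exists yet.
    project: str
    page_title: str
    page_id: Optional[int] = None
    source: str = "api"
    ts: float = field(default_factory=time.time)


def events_for_request(project: str,
                       pages: Iterable[Tuple[str, Optional[int]]],
                       source: str) -> List[PageviewEvent]:
    """Emit one event per page served, so a response that carries several
    pages yields several pageviews instead of one ambiguous request."""
    return [PageviewEvent(project, title, page_id, source)
            for title, page_id in pages]


def to_json_lines(events: Iterable[PageviewEvent]) -> List[str]:
    """Serialize events, e.g. for a producer that publishes to a stream."""
    return [json.dumps(asdict(e), sort_keys=True) for e in events]
```

Because the server emits events for the pages it actually served, consumers count events rather than reverse-engineering request URLs.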
> >>>
> >>> This is an excellent point. IIRC we'd been asking Apps to do this for
> >>> kind of a while, so...
> >>>
> >>> >
> >>> > -Ao
> >>> >
> >>> >
> >>> >
> >>> >> On Aug 18, 2015, at 19:58, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
> >>> >>
> >>> >> On 18 August 2015 at 19:11, Bernd Sitzmann <bernd(a)wikimedia.org> wrote:
> >>> >>> This discussion is about needed updates to the definition and
> >>> >>> Analytics implementation for mobile apps page view metrics. There is
> >>> >>> also an associated Phab task[4]. Please add the proper Analytics
> >>> >>> project there.
> >>> >>>
> >>> >>> Background / Changes
> >>> >>>
> >>> >>> As you probably remember, the Android app splits a page view into
> >>> >>> two requests: one for the lead section and metadata, plus another
> >>> >>> one for the remainder.
> >>> >>>
> >>> >>> The mobile apps are going to change the way they load pages in two
> >>> >>> different ways:
> >>> >>>
> >>> >>> We'll add a link preview when someone clicks on a link from a page.
> >>> >>> We're planning on switching over to using RESTBase for loading pages
> >>> >>> and also the link preview (initially just the Android beta, later
> >>> >>> more).
> >>> >>>
> >>> >>
> >>> >> Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful
> >>> >> service API?
> >>> >>
> >>> >> Last time I checked that wasn't even consumed by HDFS. Is it now
> >>> >> being consumed by HDFS?
> >>> >>
> >>> >> More importantly, the actual URLs are going to look /totally/
> >>> >> different. If we do not include RESTBase requests, we will miss the
> >>> >> apps. If we /do/ include RESTBase requests, we will not only have to
> >>> >> rewrite the pageview definition for the apps to recognise the new
> >>> >> URL scheme, we will also potentially have to rewrite every /other/
> >>> >> bit of the definition to /not/ incorporate those requests.
> >>> >>
> >>> >> (I use "we" in a collective sense. This isn't my baby any more,
> >>> >> although if Joseph et al want help with the refactor here I'm happy
> >>> >> to spend my volunteer time on it.)
> >>> >>
> >>> >> But basically every other bit of your email is important but now
> >>> >> secondary: this is a potentially massive change, all on its own,
> >>> >> even without the link preview, even if the substance of the requests
> >>> >> going to RESTBase were identical.
> >>> >>
> >>> >>> This will have implications for the pageview definition and how we
> >>> >>> count user engagement.
> >>> >>>
> >>> >>> The big question is:
> >>> >>>
> >>> >>> Should we count link previews as a page view, since a preview is an
> >>> >>> indication of user engagement? Or should there be a separate metric
> >>> >>> for link previews?
> >>> >>>
> >>> >>> Counting page views
> >>> >>>
> >>> >>> IIRC we currently count api.php requests with the query parameters
> >>> >>> action=mobileview&sections=0 as a page view. When we publish link
> >>> >>> previews for all Android app users, we would either want to count
> >>> >>> the calls to action=query&prop=extracts as page views too, or add
> >>> >>> them to another metric.
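A simplified sketch of the api.php side as Bernd describes it. Previews get their own label, so the decision about whether to count them as page views or as a separate metric is made downstream. The function name is hypothetical, and the real definition also checks things like user agent and status, which are omitted here.

```python
from urllib.parse import parse_qs, urlsplit


def classify_api_request(url):
    """Classify an api.php request as 'pageview', 'link_preview' or None."""
    parts = urlsplit(url)
    if not parts.path.endswith("/api.php"):
        return None
    # parse_qs returns lists; keep the first value of each parameter.
    q = {k: v[0] for k, v in parse_qs(parts.query).items()}
    if q.get("action") == "mobileview" and q.get("sections") == "0":
        return "pageview"
    # prop may list several modules separated by '|'.
    if q.get("action") == "query" and "extracts" in q.get("prop", "").split("|"):
        return "link_preview"
    return None
```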
> >>> >>>
> >>> >>> Once the apps use RESTBase, the HTTPS requests will be very
> >>> >>> different:
> >>> >>>
> >>> >>> Page view: Instead of calling the PHP API with
> >>> >>> action=mobileview&sections=0, the app would call the RESTBase
> >>> >>> endpoint for the lead request[1], and then [2].
> >>> >>> Link preview: Instead of action=query&prop=extracts, it would call
> >>> >>> the lead request[1], too, since there is a lot of overlap. At least
> >>> >>> that's our current plan. The advantage is that the client doesn't
> >>> >>> need to execute the lead request a second time if the user clicks on
> >>> >>> the link preview (either through caching or app logic).
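A minimal sketch of the client-side reuse Bernd describes, assuming a simple in-memory cache keyed by title. The class name and the injected fetch function are hypothetical. In the app, fetch would do the GET on the lead endpoint[1].

```python
from typing import Callable, Dict


class LeadCache:
    """Caches lead-section responses so opening a previewed link
    does not fetch the lead a second time."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch  # hypothetical: GETs the lead section for a title
        self._cache: Dict[str, str] = {}

    def lead(self, title: str) -> str:
        # Only hit the network the first time a title is requested.
        if title not in self._cache:
            self._cache[title] = self._fetch(title)
        return self._cache[title]
```

One consequence for analytics: a preview followed by a full view produces only one lead request on the server, which is why the thread goes on to discuss which request to count. A real app would also bound the cache size.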
> >>> >>>
> >>> >>> So, in the RESTBase case we either want to count the
> >>> >>> mobile-html-sections-lead requests or the
> >>> >>> mobile-html-sections-remaining requests, depending on what our
> >>> >>> definition for page views actually is. We could also add a query
> >>> >>> parameter or extra HTTP header to one of the
> >>> >>> mobile-html-sections-lead requests if we need to distinguish between
> >>> >>> previews and page views.
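A sketch of the RESTBase side under stated assumptions. count_on selects which half of the split load is counted, so a single page view is never counted twice. The `preview=1` query parameter is a hypothetical marker standing in for the "query parameter or extra HTTP header" Bernd mentions. It is not an existing RESTBase parameter.

```python
from urllib.parse import parse_qs, unquote, urlsplit

LEAD = "/api/rest_v1/page/mobile-html-sections-lead/"
REMAINING = "/api/rest_v1/page/mobile-html-sections-remaining/"


def classify_restbase(url, count_on="lead"):
    """Return ('pageview' | 'link_preview', title) or None."""
    parts = urlsplit(url)
    # Hypothetical preview marker; absent unless the app adds it.
    preview = parse_qs(parts.query).get("preview") == ["1"]
    for prefix, kind in ((LEAD, "lead"), (REMAINING, "remaining")):
        if parts.path.startswith(prefix):
            title = unquote(parts.path[len(prefix):])
            if kind == "lead" and preview:
                return ("link_preview", title)
            if kind == count_on:
                return ("pageview", title)
            return None  # the other half of a page view we count elsewhere
    return None
```

If the lead response is reused when the user opens a previewed link, as planned above, then counting on "remaining" may be the safer choice: the preview never fetches that half, and the full view always does.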
> >>> >>>
> >>> >>> Both the current PHP API and the RESTBase-based metrics would need
> >>> >>> to be compatible and be collected in parallel, since we cannot
> >>> >>> control when users update their apps.
> >>> >>>
> >>> >>> [1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
> >>> >>> [2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Di…
> >>> >>> [3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_ap…
> >>> >>> [4] https://phabricator.wikimedia.org/T109383
> >>> >>>
> >>> >>> Cheers,
> >>> >>>
> >>> >>> Bernd
> >>> >>>
> >>> >>> _______________________________________________
> >>> >>> Analytics mailing list
> >>> >>> Analytics(a)lists.wikimedia.org
> >>> >>> https://lists.wikimedia.org/mailman/listinfo/analytics
> >>> >>
> >>> >> --
> >>> >> Oliver Keyes
> >>> >> Count Logula
> >>> >> Wikimedia Foundation
--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation