Oliver, the problem with "page_title OR page_id" instead of "always page_title, and page_id if you have it" is what Andrew was addressing above. It means we have to query for page_title by id, and that means we need to keep an up-to-date copy of all MediaWiki databases. And we have to be able to query that copy tens of thousands of times per second, which is basically not going to happen.
We just chatted about this in scrum of scrums, and it looks like Adam's going to set up a meeting so we can talk more there. I agree with Adam that we need a short-term solution for counting the new kinds of requests, a medium-term solution so that we don't all go insane, and something to shoot for in the long term.
On Wed, Aug 19, 2015 at 1:48 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Even if not all clients do it, "if it has this X-Analytics entry, don't bother with the complex regular expressions; if it doesn't, do" still works.
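To make that fallback rule concrete, here is a minimal sketch. It assumes the X-Analytics header is a semicolon-separated list of key=value pairs and reuses the pageview_id / pageview_title keys proposed further down the thread. LEGACY_PATTERN is a deliberately simplified, hypothetical stand-in for the existing regular expressions, not the real pageview definition.

```python
import re

# Hypothetical, heavily simplified stand-in for the pattern-based definition.
LEGACY_PATTERN = re.compile(r"^/wiki/[^?]+$")


def parse_x_analytics(header: str) -> dict:
    """Parse 'k1=v1;k2=v2' into a dict, skipping malformed entries.

    The 'key=value;key=value' layout is an assumption of this sketch.
    """
    entries = {}
    for part in header.split(";"):
        key, sep, value = part.partition("=")
        if sep and key.strip():
            entries[key.strip()] = value.strip()
    return entries


def is_pageview(uri_path: str, x_analytics: str = "") -> bool:
    """Trust an explicit tag when present; otherwise fall back to patterns."""
    tags = parse_x_analytics(x_analytics)
    if "pageview_id" in tags or "pageview_title" in tags:
        # The client tagged the request itself, so skip the regexes.
        return True
    # Untagged request: apply the (simplified) pattern-based definition.
    return LEGACY_PATTERN.match(uri_path) is not None
```

With this shape, tagged and untagged clients can coexist: a tagged API request counts regardless of its URL, while an untagged one is judged only by the patterns.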
On 19 August 2015 at 13:34, Gabriel Wicke gwicke@wikimedia.org wrote:
Yeah, doing this on the client could work, but would require *all* clients to actually do it. We also have metrics per entry point in RESTBase, but those are behind Varnishes and will only count Varnish cache misses. Without Varnish caching, this would be a solved problem ;)
On Wed, Aug 19, 2015 at 7:53 AM, Dan Andreescu dandreescu@wikimedia.org wrote:
This (making pageviews proactive) is a great idea, and we should follow through. Here's a simple start:
If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
If we can make this change uniformly, I think we'd be in a very good place.
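As a rough illustration of the client side of this proposal, a helper like the following could build the header. The function name is hypothetical, and percent-encoding the title (so that ";" or "=" in a title cannot be confused with entry separators) is an assumption of this sketch, not something agreed in this thread.

```python
from typing import Dict, Optional
from urllib.parse import quote


def pageview_header(page_id: Optional[int] = None,
                    page_title: Optional[str] = None) -> Dict[str, str]:
    """Build an X-Analytics header that marks a request as a pageview.

    Prefers page_id when both are given; raises if neither is available.
    """
    if page_id is not None:
        value = f"pageview_id={page_id}"
    elif page_title is not None:
        # Percent-encode so the title cannot contain a raw ';' or '='.
        value = f"pageview_title={quote(page_title, safe='')}"
    else:
        raise ValueError("need a page_id or a page_title to tag a pageview")
    return {"X-Analytics": value}
```

A client would merge the returned dict into the headers of the one request it wants counted, and leave it off any secondary requests for the same page.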
On Wed, Aug 19, 2015 at 10:23 AM, Oliver Keyes okeyes@wikimedia.org wrote:
On 19 August 2015 at 10:19, Andrew Otto aotto@wikimedia.org wrote:
If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme
I really think that apps and APIs should do something proactive to tag or log a pageview. With more ways of viewing content, it is going to get harder and harder to maintain a pattern based definition. A pageview should be an event that is logged, not something that is pattern matched out of a very noisy stream of data.
Most MediaWiki requests do this now, via the page_id field in the X-Analytics header, but we can’t use this for all pageviews because APIs are more complicated (e.g. more than one page can be served in a single request, etc.). In the long term, there should be a pageview event stream just like rcstream! :)
This is an excellent point. IIRC we'd been asking Apps to do this for kind of a while, so...
-Ao
On Aug 18, 2015, at 19:58, Oliver Keyes okeyes@wikimedia.org wrote:
On 18 August 2015 at 19:11, Bernd Sitzmann bernd@wikimedia.org wrote:
> This discussion is about needed updates of the definition and Analytics
> implementation for mobile apps page view metrics. There is also an
> associated Phab task[4]. Please add the proper Analytics project there.
>
> Background / Changes
>
> As you probably remember, the Android app splits a page view into two
> requests: one for the lead section and metadata, plus another one for
> the remainder.
>
> The mobile apps are going to change the way they load pages in two
> different ways:
>
> * We'll add a link preview when someone clicks on a link from a page.
> * We're planning on switching over to using RESTBase for loading pages
>   and also the link preview (initially just the Android beta, later more)
>
Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful service API?
Last time I checked that wasn't even consumed by HDFS. Is it now being consumed by HDFS?
More importantly the actual URLs are going to look /totally/ different. If we do not include RESTBase requests, we will miss the apps. If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme, we will also potentially have to rewrite every /other/ bit of the definition to /not/ incorporate those requests.
(I use "we" in a collective sense. This isn't my baby any more, although if Joseph et al want help with the refactor here I'm happy to spend my volunteer time on it).
But basically every other bit of your email is important but now secondary: this is a potentially massive change, all on its own, even without the link preview, even if the substance of the requests going to RESTBase were identical.
> This will have implications for the pageviews definition and how we
> count user engagement.
>
> The big question is
>
> Should we count link previews as a page view since it's an indication
> of user engagement? Or should there be a separate metric for link
> previews?
>
> Counting page views
>
> IIRC we currently count action=mobileview&sections=0 query parameters
> of api.php as a page view. When we publish link previews for all
> Android app users then we would either want to count also the calls to
> action=query&prop=extracts as a page view or add them to another
> metric.
>
> Once the apps use RESTBase the HTTPS requests will be very different:
>
> * Page view: Instead of action=mobileview&sections=0 the app would call
>   the RESTBase endpoint for lead request[1] instead of the PHP API
>   mentioned above. Then it would call [2].
> * Link preview: Instead of action=query&prop=extracts it would call the
>   lead request[1], too, since there is a lot of overlap. At least that's
>   our current plan. The advantage of that is that the client doesn't
>   need to execute the lead request a second time if the user clicks on
>   the link preview (either through caching or app logic).
>
> So, in the RESTBase case we either want to count the
> mobile-html-sections-lead requests or the
> mobile-html-sections-remaining requests depending on what our
> definition for page views actually is. We could also add a query
> parameter or extra HTTP header to one of the mobile-html-sections-lead
> requests if we need to distinguish between previews and page views.
>
> Both the current PHP API and the RESTBase based metrics would need to
> be compatible and be collected in parallel since we cannot control when
> users update their apps.
>
> [1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
> [2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dil...
> [3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_app...
> [4] https://phabricator.wikimedia.org/T109383
>
> Cheers,
> Bernd
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
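Since both request styles would have to be recognised in parallel, a pattern-based classifier would need a branch per URL scheme. The sketch below is illustrative only: the category labels are hypothetical, matching on a path ending in /api.php is an assumption, and it is not the actual pageview definition. Note that it cannot tell a preview from a page view on the RESTBase side, because both hit the lead endpoint.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

REST_PREFIX = "/api/rest_v1/page/"


def classify(url: str) -> Optional[str]:
    """Label a request URL with a hypothetical category, or None."""
    parts = urlsplit(url)

    # Current PHP API: distinguished by query parameters.
    if parts.path.endswith("/api.php"):
        query = parse_qs(parts.query)
        action = query.get("action", [""])[0]
        if action == "mobileview" and query.get("sections", [""])[0] == "0":
            return "legacy_pageview"
        props = query.get("prop", [""])[0].split("|")
        if action == "query" and "extracts" in props:
            return "legacy_preview"
        return None

    # RESTBase: distinguished by the endpoint name in the path.
    if parts.path.startswith(REST_PREFIX):
        endpoint = parts.path[len(REST_PREFIX):].split("/", 1)[0]
        if endpoint == "mobile-html-sections-lead":
            return "restbase_lead"  # page view OR link preview
        if endpoint == "mobile-html-sections-remaining":
            return "restbase_remaining"
    return None
```

Counting only restbase_remaining would exclude previews but also depends on the client always fetching the remainder; counting restbase_lead needs the extra parameter or header mentioned above to separate the two cases.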
-- Oliver Keyes Count Logula Wikimedia Foundation
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Gabriel Wicke Principal Engineer, Wikimedia Foundation