This discussion is about needed updates of the definition and Analytics implementation for mobile apps page view metrics. There is also an associated Phab task[4]. Please add the proper Analytics project there.
Background / Changes
As you probably remember, the Android app splits a page view into two requests: one for the lead section and metadata, plus another one for the remainder.
The mobile apps are going to change the way they load pages in two different ways:
1. We'll add a link preview when someone clicks on a link from a page.
2. We're planning on switching over to using RESTBase for loading pages and also the link preview (initially just the Android beta, later more).
This will have implications for the pageviews definition and how we count user engagement. The big question is
Should we count link previews as a page view since it's an indication of user engagement? Or should there be a separate metric for link previews?
Counting page views
IIRC we currently count api.php requests with the query parameters action=mobileview&sections=0 as a page view. When we publish link previews for all Android app users, we would either want to also count the calls to action=query&prop=extracts as page views or add them to another metric.
Once the apps use RESTBase the HTTPS requests will be very different:
- Page view: Instead of action=mobileview&sections=0 on the PHP API mentioned above, the app would call the RESTBase endpoint for the lead request[1]. Then it would call [2].
- Link preview: Instead of action=query&prop=extracts it would call the lead request[1], too, since there is a lot of overlap. At least that's our current plan. The advantage is that the client doesn't need to execute the lead request a second time if the user clicks on the link preview (either through caching or app logic).
So, in the RESTBase case we either want to count the mobile-html-sections-lead requests or the mobile-html-sections-remaining requests depending on what our definition for page views actually is. We could also add a query parameter or extra HTTP header to one of the mobile-html-sections-lead requests if we need to distinguish between previews and page views.
Both the current PHP API and the RESTBase based metrics would need to be compatible and be collected in parallel since we cannot control when users update their apps.
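To make the options above concrete, here is a minimal Python sketch of a classifier that handles the PHP API and the RESTBase URLs side by side. It is one possible reading, not a decided definition: it counts the lead request as the page view, reports previews as a separate category, and assumes a hypothetical `preview=1` query parameter as the marker for previews on the RESTBase lead request. The function name and the marker are invented for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def classify_app_request(url):
    """Return 'pageview', 'preview', or None for a single request URL."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)

    # Current PHP API: both kinds of request go to /w/api.php.
    if parts.path == "/w/api.php":
        action = query.get("action", [""])[0]
        if action == "mobileview" and query.get("sections", [""])[0] == "0":
            return "pageview"
        props = query.get("prop", [""])[0].split("|")
        if action == "query" and "extracts" in props:
            return "preview"
        return None

    # RESTBase: page views and previews both start with the lead request.
    if parts.path.startswith("/api/rest_v1/page/mobile-html-sections-lead/"):
        # ASSUMPTION: 'preview=1' is a made-up marker, not an existing parameter.
        if query.get("preview", [""])[0] == "1":
            return "preview"
        return "pageview"

    # The 'remaining' request is the second half of a page view and is
    # deliberately not counted in this sketch.
    return None
```

Counting the remaining request instead of the lead request would only require swapping the path prefix; whether previews end up in the page view total or in their own metric is a decision the sketch leaves to the caller.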
[1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
[2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dil...
[3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_app...
[4] https://phabricator.wikimedia.org/T109383
Cheers,
Bernd
On 18 August 2015 at 19:11, Bernd Sitzmann bernd@wikimedia.org wrote:
This discussion is about needed updates of the definition and Analytics implementation for mobile apps page view metrics. There is also an associated Phab task[4]. Please add the proper Analytics project there.
Background / Changes
As you probably remember, the Android app splits a page view into two requests: one for the lead section and metadata, plus another one for the remainder.
The mobile apps are going to change the way they load pages in two different ways:
We'll add a link preview when someone clicks on a link from a page.
We're planning on switching over to using RESTBase for loading pages and also the link preview (initially just the Android beta, later more).
Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful service API?
Last time I checked that wasn't even consumed by HDFS. Is it now being consumed by HDFS?
More importantly the actual URLs are going to look /totally/ different. If we do not include RESTBase requests, we will miss the apps. If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme, we will also potentially have to rewrite every /other/ bit of the definition to /not/ incorporate those requests.
(I use "we" in a collective sense. This isn't my baby any more, although if Joseph et al want help with the refactor here I'm happy to spend my volunteer time on it).
But basically every other bit of your email is important but now secondary: this is a potentially massive change, all on its own, even without the link preview, even if the substance of the requests going to RESTBase were identical.
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
We briefly considered counting views of Hover Cards as Pageviews, but it was quickly dismissed. First, the feature is not widely used enough to justify changing the pageview definition.
I'm still open to counting previews as pageviews, but I think the Readership team and their product managers need to weigh in heavily as Pageviews is a key metric for them.
Finally, counting Pageviews served through RESTBase sounds like a new project and I'd like to hear more about the effort needed from the analytics engineers.
On Tue, Aug 18, 2015 at 4:58 PM, Oliver Keyes okeyes@wikimedia.org wrote:
[...]
RESTBase is behind regular text Varnishes, so I suspect that the logs might already end up in HDFS. All entry points start with /api/rest_v1/, which shouldn't overlap with potential page views counted at /w/api.php or /wiki/.
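As a rough illustration of why the prefixes do not collide, a request path can be bucketed with plain prefix checks. This is only a sketch; the function name and bucket labels are made up for the example.

```python
def entry_point(path):
    """Bucket a request path by the entry points named in this thread."""
    if path.startswith("/api/rest_v1/"):
        return "restbase"
    if path == "/w/api.php":
        return "php_api"
    if path.startswith("/wiki/"):
        return "wiki"
    return "other"
```

Since no path can match more than one of these checks, existing rules keyed on /w/api.php or /wiki/ would not start picking up RESTBase traffic by accident.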
On Tue, Aug 18, 2015 at 6:32 PM, Kevin Leduc kevin@wikimedia.org wrote:
We briefly considered counting views of Hover Cards as Pageviews, but it was quickly dismissed. First, the feature is not widely used enough to justify changing the pageview definition.
I'm still open to counting previews as pageviews, but I think the Readership team and their product managers need to weigh in heavily as Pageviews is a key metric for them.
Finally, counting Pageviews served through RESTBase sounds like a new project and I'd like to hear more about the effort needed from the analytics engineers.
This reminds me.. right now we don't allow Varnishes to cache any content, but we plan to start allowing this soon. At that point, internal RESTBase metrics like http://grafana.wikimedia.org/#/dashboard/db/restbase?panelId=8&fullscree... will only show the cache misses. For our purposes it would be super useful to keep track of total requests matching /api/rest_v1/. This will let us track overall API usage, which is going to be our primary KPI for now. I have created a ticket for this at https://phabricator.wikimedia.org/T109547.
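A minimal sketch of the kind of tally described here, assuming each request record is available as a (path, cache_status) pair. That record shape and the 'hit'/'miss' labels are assumptions for illustration, not the actual log format.

```python
from collections import Counter


def count_rest_api_requests(records):
    """Tally requests under /api/rest_v1/ in total and by cache status.

    `records` is an iterable of (path, cache_status) pairs; the record
    shape and the status labels are assumptions for this sketch.
    """
    counts = Counter()
    for path, cache_status in records:
        if path.startswith("/api/rest_v1/"):
            counts["total"] += 1
            counts[cache_status] += 1
    return counts
```

The point of counting at this level is that the total includes cache hits, which a backend-only metric would no longer see once caching is enabled.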
Thanks!
Gabriel
On 18 August 2015 at 21:54, Gabriel Wicke gwicke@wikimedia.org wrote:
RESTBase is behind regular text Varnishes, so I suspect that the logs might already end up in HDFS. All entry points start with /api/rest_v1/, which shouldn't overlap with potential page views counted at /w/api.php or /wiki/.
Awesome; with that structure it sounds substantially easier than I feared.
On Aug 18, 2015, at 6:32 PM, Kevin Leduc kevin@wikimedia.org wrote:
We briefly considered counting views of Hover Cards as Pageviews, but it was quickly dismissed. First, the feature is not widely used enough to justify changing the pageview definition.
I second that, these are “impressions” and they should be measured separately. I would be very worried about inflating our baseline PV numbers with all sort of features revealing snippets of content.
I’m still open to counting previews as pageviews, but I think the Readership team and their product managers need to weigh in heavily as Pageviews is a key metric for them.
Finally, counting Pageviews served through RESTBase sounds like a new project and I'd like to hear more about the effort needed from the analytics engineers.
+1 on baseline inflation
We will have a hard time connecting historic and future pageview counts anyway, now that we are migrating to new infrastructure (mostly because historic counts didn't exclude crawlers).
But at least the concepts of 'page' and 'view' haven't changed much in all these years.
Erik
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Dario Taraborelli
Sent: Wednesday, August 19, 2015 21:23
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] Pageviews definition + measurement for apps adding link previews + using RESTBase
On Aug 18, 2015, at 6:32 PM, Kevin Leduc kevin@wikimedia.org wrote:
We briefly considered counting views of Hover Cards as Pageviews, but it was quickly dismissed. First, the feature is not widely used enough to justify changing the pageview definition.
I second that, these are “impressions” and they should be measured separately. I would be very worried about inflating our baseline PV numbers with all sort of features revealing snippets of content.
I’m still open to counting previews as pageviews, but I think the Readership team and their product managers need to weigh in heavily as Pageviews is a key metric for them.
Finally, counting Pageviews served through RESTBase sounds like a new project and I'd like to hear more about the effort needed from the analytics engineers.
On Tue, Aug 18, 2015 at 4:58 PM, Oliver Keyes okeyes@wikimedia.org wrote:
On 18 August 2015 at 19:11, Bernd Sitzmann bernd@wikimedia.org wrote:
This discussion is about needed updates of the definition and Analytics implementation for mobile apps page view metrics. There is also an associated Phab task[4]. Please add the proper Analytics project there.
Background / Changes
As you probably remember, the Android app splits a page view into two requests: one for the lead section and metadata, plus another one for the remainder.
The mobile apps are going to change the way they load pages in two different ways:
We'll add a link preview when someone clicks on a link from a page. We're planning on switching over the using RESTBase for loading pages and also the link preview (initially just the Android beta, ater more)
Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful service API?
Last time I checked that wasn't even consumed by HDFS. Is it now being consumed by HDFS?
More importantly the actual URLs are going to look /totally/ different. If we do not include RESTBase requests, we will miss the apps. If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme, we will also potentially have to rewrite every /other/ bit of the definition to /not/ incorporate those requests.
(I use "we" in a collective sense. This isn't my baby any more, although if Joseph et al want help with the refactor here I'm happy to spend my volunteer time on it).
But basically every other bit of your email is important but now secondary: this is a potentially massive change, all on its own, even without the link preview, even if the substance of the requests going to RESTBase were identical.
This will have implications for the pageviews definition and how we count user engagement.
The big question is
Should we count link previews as a page view since it's an indication of user engagement? Or should there be a separate metric for link previews?
Counting page views
IIRC we currently count action=mobileview§ions=0 query parameters of api.php as a page view. When we publish link previews for all Android app users then we would either want to count also the calls to action=query&prop=extracts as a page view or add them to another metric.
Once the apps use RESTBase the HTTPS requests will be very different:
Page view: Instead of action=mobileview§ions=0 the app would call the RESTBase endpoint for lead request[1] instead of the PHP API mentioned above. Then it would call [2]. Link preview: Instead of action=query&prop=extracts it would call the lead request[1], too, since there is a lot of overlap. At least that our current plan. The advantage of that is that the client doesn't need to execute the lead request a second time if the user clicks on the link preview (-- either through caching or app logic.)
So, in the RESTBase case we either want to count the mobile-html-sections-lead requests or the mobile-html-sections-remaining requests depending on what our definition for page views actually is. We could also add a query parameter or extra HTTP header to one of the mobile-html-sections-lead requests if we need to distinguish between previews and page views.
Both the current PHP API and the RESTBase based metrics would need to be compatible and be collected in parallel since we cannot control when users update their apps.
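A minimal sketch of what pattern-matching both URL schemes in parallel could look like. The path prefixes come from the example URLs [1] and [2] and the query parameters from the description above; the function name, the returned labels, and the check for a path ending in /api.php are illustrative assumptions, not the actual pageview definition.

```python
from urllib.parse import urlsplit, parse_qs

# Path prefixes taken from the example URLs [1] and [2] in this thread.
REST_LEAD = "/api/rest_v1/page/mobile-html-sections-lead/"
REST_REMAINING = "/api/rest_v1/page/mobile-html-sections-remaining/"


def classify_app_request(url):
    """Return a label for an app content request, or None if unrecognised.

    The labels are hypothetical names used only for this illustration.
    """
    parts = urlsplit(url)
    if parts.path.startswith(REST_LEAD):
        return "restbase-lead"
    if parts.path.startswith(REST_REMAINING):
        return "restbase-remaining"
    if parts.path.endswith("/api.php"):
        query = parse_qs(parts.query)
        action = query.get("action", [""])[0]
        # Legacy page view: lead section requested through the PHP API.
        if action == "mobileview" and query.get("sections", [""])[0] == "0":
            return "php-lead"
        # Legacy link preview: text extract requested through the PHP API.
        props = query.get("prop", [""])[0].split("|")
        if action == "query" and "extracts" in props:
            return "php-extract"
    return None
```

Note that under the current plan a RESTBase lead request alone cannot tell a preview from a page view, which is why an extra query parameter or header is floated above.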
[1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
[2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dil...
[3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_app...
[4] https://phabricator.wikimedia.org/T109383
Cheers,
Bernd
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Oliver Keyes Count Logula Wikimedia Foundation
Dario / Erik, reading your last few thoughts, maybe I mis-phrased the task that's tracking this project: https://phabricator.wikimedia.org/T109745. Edits are welcome, and I'm sure this conversation will evolve.
So this is how the analytics team wants to support this. We've prioritized and will task and execute the immediate problem: https://phabricator.wikimedia.org/T109383. And we'll track the migration to a more proactive pageview definition in this project: https://phabricator.wikimedia.org/T109745. Codename pika [1], because it's adorable [2].
[1] https://en.wikipedia.org/wiki/Pika
[2] For those not familiar, the analytics team is into tagging, so we tag all our projects and tasks with four-letter animal names (see the first column of our main execution board: https://phabricator.wikimedia.org/tag/analytics-kanban/).
On Wed, Aug 19, 2015 at 3:40 PM, Erik Zachte ezachte@wikimedia.org wrote:
+1 on baseline inflation
We will have a hard time connecting historic and future pageview counts anyway, now that we are migrating to new infrastructure (mostly because historic counts didn't exclude crawlers).
But at least the concepts of 'page' and 'view' haven't changed much in all these years.
Erik
From: analytics-bounces@lists.wikimedia.org On Behalf Of Dario Taraborelli
Sent: Wednesday, August 19, 2015 21:23
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] Pageviews definition + measurement for apps adding link previews + using RESTBase
On Aug 18, 2015, at 6:32 PM, Kevin Leduc kevin@wikimedia.org wrote:
We briefly considered counting views of Hover Cards as Pageviews, but it was quickly dismissed. First, the feature is not widely used enough to justify changing the pageview definition.
I second that, these are “impressions” and they should be measured separately. I would be very worried about inflating our baseline PV numbers with all sort of features revealing snippets of content.
I’m still open to counting previews as pageviews, but I think the Readership team and their product managers need to weigh in heavily as Pageviews is a key metric for them.
Finally, counting Pageviews served through RESTBase sounds like a new project and I'd like to hear more about the effort needed from the analytics engineers.
If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme
I really think that apps and APIs should do something proactive to tag or log a pageview. With more ways of viewing content, it is going to get harder and harder to maintain a pattern based definition. A pageview should be an event that is logged, not something that is pattern matched out of a very noisy stream of data.
Most MediaWiki requests do this now, via the page_id field in the X-Analytics header, but we can’t use this for all pageviews because APIs are more complicated (e.g. more than one page can be served in a single request, etc.). In the long term, there should be a pageview event stream just like rcstream! :)
-Ao
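As a rough illustration of reading page_id from the header rather than pattern-matching URLs, here is a small sketch. It assumes the X-Analytics value is a semicolon-separated list of key=value pairs; the function name and the sample values are hypothetical.

```python
def parse_x_analytics(header_value):
    """Split an X-Analytics value into a dict.

    Assumes the format 'key=value;key=value'; pairs without '=' are skipped.
    """
    fields = {}
    for pair in header_value.split(";"):
        key, sep, value = pair.strip().partition("=")
        if sep and key:
            fields[key] = value
    return fields


# Hypothetical header value, for illustration only.
fields = parse_x_analytics("page_id=12345;ns=0")
page_id = fields.get("page_id")  # None when the request carries no page_id
```

A single page_id field per request is exactly what breaks down for API responses that serve more than one page, as noted above.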
On 19 August 2015 at 10:19, Andrew Otto aotto@wikimedia.org wrote:
If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme
I really think that apps and APIs should do something proactive to tag or log a pageview. With more ways of viewing content, it is going to get harder and harder to maintain a pattern based definition. A pageview should be an event that is logged, not something that is pattern matched out of a very noisy stream of data.
Most MediaWiki requests do this now, via the page_id field in the X-Analytics header, but we can’t use this for all pageviews because APIs are more complicated (e.g. more than one page can be served in a single request, etc.). In the long term, there should be a pageview event stream just like rcstream! :)
This is an excellent point. IIRC we'd been asking Apps to do this for kind of a while, so...
-Ao
This (making pageviews proactive) is a great idea, and we should follow through. Here's a simple start:
If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
If we can make this change uniformly, I think we'd be in a very good place.
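On the counting side, the proposal could reduce to a check like the sketch below. The pageview_id and pageview_title keys are the ones proposed above; the semicolon-separated key=value format, the function name, and the returned tuples are assumptions made for illustration.

```python
def tagged_pageview(header_value):
    """Return ('id', value) or ('title', value) if the request is tagged.

    Returns None when neither proposed key is present, meaning the request
    would not count as a pageview under this scheme.
    """
    fields = dict(
        pair.strip().split("=", 1)
        for pair in header_value.split(";")
        if "=" in pair
    )
    if fields.get("pageview_id"):
        return ("id", fields["pageview_id"])
    if fields.get("pageview_title"):
        return ("title", fields["pageview_title"])
    return None
```

The appeal of this approach is that the check no longer depends on which URL scheme served the content.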
If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
Ideally the page id would be the way to go. From a client's perspective I prefer the page title since clients don't always know the page id ahead of time. (We could put that header into the second request of loading the page but I cannot guarantee that we will always have a second request in the future.)
--Cheers, Bernd
If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
page_id is the current key, so let’s keep that. page_title would be good to have too. Let’s make it an and/or.
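On the client side, the and/or rule might be sketched as follows. The page_id and page_title keys come from the message above; the semicolon-separated format, the percent-encoding of the title, and the function name are assumptions, since the thread does not specify how a title containing ';' or '=' should be carried.

```python
from urllib.parse import quote


def build_x_analytics(page_id=None, page_title=None):
    """Build an X-Analytics value from page_id and/or page_title.

    At least one of the two must be given; a client that does not yet know
    the page id can send the title alone.
    """
    if page_id is None and page_title is None:
        raise ValueError("need page_id, page_title, or both")
    pairs = []
    if page_id is not None:
        pairs.append(f"page_id={int(page_id)}")
    if page_title is not None:
        # Percent-encode so ';' and '=' in a title cannot break the format
        # (an assumption of this sketch, not an agreed convention).
        pairs.append("page_title=" + quote(page_title, safe=""))
    return ";".join(pairs)
```

For example, build_x_analytics(page_title="Dilbert") yields a title-only value that a client could attach to its first request.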
On Aug 19, 2015, at 12:17, Bernd Sitzmann bernd@wikimedia.org wrote:
If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
Ideally the page id would be the way to go. From a client's perspective I prefer the page title since clients don't always know the page id ahead of time. (We could put that header into the second request of loading the page but I cannot guarantee that we we will always have a second request in the future.)
--Cheers, Bernd
On Wed, Aug 19, 2015 at 8:53 AM, Dan Andreescu <dandreescu@wikimedia.org mailto:dandreescu@wikimedia.org> wrote: This (making pageviews proactive) is a great idea, and we should follow through. Here's a simple start:
If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
If we can make this change uniformly, I think we'd be in a very good place.
On Wed, Aug 19, 2015 at 10:23 AM, Oliver Keyes <okeyes@wikimedia.org mailto:okeyes@wikimedia.org> wrote: On 19 August 2015 at 10:19, Andrew Otto <aotto@wikimedia.org mailto:aotto@wikimedia.org> wrote:
If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme
I really think that apps and APIs should do something proactive to tag or log a pageview. With more ways of viewing content, it is going to get harder and harder to maintain a pattern based definition. A pageview should be an event that is logged, not something that is pattern matched out of a very noisy stream of data.
Most mediawiki requests do this now, via the page_id field in the X-Analytlics header, but we can’t use this for all pageviews because APIs are more complicated (e.g. more than one page can be served in a single request, etc.). In the longterm, there should be a pageview event stream just like rcstream! :)
This is an excellent point. IIRC we'd been asking Apps to do this for kind of a while, so...
-Ao
On Aug 18, 2015, at 19:58, Oliver Keyes okeyes@wikimedia.org wrote:
On 18 August 2015 at 19:11, Bernd Sitzmann bernd@wikimedia.org wrote:
This discussion is about needed updates of the definition and Analytics implementation for mobile apps page view metrics. There is also an associated Phab task[4]. Please add the proper Analytics project there.
Background / Changes
As you probably remember, the Android app splits a page view into two requests: one for the lead section and metadata, plus another one for the remainder.
The mobile apps are going to change the way they load pages in two different ways:
1. We'll add a link preview when someone clicks on a link from a page.
2. We're planning on switching over to using RESTBase for loading pages and also the link preview (initially just the Android beta, later more).
Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful service API?
Last time I checked that wasn't even consumed by HDFS. Is it now being consumed by HDFS?
More importantly the actual URLs are going to look /totally/ different. If we do not include RESTBase requests, we will miss the apps. If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme, we will also potentially have to rewrite every /other/ bit of the definition to /not/ incorporate those requests.
(I use "we" in a collective sense. This isn't my baby any more, although if Joseph et al want help with the refactor here I'm happy to spend my volunteer time on it).
But basically every other bit of your email is important but now secondary: this is a potentially massive change, all on its own, even without the link preview, even if the substance of the requests going to RESTBase were identical.
This will have implications for the pageviews definition and how we count user engagement.
The big question is
Should we count link previews as a page view since it's an indication of user engagement? Or should there be a separate metric for link previews?
Counting page views
IIRC we currently count action=mobileview&sections=0 query parameters of api.php as a page view. When we publish link previews for all Android app users then we would either want to also count the calls to action=query&prop=extracts as a page view or add them to another metric.
Once the apps use RESTBase the HTTPS requests will be very different:
- Page view: Instead of action=mobileview&sections=0 on the PHP API mentioned above, the app would call the RESTBase endpoint for the lead request[1]. Then it would call [2].
- Link preview: Instead of action=query&prop=extracts it would call the lead request[1], too, since there is a lot of overlap. At least that's our current plan. The advantage is that the client doesn't need to execute the lead request a second time if the user clicks on the link preview (either through caching or app logic).
So, in the RESTBase case we either want to count the mobile-html-sections-lead requests or the mobile-html-sections-remaining requests depending on what our definition for page views actually is. We could also add a query parameter or extra HTTP header to one of the mobile-html-sections-lead requests if we need to distinguish between previews and page views.
Both the current PHP API and the RESTBase based metrics would need to be compatible and be collected in parallel since we cannot control when users update their apps.
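To make the counting options concrete, here is a minimal Python sketch that labels a request URL as one of the request types discussed above. It is an illustration only, not the Analytics pageview definition: the function name classify_request and the label strings are made up for this example, and it assumes the PHP API is reached via a path ending in /api.php.

```python
from urllib.parse import urlsplit, parse_qs

# Path prefix taken from the example RESTBase URLs [1] and [2].
REST_PREFIX = "/api/rest_v1/page/"


def classify_request(url: str) -> str:
    """Return a coarse, hypothetical label for an app content request."""
    parts = urlsplit(url)

    if parts.path.startswith(REST_PREFIX):
        # First path segment after the prefix names the RESTBase endpoint.
        endpoint = parts.path[len(REST_PREFIX):].split("/", 1)[0]
        if endpoint == "mobile-html-sections-lead":
            return "restbase-lead"
        if endpoint == "mobile-html-sections-remaining":
            return "restbase-remaining"
        return "other"

    if parts.path.endswith("/api.php"):
        query = parse_qs(parts.query)
        action = query.get("action", [""])[0]
        if action == "mobileview" and query.get("sections", [""])[0] == "0":
            return "php-lead"
        # prop can hold several pipe-separated modules.
        if action == "query" and "extracts" in query.get("prop", [""])[0].split("|"):
            return "php-preview"

    return "other"
```

Note that "restbase-lead" alone cannot tell a link preview from a page view, which is why an extra query parameter or header would be needed if the two must be counted separately.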
[1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
[2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dilbert
[3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps
[4] https://phabricator.wikimedia.org/T109383
Cheers,
Bernd
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Oliver Keyes Count Logula Wikimedia Foundation
It'll need to be, some requests don't know pageID in advance, which I think was the reason Apps initially didn't implement this.
On 19 August 2015 at 12:19, Andrew Otto aotto@wikimedia.org wrote:
If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
page_id is the current key, so let’s keep that. page_title would be good to have too. Let’s make it an and/or.
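A rough Python sketch of what the and/or proposal could look like on the client side and on the log-processing side. It assumes X-Analytics carries semicolon-separated key=value pairs; the page_title key is only proposed in this thread, percent-encoding the title is an assumption of this sketch, and both function names are hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import quote


def build_x_analytics(page_id: Optional[int] = None,
                      page_title: Optional[str] = None) -> str:
    """Build a header value from whichever identifier the client knows."""
    fields: Dict[str, str] = {}
    if page_id is not None:
        fields["page_id"] = str(page_id)
    if page_title is not None:
        # Assumption: encode the title so ';' and '=' cannot break parsing.
        fields["page_title"] = quote(page_title, safe="")
    if not fields:
        raise ValueError("need page_id, page_title, or both")
    return ";".join(f"{key}={value}" for key, value in fields.items())


def parse_x_analytics(value: str) -> Dict[str, str]:
    """Split a header value back into its key=value fields."""
    fields: Dict[str, str] = {}
    for pair in value.split(";"):
        key, sep, val = pair.strip().partition("=")
        if key and sep:  # skip empty or malformed entries
            fields[key] = val
    return fields
```

A client that only knows the title would send build_x_analytics(page_title="Dilbert"); one that knows both can send both, and the consumer picks whichever is present.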
On Aug 19, 2015, at 12:17, Bernd Sitzmann bernd@wikimedia.org wrote:
If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
Ideally the page id would be the way to go. From a client's perspective I prefer the page title since clients don't always know the page id ahead of time. (We could put that header into the second request of loading the page, but I cannot guarantee that we will always have a second request in the future.)
--Cheers, Bernd
Ya, we can probably tweak pageview definition to use page_id / page_title if they exist, and only use the rest of the logic if they don’t.
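As a sketch of that tweak (not the actual pageview definition code), the check could prefer explicit tags and fall back to the existing pattern logic only when they are absent. The function name, the returned tuples, and the legacy_match callback standing in for "the rest of the logic" are all hypothetical.

```python
from typing import Callable, Mapping, Optional, Tuple


def identify_pageview(
    x_analytics: Mapping[str, str],
    url: str,
    legacy_match: Callable[[str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Return (source, identifier) for a pageview, or None if it is not one.

    x_analytics: already-parsed X-Analytics fields for the request.
    legacy_match: stand-in for the pattern-based definition; returns a
                  page title for a matching URL, else None.
    """
    # Explicit tags win: the client or service declared this a pageview.
    if x_analytics.get("page_id"):
        return ("page_id", x_analytics["page_id"])
    if x_analytics.get("page_title"):
        return ("page_title", x_analytics["page_title"])

    # No tag present: use the pattern-matching logic as before.
    title = legacy_match(url)
    return ("legacy", title) if title is not None else None
```

This keeps old app versions countable through the legacy path while tagged requests bypass URL matching entirely.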
On Aug 19, 2015, at 12:24, Oliver Keyes okeyes@wikimedia.org wrote:
It'll need to be, some requests don't know pageID in advance, which I think was the reason Apps initially didn't implement this.
Sounds sensible enough as long as we're sure they're only being passed through for "real" pageviews right now. (I can't imagine a situation in which they wouldn't be but someone should run that query)
On 19 August 2015 at 12:27, Andrew Otto aotto@wikimedia.org wrote:
Ya, we can probably tweak pageview definition to use page_id / page_title if they exist, and only use the rest of the logic if they don’t.
Here's a task that captures some of the things to consider for server side enrichment of X-Analytics (in this case it would be the Mobile Content Service doing the work, I think).
https://phabricator.wikimedia.org/T92875
Here are the quarterly goals. The thought to reflect counting in a more efficient way kind of entered a little later in the quarter, sorry about that (and thanks for helping us figure out short and mid-term approach).
https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q1_Goals#Readin...
-Adam
On Wed, Aug 19, 2015 at 9:27 AM, Andrew Otto aotto@wikimedia.org wrote:
Ya, we can probably tweak pageview definition to use page_id / page_title if they exist, and only use the rest of the logic if they don’t.
>> app >> users then we would either want to count also the calls to >> action=query&prop=extracts as a page view or add them to another >> metric. >> >> Once the apps use RESTBase the HTTPS requests will be very
different:
>> >> Page view: Instead of action=mobileview§ions=0 the app would
call
>> the >> RESTBase endpoint for lead request[1] instead of the PHP API >> mentioned >> above. Then it would call [2]. >> Link preview: Instead of action=query&prop=extracts it would call
the
>> lead >> request[1], too, since there is a lot of overlap. At least that our >> current >> plan. The advantage of that is that the client doesn't need to >> execute the >> lead request a second time if the user clicks on the link preview
(--
>> either >> through caching or app logic.) >> >> So, in the RESTBase case we either want to count the >> mobile-html-sections-lead requests or the >> mobile-html-sections-remaining >> requests depending on what our definition for page views actually
is.
>> We >> could also add a query parameter or extra HTTP header to one of the >> mobile-html-sections-lead requests if we need to distinguish
between
>> previews and page views. >> >> Both the current PHP API and the RESTBase based metrics would need
to
>> be >> compatible and be collected in parallel since we cannot control
when
>> users >> update their apps. >> >> [1] >> >>
https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
>> [2] >> >>
https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dil...
>> [3] >> >>
https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_app...
>> >> [4] https://phabricator.wikimedia.org/T109383 >> >> >> Cheers, >> >> Bernd >> >> >> _______________________________________________ >> Analytics mailing list >> Analytics@lists.wikimedia.org >> https://lists.wikimedia.org/mailman/listinfo/analytics >> > > > > -- > Oliver Keyes > Count Logula > Wikimedia Foundation > > _______________________________________________ > Analytics mailing list > Analytics@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/analytics
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Andrew,
Are you saying the apps have the option to skip providing one of page_title or page_id? I hope this is the case since I just came up with a scheme where we could avoid the second request when a page has only a single section, which we already get through the first (lead) request.
Yes to what Oliver said: The apps don't always know the page_id ahead of time (only sometimes). The best example where we don't know the page_id ahead of time is when someone searches for a term on Google search on an Android device, and gets directed to our Android app. The app only gets the URL of the page, which we then take to derive the wiki and page_title from.
Bernd
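The URL-to-title step described above could look roughly like the following. This is only a sketch: the function name `wiki_and_title` is hypothetical, and it assumes article URLs of the form `https://<wiki host>/wiki/<title>`; other URL shapes (for example `index.php?title=...`) are not handled.

```python
from typing import Tuple
from urllib.parse import unquote, urlsplit


def wiki_and_title(url: str) -> Tuple[str, str]:
    """Derive (wiki host, page title) from an article URL.

    Assumption: the path looks like /wiki/<title>; anything else is rejected.
    """
    parts = urlsplit(url)
    prefix = "/wiki/"
    if not parts.path.startswith(prefix) or len(parts.path) == len(prefix):
        raise ValueError(f"not an article URL: {url}")
    # Percent-decoding restores titles with non-ASCII or reserved characters.
    return parts.netloc, unquote(parts.path[len(prefix):])


print(wiki_and_title("https://en.wikipedia.org/wiki/Dilbert"))
```

Note that no page_id is available at this point, which is why the page_title key matters for the apps.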
On Wed, Aug 19, 2015 at 10:24 AM, Oliver Keyes okeyes@wikimedia.org wrote:
It'll need to be, some requests don't know pageID in advance, which I think was the reason Apps initially didn't implement this.
On 19 August 2015 at 12:19, Andrew Otto aotto@wikimedia.org wrote:
If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
page_id is the current key, so let’s keep that. page_title would be good to have too. Let’s make it an and/or.
On 19 August 2015 at 12:29, Bernd Sitzmann bernd@wikimedia.org wrote:
Andrew,
Are you saying the apps have the option to skip providing one of page_title or page_id? I hope this is the case since I just came up with a scheme where we could avoid the second request when a page has only a single section, which we already get through the first (lead) request.
Yep; you'd need to provide one or the other, not both. We're actually already looking for sections=0 due precisely to this (that there are two requests for one page) so only including the page_title there should not mess with the continuity of data.
Yes to what Oliver said: The apps don't always know the page_id ahead of time (only sometimes). The best example where we don't know the page_id ahead of time is when someone searches for a term on Google search on an Android device, and gets directed to our Android app. The app only gets the URL of the page, which we then take to derive the wiki and page_title from.
Bernd
Good, so page_id if you have it, page_title if not. We'll work around the foundation (I'll talk about this at Scrum of Scrums in an hour) until everyone respects this convention, and then we'll change the pageview definition to use it. Sounds like the beginning of a beautiful thing :)
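On the client side, "page_id if you have it, page_title if not" might be sketched as below. The helper name `x_analytics_value` is hypothetical, and percent-encoding the title is an assumption made here so that `;` or `=` inside a title cannot be confused with key=value syntax; the thread does not specify an encoding.

```python
from typing import Optional
from urllib.parse import quote


def x_analytics_value(page_id: Optional[int] = None,
                      page_title: Optional[str] = None) -> str:
    """Build an X-Analytics value that tags a request as a pageview."""
    if page_id is not None:
        # Preferred key: page_id is what MediaWiki requests already send.
        return f"page_id={page_id}"
    if page_title:
        # Fallback when the id is unknown. Encoding is an assumption.
        return "page_title=" + quote(page_title, safe="")
    raise ValueError("a pageview needs either page_id or page_title")


headers = {"X-Analytics": x_analytics_value(page_title="Dilbert")}
print(headers)
```

Only the request that should count (for example the lead-section request) would carry this header; the follow-up request for the remaining sections would not.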
I think if we do this right, we should prefer page_id, but use page_title if it is provided.
However, at the moment we don’t have a good way of actually getting page_title in Hadoop from the MW DBs even if given a page_id. We’d still have to infer the title from the URI. I’d prefer if page_id was the canonical way of identifying a page view, but currently page_title is used in all pageview statistics. Using the page_title as the client that generated the request sees it might be even more correct than inferring it from the URI. Or, maybe it would be better (for the moment) to use the existence of page_id or page_title to indicate to the pageview definition logic that this request is definitely already a pageview, and then use the same page-title-from-URI logic on all requests no matter what.
page_id or page_title would just allow the pageview definition pattern matching logic to be skipped, as we would know right up front that a request is a pageview.
> Are you saying the apps have the option to skip providing one of page_title or page_id?
So uhhh, yes! I think, although I am not the authority on this. I defer to other analytics engineers who will actually have to implement and maintain this change :)
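To make the idea above concrete, here is a minimal sketch of how a pageview check could short-circuit on a tag in the X-Analytics header and only fall back to pattern matching when no tag is present. It is an illustration under assumptions, not the actual refinery code: the `key=value;key=value` header layout, the key names `page_id` and `page_title`, the function names, and the one-line `LEGACY_PATTERN` are all placeholders for this example.

```python
import re


def parse_x_analytics(header):
    """Parse a 'k1=v1;k2=v2' string into a dict (layout assumed for this sketch)."""
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key:
            fields[key] = value
    return fields


# Hypothetical stand-in for the existing pattern-based definition,
# which in reality covers many more URL shapes than this.
LEGACY_PATTERN = re.compile(r"^/wiki/[^?]+$")


def is_pageview(uri_path, x_analytics):
    """Return True if the request should count as a pageview."""
    fields = parse_x_analytics(x_analytics)
    # Either key is enough: the client has told us this is a pageview,
    # so the pattern matching can be skipped entirely.
    if "page_id" in fields or "page_title" in fields:
        return True
    # Untagged requests still go through the pattern-based definition.
    return bool(LEGACY_PATTERN.match(uri_path))
```

With this shape, the title could still be derived from the URI for every request, tagged or not, as suggested above; the tag only decides whether the request counts.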
On Aug 19, 2015, at 12:29, Bernd Sitzmann bernd@wikimedia.org wrote:
Andrew,
Are you saying the apps have the option to skip providing one of page_title or page_id? I hope this is the case since I just came up with a scheme where we could avoid the second request when a page has only a single section, which we already get through the first (lead) request.
Yes to what Oliver said: The apps don't always know the page_id ahead of time (only sometimes). The best example where we don't know the page_id ahead of time is when someone searches for a term on Google search on an Android device, and gets directed to our Android app. The app only gets the URL of the page, which we then take to derive the wiki and page_title from.
Bernd
On Wed, Aug 19, 2015 at 10:24 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:
It'll need to be, some requests don't know pageID in advance, which I think was the reason Apps initially didn't implement this.
On 19 August 2015 at 12:19, Andrew Otto <aotto@wikimedia.org> wrote:
If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
page_id is the current key, so let’s keep that. page_title would be good to have too. Let’s make it an and/or.
On Aug 19, 2015, at 12:17, Bernd Sitzmann <bernd@wikimedia.org> wrote:
If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
Ideally the page id would be the way to go. From a client's perspective I prefer the page title since clients don't always know the page id ahead of time. (We could put that header into the second request of loading the page but I cannot guarantee that we will always have a second request in the future.)
--Cheers, Bernd
On Wed, Aug 19, 2015 at 8:53 AM, Dan Andreescu <dandreescu@wikimedia.org> wrote:
This (making pageviews proactive) is a great idea, and we should follow through. Here's a simple start:
If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
If we can make this change uniformly, I think we'd be in a very good place.
On Wed, Aug 19, 2015 at 10:23 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:
On 19 August 2015 at 10:19, Andrew Otto <aotto@wikimedia.org> wrote:
If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme
I really think that apps and APIs should do something proactive to tag or log a pageview. With more ways of viewing content, it is going to get harder and harder to maintain a pattern based definition. A pageview should be an event that is logged, not something that is pattern matched out of a very noisy stream of data.
Most mediawiki requests do this now, via the page_id field in the X-Analytics header, but we can’t use this for all pageviews because APIs are more complicated (e.g. more than one page can be served in a single request, etc.). In the long term, there should be a pageview event stream just like rcstream! :)
This is an excellent point. IIRC we'd been asking Apps to do this for kind of a while, so...
-Ao
On Aug 18, 2015, at 19:58, Oliver Keyes <okeyes@wikimedia.org> wrote:
On 18 August 2015 at 19:11, Bernd Sitzmann <bernd@wikimedia.org> wrote:
> This discussion is about needed updates of the definition and Analytics implementation for mobile apps page view metrics. There is also an associated Phab task[4]. Please add the proper Analytics project there.
>
> Background / Changes
>
> As you probably remember, the Android app splits a page view into two requests: one for the lead section and metadata, plus another one for the remainder.
>
> The mobile apps are going to change the way they load pages in two different ways:
>
> 1. We'll add a link preview when someone clicks on a link from a page.
> 2. We're planning on switching over to using RESTBase for loading pages and also the link preview (initially just the Android beta, later more)
Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful service API?
Last time I checked that wasn't even consumed by HDFS. Is it now being consumed by HDFS?
More importantly the actual URLs are going to look /totally/ different. If we do not include RESTBase requests, we will miss the apps. If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme, we will also potentially have to rewrite every /other/ bit of the definition to /not/ incorporate those requests.
(I use "we" in a collective sense. This isn't my baby any more, although if Joseph et al want help with the refactor here I'm happy to spend my volunteer time on it).
But basically every other bit of your email is important but now secondary: this is a potentially massive change, all on its own, even without the link preview, even if the substance of the requests going to RESTBase were identical.
> This will have implications for the pageviews definition and how we count user engagement.
>
> The big question is
>
> Should we count link previews as a page view since it's an indication of user engagement? Or should there be a separate metric for link previews?
>
> Counting page views
>
> IIRC we currently count action=mobileview&sections=0 query parameters of api.php as a page view. When we publish link previews for all Android app users then we would either want to count also the calls to action=query&prop=extracts as a page view or add them to another metric.
>
> Once the apps use RESTBase the HTTPS requests will be very different:
>
> - Page view: Instead of the PHP API's action=mobileview&sections=0, the app would call the RESTBase endpoint for the lead request[1]. Then it would call [2].
> - Link preview: Instead of action=query&prop=extracts it would call the lead request[1], too, since there is a lot of overlap. At least that's our current plan. The advantage of that is that the client doesn't need to execute the lead request a second time if the user clicks on the link preview (either through caching or app logic).
>
> So, in the RESTBase case we either want to count the mobile-html-sections-lead requests or the mobile-html-sections-remaining requests depending on what our definition for page views actually is. We could also add a query parameter or extra HTTP header to one of the mobile-html-sections-lead requests if we need to distinguish between previews and page views.
>
> Both the current PHP API and the RESTBase based metrics would need to be compatible and be collected in parallel since we cannot control when users update their apps.
>
> [1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
> [2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dilbert
> [3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_apps
> [4] https://phabricator.wikimedia.org/T109383
>
> Cheers,
> Bernd
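As a rough illustration of the two RESTBase request types described in the quoted message, the sketch below sorts a URL into "lead" or "remaining" and extracts the title. The endpoint names and path prefix come from links [1] and [2]; the function name, the return shape, and the decision to treat everything else as "not a candidate" are assumptions made for this example only. Note that a lead request alone cannot tell a link preview from a page view, which is why the message suggests an extra parameter or header.

```python
from urllib.parse import unquote, urlsplit

# Path prefix and endpoint names as they appear in links [1] and [2].
REST_PREFIX = "/api/rest_v1/page/"
SECTION_ENDPOINTS = ("mobile-html-sections-lead", "mobile-html-sections-remaining")


def classify_restbase_request(url):
    """Return (kind, title) for an app page-load request, or None otherwise.

    kind is "lead" or "remaining". Hypothetical helper for illustration.
    """
    path = urlsplit(url).path
    if not path.startswith(REST_PREFIX):
        return None
    endpoint, _, title = path[len(REST_PREFIX):].partition("/")
    if endpoint not in SECTION_ENDPOINTS or not title:
        return None
    # "mobile-html-sections-lead" -> "lead"
    kind = endpoint.rsplit("-", 1)[1]
    return kind, unquote(title)
```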
Oh that makes sense. page_title always, and page_id if you have it. I wonder if there's a way to get the canonical post-redirect page_title in all cases... hm...
On Wed, Aug 19, 2015 at 12:35 PM, Andrew Otto aotto@wikimedia.org wrote:
I think if we do this right, we should prefer page_id, but use page_title if it is provided.
However, at the moment we don’t have a good way of actually getting page_title in Hadoop from the MW DBs even if given a page_id. We’d still have to infer the title from the URI. I’d prefer if page_id was the canonical way of identifying a page view, but currently page_title is used in all pageview statistics. Using the page_title as the generator of the request sees it might be even more correct than inferring it from the URI. Or, maybe it would be better (for the moment) to use the existence of page_id or page_title to indicate to the pageview definition logic that this request is definitely already a pageview, and then use the same page-title-from-URI logic on all requests no matter what.
page_id or page_title would just allow the pageview definition pattern matching logic to be skipped, as we would know right up front that a request is a pageview.
> Are you saying the apps have the option to skip providing one of page_title or page_id?
So uhhh, yes! I think, although I am not the authority on this. I defer to other analytics engineers who will actually have to implement and maintain this change :)
No, I'd suggest going for OR. It is possible to have data formats that just reflect ID in the same way we have data formats that just reflect title.
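A client-side sketch of the "OR" rule might look like the following: the caller supplies whichever identifier it has, and only the case where it has neither is rejected. The key names, the `;` separator, the percent-encoding of the title, and the function name are assumptions for illustration; nothing in this thread fixes the final header format.

```python
from urllib.parse import quote


def pageview_tag(page_id=None, page_title=None):
    """Build an X-Analytics value from page_id OR page_title (or both).

    Hypothetical helper: key names and encoding are placeholders.
    """
    if page_id is None and page_title is None:
        raise ValueError("need page_id or page_title to tag a pageview")
    parts = []
    if page_id is not None:
        parts.append(f"page_id={int(page_id)}")
    if page_title is not None:
        # Use the underscore form of the title and escape characters
        # (such as ';' and '=') that would break the key=value layout.
        parts.append("page_title=" + quote(page_title.replace(" ", "_"), safe=""))
    return ";".join(parts)
```

An app opened from a search result would call `pageview_tag(page_title=...)` because it only has the URL, while a data format that only carries IDs would call `pageview_tag(page_id=...)`.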
On 19 August 2015 at 12:38, Dan Andreescu dandreescu@wikimedia.org wrote:
Oh that makes sense. page_title always, and page_id if you have it. I wonder if there's a way to get the canonical post-redirect page_title in all cases... hm...
Yeah, doing this on the client could work, but would require *all* clients to actually do it. We also have metrics per entry point in RESTBase, but those are behind Varnishes and will only count Varnish cache misses. Without Varnish caching, this would be a solved problem ;)
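The undercount can be shown with a toy model: a cache in front of a backend, where a counter at the backend only moves on cache misses. This is a simplified sketch, not a model of Varnish or RESTBase themselves; the class and attribute names are invented for the example, and real caches also expire and purge entries.

```python
class CachingFrontend:
    """Toy cache-in-front-of-backend model (illustrative only)."""

    def __init__(self):
        self.cache = {}
        self.edge_requests = 0     # requests seen at the cache layer
        self.backend_requests = 0  # requests that reach the backend metric

    def get(self, path):
        self.edge_requests += 1
        if path not in self.cache:
            # Cache miss: only now does the backend see (and count) the request.
            self.backend_requests += 1
            self.cache[path] = f"body of {path}"
        return self.cache[path]


frontend = CachingFrontend()
for path in ["/lead/Dilbert", "/lead/Dilbert", "/lead/Dilbert", "/lead/Garfield"]:
    frontend.get(path)
```

After the four requests above, the cache layer has seen four requests but the backend counter has moved only twice, once per distinct path, so counting has to happen at or in front of the cache to see every view.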
On Wed, Aug 19, 2015 at 7:53 AM, Dan Andreescu dandreescu@wikimedia.org wrote:
This (making pageviews proactive) is a great idea, and we should follow through. Here's a simple start:
If your app/site/etc. is creating a request that it wants to count as a pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
If we can make this change uniformly, I think we'd be in a very good place.
In the absence of all clients doing it, "if it has this x_analytics entry, don't bother with the complex regular expressions, if it doesn't, do" still works.
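A rough sketch of that fallback, assuming the x_analytics field is a `key=value;key=value` string and using a toy regular expression as a stand-in for the real pattern-based definition (the function names and the pattern are illustrative only):

```python
import re

# Toy stand-in for the existing pattern-based definition: an api.php request
# with action=mobileview and sections=0, in any parameter order.
LEGACY_PAGEVIEW = re.compile(
    r"/api\.php\?(?=.*\baction=mobileview\b)(?=.*\bsections=0\b)"
)


def parse_x_analytics(value):
    """Split 'k1=v1;k2=v2' into a dict, skipping malformed pieces."""
    fields = {}
    for part in (value or "").split(";"):
        key, sep, val = part.partition("=")
        if sep and key.strip():
            fields[key.strip()] = val.strip()
    return fields


def is_pageview(url, x_analytics=None):
    """Trust an explicit pageview tag; otherwise fall back to pattern matching."""
    fields = parse_x_analytics(x_analytics)
    if "pageview_id" in fields or "pageview_title" in fields:
        return True  # the client tagged it, no regex needed
    return LEGACY_PAGEVIEW.search(url) is not None
```

With this shape, untagged requests from older app versions keep being counted by the old rules while tagged ones skip them entirely.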
On 19 August 2015 at 13:34, Gabriel Wicke gwicke@wikimedia.org wrote:
Yeah, doing this on the client could work, but would require *all* clients to actually do it. We also have metrics per entry point in RESTBase, but those are behind Varnishes and will only count Varnish cache misses. Without Varnish caching, this would be a solved problem ;)
-- Gabriel Wicke Principal Engineer, Wikimedia Foundation
Oliver, the problem with "page_title OR page_id" instead of "always page_title and page_id if you have it" is what Andrew was addressing above. It means we have to query for page_title by id, and that means we need to keep an up-to-date copy of all mediawiki databases. And we have to be able to query that copy tens of thousands of times per second, which is basically not going to happen.
We just chatted about this in scrum of scrums; it looks like Adam's going to set up a meeting so we can talk more there. I agree with Adam that we need a short-term solution for counting the new kinds of requests, a medium-term solution so that we don't all go insane, and something to shoot for in the long term.
On Wed, Aug 19, 2015 at 1:48 PM, Oliver Keyes okeyes@wikimedia.org wrote:
In the absence of all clients doing it, "if it has this x_analytics entry, don't bother with the complex regular expressions, if it doesn't, do" still works.
Aren't we currently just storing pageID?
On 19 August 2015 at 14:11, Dan Andreescu dandreescu@wikimedia.org wrote:
Oliver, the problem with "page_title OR page_id" instead of "always page_title and page_id if you have it" is what Andrew was addressing above. It means we have to query for page_title by id, and that means we need to keep an up-to-date copy of all mediawiki databases. And we have to be able to query that copy tens of thousands of times per second, which is basically not going to happen.
We just chatted about this in scrum of scrums; it looks like Adam's going to set up a meeting so we can talk more there. I agree with Adam that we need a short-term solution for counting the new kinds of requests, a medium-term solution so that we don't all go insane, and something to shoot for in the long term.
In some cases we have page_id; in other cases (like API requests) we have nothing.
On Wed, Aug 19, 2015 at 2:13 PM, Oliver Keyes okeyes@wikimedia.org wrote:
Aren't we currently just storing pageID?
On 19 August 2015 at 14:11, Dan Andreescu dandreescu@wikimedia.org wrote:
Oliver, the problem with "page_title OR page_id" instead of "always page_title and page_id if you have it" is what Andrew was addressing
above.
It means we have to query for page_title by id, and that means we need to keep an up-to-date copy of all mediawiki databases. And we have to be
able
to query that copy tens of thousands of times per second, which is
basically
not going to happen.
We just chatted in scrum of scrums about this, it looks like Adam's
going to
set up a meeting so we can talk more there. I agree with Adam that we
have
to have a short term solution for counting the new kinds of requests. A medium term solution so that we don't all go insane, and something to
shoot
for in the long term.
On Wed, Aug 19, 2015 at 1:48 PM, Oliver Keyes okeyes@wikimedia.org
wrote:
In the absence of all clients doing it, "if it has this x_analytics entry, don't bother with the complex regular expressions, if it doesn't, do" still works.
On 19 August 2015 at 13:34, Gabriel Wicke gwicke@wikimedia.org wrote:
Yeah, doing this on the client could work, but would require *all* clients to actually do it. We also have metrics per entry point in RESTBase,
but
those are behind Varnishes and will only count Varnish cache misses. Without Varnish caching, this would be a solved problem ;)
On Wed, Aug 19, 2015 at 7:53 AM, Dan Andreescu dandreescu@wikimedia.org wrote:
This (making pageviews proactive) is a great idea, and we should
follow
through. Here's a simple start:
If your app/site/etc. is creating a request that it wants to count
as a
pageview, add an X-Analytics header with pageview_id=<page_id> or pageview_title=<page_title>
If we can make this change uniformly, I think we'd be in a very good place.
On Wed, Aug 19, 2015 at 10:23 AM, Oliver Keyes <okeyes@wikimedia.org
wrote:
On 19 August 2015 at 10:19, Andrew Otto aotto@wikimedia.org
wrote:
>> If we /do/ include RESTBase requests we will not only have to >> rewrite the pageview definition for the apps to recognise the new >> URL >> scheme > > I really think that apps and APIs should do something proactive to > tag > or log a pageview. With more ways of viewing content, it is going > to get > harder and harder to maintain a pattern based definition. A > pageview should > be an event that is logged, not something that is pattern matched > out of a > very noisy stream of data. > > Most mediawiki requests do this now, via the page_id field in the > X-Analytlics header, but we can’t use this for all pageviews
because
> APIs > are more complicated (e.g. more than one page can be served in a > single > request, etc.). In the longterm, there should be a pageview event > stream > just like rcstream! :)
This is an excellent point. IIRC we'd been asking Apps to do this
for
kind of a while, so...
-- Oliver Keyes Count Logula Wikimedia Foundation
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Gabriel Wicke Principal Engineer, Wikimedia Foundation
> With more ways of viewing content, it is going to get harder and harder to maintain a pattern based definition.
Indeed, we want to move away from a pattern-based definition as much as possible.
This is an FYI to everyone that with our latest changes (which we are in the process of deploying today), if a request comes "tagged" with "preview" in the X-Analytics header it will not be counted towards pageviews. The Android app should make the corresponding change and add the tag "preview" to its preview requests.
The X-Analytics header is documented here: https://wikitech.wikimedia.org/wiki/X-Analytics
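For anyone who wants to see what that rule amounts to, here is a minimal sketch in Python. It is not the deployed code. It assumes the header is a semicolon-separated list of key=value pairs, and it assumes the tag shows up either as a bare "preview" key or with some value such as preview=1; the exact form is not spelled out in this thread. The function names are made up for illustration.

```python
def parse_x_analytics(header):
    """Parse 'k1=v1;k2=v2' into a dict; a bare key maps to an empty string."""
    fields = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def is_preview(header):
    """True if the request carries the 'preview' tag (assumed key name)."""
    return "preview" in parse_x_analytics(header)


def counts_as_pageview(matches_definition, x_analytics):
    """A request counts only if it matches the definition AND is not a preview."""
    return matches_definition and not is_preview(x_analytics)
```

The point of the sketch is that the preview check is independent of the URL: a request can match the existing pageview definition and still be excluded purely because of the tag.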
On Wed, Aug 19, 2015 at 7:19 AM, Andrew Otto aotto@wikimedia.org wrote:
If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme
I really think that apps and APIs should do something proactive to tag or log a pageview. With more ways of viewing content, it is going to get harder and harder to maintain a pattern based definition. A pageview should be an event that is logged, not something that is pattern matched out of a very noisy stream of data.
Most MediaWiki requests do this now, via the page_id field in the X-Analytics header, but we can’t use this for all pageviews because APIs are more complicated (e.g. more than one page can be served in a single request, etc.). In the long term, there should be a pageview event stream just like rcstream! :)
-Ao
On Aug 18, 2015, at 19:58, Oliver Keyes okeyes@wikimedia.org wrote:
On 18 August 2015 at 19:11, Bernd Sitzmann bernd@wikimedia.org wrote:
This discussion is about needed updates of the definition and Analytics implementation for mobile apps page view metrics. There is also an associated Phab task[4]. Please add the proper Analytics project there.
Background / Changes
As you probably remember, the Android app splits a page view into two requests: one for the lead section and metadata, plus another one for the remainder.
The mobile apps are going to change the way they load pages in two different ways:
1. We'll add a link preview when someone clicks on a link from a page.
2. We're planning on switching over to using RESTBase for loading pages and also the link preview (initially just the Android beta, later more).
Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful service API?
Last time I checked that wasn't even consumed by HDFS. Is it now being consumed by HDFS?
More importantly the actual URLs are going to look /totally/ different. If we do not include RESTBase requests, we will miss the apps. If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme, we will also potentially have to rewrite every /other/ bit of the definition to /not/ incorporate those requests.
(I use "we" in a collective sense. This isn't my baby any more, although if Joseph et al want help with the refactor here I'm happy to spend my volunteer time on it).
But basically every other bit of your email is important but now secondary: this is a potentially massive change, all on its own, even without the link preview, even if the substance of the requests going to RESTBase were identical.
This will have implications for the pageviews definition and how we count user engagement.
The big question is
Should we count link previews as a page view since it's an indication of user engagement? Or should there be a separate metric for link previews?
Counting page views
IIRC we currently count action=mobileview&sections=0 query parameters of api.php as a page view. When we publish link previews for all Android app users then we would either want to also count the calls to action=query&prop=extracts as a page view or add them to another metric.
Once the apps use RESTBase the HTTPS requests will be very different:
- Page view: Instead of action=mobileview&sections=0 the app would call the RESTBase endpoint for lead request[1] instead of the PHP API mentioned above. Then it would call [2].
- Link preview: Instead of action=query&prop=extracts it would call the lead request[1], too, since there is a lot of overlap. At least that's our current plan. The advantage of that is that the client doesn't need to execute the lead request a second time if the user clicks on the link preview (either through caching or app logic).
So, in the RESTBase case we either want to count the mobile-html-sections-lead requests or the mobile-html-sections-remaining requests depending on what our definition for page views actually is. We could also add a query parameter or extra HTTP header to one of the mobile-html-sections-lead requests if we need to distinguish between previews and page views.
Both the current PHP API and the RESTBase based metrics would need to be compatible and be collected in parallel since we cannot control when users update their apps.
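To make the "collected in parallel" requirement concrete, here is a rough Python sketch of a pattern-based check that accepts both request shapes. It is only an illustration: the thread leaves open whether the lead or the remaining request should be counted, so counting the lead request is an assumption here, as are the function name and the /w/api.php path used in the examples.

```python
from urllib.parse import urlsplit, parse_qs

# Path prefix of the RESTBase lead-section endpoint, taken from reference [1].
LEAD_PREFIX = "/api/rest_v1/page/mobile-html-sections-lead/"


def is_app_pageview_candidate(url):
    """Return True for either the legacy api.php or the RESTBase lead request."""
    parts = urlsplit(url)
    # New scheme: RESTBase lead request with a non-empty title after the prefix.
    if parts.path.startswith(LEAD_PREFIX) and len(parts.path) > len(LEAD_PREFIX):
        return True
    # Legacy scheme: api.php with action=mobileview and sections=0.
    if parts.path.endswith("/api.php"):
        query = parse_qs(parts.query)
        return query.get("action") == ["mobileview"] and query.get("sections") == ["0"]
    return False
```

Note that under this check a link preview that calls the lead endpoint is indistinguishable from a real page view, which is exactly why an extra query parameter or header would be needed to tell them apart.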
[1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
[2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dil...
[3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_app...
[4] https://phabricator.wikimedia.org/T109383
Cheers,
Bernd
Have those changes been noted on the main pageview definition page and associated changelog?
Right! Thanks for pointing that out.
I think I have updated all docs now: https://meta.wikimedia.org/wiki/Research:Page_view#Change_log
https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
Thanks!
This discussion also reminds me of the idea of tracking time spent on site. Arguably, that's a more relevant measurement for how much of our content people actually consume, and it also neatly side-steps issues like the categorization of link previews. I realize that measuring that accurately can be challenging, but I think it'll become more and more important as we venture into more dynamic content experiences.
On Thu, Sep 17, 2015 at 8:17 AM, Oliver Keyes okeyes@wikimedia.org wrote:
Danke!
On 17 September 2015 at 11:15, Nuria Ruiz nuria@wikimedia.org wrote:
Right! Thanks for pointing that out.
I think I have updated all docs now: https://meta.wikimedia.org/wiki/Research:Page_view#Change_log
https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
On Thu, Sep 17, 2015 at 7:36 AM, Oliver Keyes okeyes@wikimedia.org
wrote:
Have those changes been noted on the main pageview definition page and associated changelog?
On 17 September 2015 at 09:58, Nuria Ruiz nuria@wikimedia.org wrote:
With more ways of viewing content, it is going to get harder and
harder
to maintain a pattern based definition.
Indeed, we want to move away from pattern based definition as mach as possible.
This is an FYI to everyone that with our latest changes (that we are
in
the process of deploying today) if a request comes "tagged" with "preview" in the x-analytics header it will not be counted towards a pageviews. The Android App should do corresponding changes to add the tag "preview"
to
its preview requests.
X-analytics header is documented here: https://wikitech.wikimedia.org/wiki/X-Analytics
On Wed, Aug 19, 2015 at 7:19 AM, Andrew Otto aotto@wikimedia.org wrote:
If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new
URL
scheme
I really think that apps and APIs should do something proactive to
tag
or log a pageview. With more ways of viewing content, it is going to
get
harder and harder to maintain a pattern based definition. A pageview should be an event that is logged, not something that is pattern matched out of a very noisy stream of data.
Most mediawiki requests do this now, via the page_id field in the X-Analytlics header, but we can’t use this for all pageviews because APIs are more complicated (e.g. more than one page can be served in a
single
request, etc.). In the longterm, there should be a pageview event stream just like rcstream! :)
-Ao
On Aug 18, 2015, at 19:58, Oliver Keyes okeyes@wikimedia.org
wrote:
On 18 August 2015 at 19:11, Bernd Sitzmann bernd@wikimedia.org wrote: > This discussion is about needed updates of the definition and > Analytics > implementation for mobile apps page view metrics. There is also an > associated Phab task[4]. Please add the proper Analytics project > there. > > Background / Changes > > As you probably remember, the Android app splits a page view into > two > requests: one for the lead section and metadata, plus another one > for > the > remainder. > > The mobile apps are going to change the way they load pages in two > different > ways: > > We'll add a link preview when someone clicks on a link from a
page.
> We're planning on switching over the using RESTBase for loading > pages > and > also the link preview (initially just the Android beta, ater more) >
Woah woah woah woah woah. By RESTBase do you mean Gabriel's RESTful service API?
Last time I checked that wasn't even consumed by HDFS. Is it now being consumed by HDFS?
More importantly the actual URLs are going to look /totally/ different. If we do not include RESTBase requests, we will miss the apps. If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme, we will also potentially have to rewrite every /other/ bit of the definition to /not/ incorporate those requests.
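A rough sketch of what recognising both URL schemes in a pattern-based definition could look like. The function name is_app_pageview_candidate is hypothetical, the /w/api.php path and the "Example" title in the usage lines are assumptions, and counting the lead request (not the remaining-sections request) is only one of the options under discussion.

```python
from urllib.parse import parse_qs, urlsplit

# Path prefix taken from the lead-section URL quoted in this thread.
LEAD_PREFIX = "/api/rest_v1/page/mobile-html-sections-lead/"


def is_app_pageview_candidate(url: str) -> bool:
    """Return True if the URL looks like an app page view under either scheme."""
    parts = urlsplit(url)

    # New scheme: RESTBase lead request followed by a page title.
    if parts.path.startswith(LEAD_PREFIX):
        return len(parts.path) > len(LEAD_PREFIX)

    # Old scheme: PHP API with action=mobileview and sections=0.
    if parts.path.endswith("/api.php"):
        query = parse_qs(parts.query)
        return query.get("action") == ["mobileview"] and query.get("sections") == ["0"]

    return False


print(is_app_pageview_candidate(
    "https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert"))
print(is_app_pageview_candidate(
    "https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Example"))
```

Every new way of loading a page adds another branch like these, which is the maintenance cost being described.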
(I use "we" in a collective sense. This isn't my baby any more, although if Joseph et al want help with the refactor here I'm happy to spend my volunteer time on it).
But basically every other bit of your email is important but now secondary: this is a potentially massive change, all on its own, even without the link preview, even if the substance of the requests going to RESTBase were identical.
> This will have implications for the pageviews definition and how we count user engagement.
>
> The big question is
>
> Should we count link previews as a page view since it's an indication of user engagement? Or should there be a separate metric for link previews?
>
> Counting page views
>
> IIRC we currently count action=mobileview&sections=0 query parameters of api.php as a page view. When we publish link previews for all Android app users then we would either want to count also the calls to action=query&prop=extracts as a page view or add them to another metric.
>
> Once the apps use RESTBase the HTTPS requests will be very different:
>
> Page view: Instead of action=mobileview&sections=0 the app would call the RESTBase endpoint for lead request[1] instead of the PHP API mentioned above. Then it would call [2].
> Link preview: Instead of action=query&prop=extracts it would call the lead request[1], too, since there is a lot of overlap. At least that's our current plan. The advantage of that is that the client doesn't need to execute the lead request a second time if the user clicks on the link preview (-- either through caching or app logic.)
>
> So, in the RESTBase case we either want to count the mobile-html-sections-lead requests or the mobile-html-sections-remaining requests depending on what our definition for page views actually is. We could also add a query parameter or extra HTTP header to one of the mobile-html-sections-lead requests if we need to distinguish between previews and page views.
>
> Both the current PHP API and the RESTBase based metrics would need to be compatible and be collected in parallel since we cannot control when users update their apps.
>
> [1] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Dilbert
> [2] https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-remaining/Dil...
> [3] https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/RESTBase_services_for_app...
> [4] https://phabricator.wikimedia.org/T109383
>
> Cheers,
>
> Bernd
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
-- Oliver Keyes Count Logula Wikimedia Foundation
It's not that challenging; Aaron and I developed a fairly robust way of doing it that Mikhail and I are refining. It's just not easy to do without, say, a dedicated EL schema that somebody (probably readership?) would own and surface data from.
On 18 September 2015 at 13:14, Gabriel Wicke gwicke@wikimedia.org wrote:
This discussion also reminds me of the idea of tracking time spent on site. Arguably, that's a more relevant measurement for how much of our content people actually consume, and it also neatly side-steps issues like the categorization of link previews. I realize that measuring that accurately can be challenging, but I think it'll become more and more important as we venture into more dynamic content experiences.
On Thu, Sep 17, 2015 at 8:17 AM, Oliver Keyes okeyes@wikimedia.org wrote:
Danke!
On 17 September 2015 at 11:15, Nuria Ruiz nuria@wikimedia.org wrote:
Right! Thanks for pointing that out.
I think I have updated all docs now: https://meta.wikimedia.org/wiki/Research:Page_view#Change_log
https://meta.wikimedia.org/wiki/Research:Page_view/Generalised_filters
On Thu, Sep 17, 2015 at 7:36 AM, Oliver Keyes okeyes@wikimedia.org wrote:
Have those changes been noted on the main pageview definition page and associated changelog?
On 17 September 2015 at 09:58, Nuria Ruiz nuria@wikimedia.org wrote:
> With more ways of viewing content, it is going to get harder and harder to maintain a pattern based definition.
Indeed, we want to move away from a pattern-based definition as much as possible.
This is an FYI to everyone that with our latest changes (that we are in the process of deploying today), if a request comes "tagged" with "preview" in the X-Analytics header it will not be counted as a pageview. The Android app should make the corresponding change and add the "preview" tag to its preview requests.
X-analytics header is documented here: https://wikitech.wikimedia.org/wiki/X-Analytics
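As an illustration of the rule described above, here is a minimal sketch, assuming the X-Analytics value is a semicolon-separated list of fields and that the tag appears as a field whose key is "preview" (with or without a value). The function names and sample header values are hypothetical; the deployed pageview definition may differ in detail.

```python
def has_preview_tag(x_analytics: str) -> bool:
    """Return True if any field in the header has the key 'preview'."""
    for part in x_analytics.split(";"):
        key = part.partition("=")[0].strip()
        if key == "preview":
            return True
    return False


def counts_as_pageview(matches_definition: bool, x_analytics: str) -> bool:
    """A request counts only if it matches the definition and is not a preview."""
    return matches_definition and not has_preview_tag(x_analytics)


print(counts_as_pageview(True, "page_id=1;preview=1"))
print(counts_as_pageview(True, "page_id=1"))
```

With a tag like this, a link preview and a full page view can hit the same endpoint and still be told apart.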
On Wed, Aug 19, 2015 at 7:19 AM, Andrew Otto aotto@wikimedia.org wrote:
> If we /do/ include RESTBase requests we will not only have to rewrite the pageview definition for the apps to recognise the new URL scheme
I really think that apps and APIs should do something proactive to tag or log a pageview. With more ways of viewing content, it is going to get harder and harder to maintain a pattern based definition. A pageview should be an event that is logged, not something that is pattern matched out of a very noisy stream of data.
Most MediaWiki requests do this now, via the page_id field in the X-Analytics header, but we can't use this for all pageviews because APIs are more complicated (e.g. more than one page can be served in a single request, etc.). In the long term, there should be a pageview event stream just like rcstream! :)
-Ao
-- Gabriel Wicke Principal Engineer, Wikimedia Foundation