Fwd: RFC: Introducing two new HTTP headers to track mobile pageviews


Diederik van Liere

31 Jan 2013

7:30 p.m.

(Apologies for cross-posting)

Heya,

The mobile team needs accurate pageviews for the alpha and beta mobile site. Currently, this information is only stored in a cookie, but we don't want to start storing this cookie because of cache server performance, network performance and privacy policy issues. The mobile team also needs to be able to differentiate between initial and secondary API requests: pages in the beta version of MobileFrontend are dynamically loaded via the API, meaning that MobileFrontend might make multiple API requests to load sections of an article when they are toggled open by the user. At the moment, we have no way of differentiating between API requests to determine which one should count as a 'pageview'.

We propose that we set two additional custom HTTP headers - one to identify alpha/beta/stable version of MobileFrontend, the other to be able to differentiate between initial and secondary API requests. This would make logging the necessary information trivial, and we believe it would be fairly lightweight to implement.

We propose the following two headers with their possible values:
X-MF-Mode: a/b/s (alpha/beta/stable)
X-MF-Req: 1/2 (primary/secondary)

X-MF-Mode would be determined by Varnish based off the existence of the alpha/beta identifying cookies while X-MF-Req would be set by MobileFrontend in the backend response.

These headers would only be set on the Varnish servers; on the Squids/Nginx we will just set a dash ('-') in the log fields.
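
As a rough illustration of the logic described above (not the actual Varnish configuration), the sketch below derives the X-MF-Mode value from a Cookie header and builds the two extra log fields. The cookie names `mf_alpha` and `mf_beta` and the helper names are assumptions made for the example; the thread does not name the real cookies.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie names; the thread does not say what the real ones are.
ALPHA_COOKIE = "mf_alpha"
BETA_COOKIE = "mf_beta"


def mf_mode(cookie_header: str) -> str:
    """Return 'a', 'b' or 's' depending on which identifying cookie is present."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    if ALPHA_COOKIE in cookies:
        return "a"
    if BETA_COOKIE in cookies:
        return "b"
    return "s"


def log_fields(is_varnish: bool, cookie_header: str, mf_req: str) -> str:
    """Build the two extra tab-separated log fields.

    Caches other than Varnish (Squid, Nginx) only log a dash in each field.
    """
    if not is_varnish:
        return "-\t-"
    return f"{mf_mode(cookie_header)}\t{mf_req}"
```

Here `mf_req` stands in for the X-MF-Req value that MobileFrontend would set in the backend response.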

Questions:
1) Are there objections to the introduction of these two http headers?
2) We would like to aim for a late February deployment, is that an okay period? (We will announce the real deployment date as well)
3) Are we missing anything important?

Thanks for your feedback!

Best,
Arthur & Diederik


Platonides

2 Feb

11:36 p.m.

New subject: Fwd: RFC: Introducing two new HTTP headers to track mobile pageviews

I don't like its cryptic nature.

Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b».

Instead something like this would be much more descriptive:
X-Mobile-Mode: stable
X-Mobile-Request: secondary

But that also means sending more bytes through the wire :S

Ori Livneh

11:55 p.m.

New subject: Fwd: RFC: Introducing two new HTTP headers to track mobile pageviews

On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote:

> I don't like its cryptic nature.
>
> Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b».
>
> Instead something like this would be much more descriptive:
> X-Mobile-Mode: stable
> X-Mobile-Request: secondary
>
> But that also means sending more bytes through the wire :S

Well, you can (and should) drop the 'X-' :-)

See http://tools.ietf.org/html/rfc6648: Deprecating the "X-" Prefix and Similar Constructs in Application Protocols

-- Ori Livneh

Diederik van Liere

3 Feb

12:16 a.m.

New subject: Fwd: RFC: Introducing two new HTTP headers to track mobile pageviews

Thanks Ori, I was not aware of this
D

Sent from my iPhone


Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

David Schoonover

2:08 a.m.

New subject: Fwd: RFC: Introducing two new HTTP headers to track mobile pageviews

Huh! News to me as well. I definitely agree with that decision. Thanks, Ori!

I've already written the Varnish code for setting X-MF-Mode so it can be captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at least, MF-Mode?

Looking especially to hear from Arthur and Matt.

-- David Schoonover dsc@wikimedia.org


Asher Feldman

10:55 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

If you want to differentiate categories of API requests in logs, add descriptive noop query params to the requests, e.g. &mfmode=2. Doing this in request headers and altering edge config is unnecessary and a bad design pattern. On the analytics side, if parsing query params seems challenging vs. having a fixed field to parse, deal.


Asher Feldman

11:35 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

Regarding varnish cacheability of mobile API requests with a logging query param - it would probably be worth making frontend varnishes strip out all occurrences of that query param and its value from their backend requests so they're all the same to the caching instances. A generic param name that can take any value would allow for adding as many extra log values as needed, limited only by the uri log field length.

&l=mft2&l=mfstable etc.

So still an edge cache change but the result is more flexible while avoiding changing the fixed field length log format across unrelated systems like text squids or image caches.
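
To make the idea concrete, here is a minimal Python sketch of what the frontend caches would do with such a parameter: collect every value of the generic logging param for the log line, and strip all of its occurrences from the URL that is passed on to the caching instances. The function name and the example URL are illustrative assumptions; in practice this would live in the Varnish configuration rather than in Python.

```python
from typing import List, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

LOG_PARAM = "l"  # generic logging parameter name, as in "&l=mft2&l=mfstable"


def split_log_params(url: str) -> Tuple[str, List[str]]:
    """Return (url without any LOG_PARAM pairs, list of logged values)."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Values to record in the log line, in the order they appeared.
    log_values = [value for key, value in pairs if key == LOG_PARAM]
    # Everything else is kept, so the backend URL is the same for all variants.
    kept = [(key, value) for key, value in pairs if key != LOG_PARAM]
    cache_url = urlunsplit(parts._replace(query=urlencode(kept)))
    return cache_url, log_values


# Illustrative request; the URL is an example, not taken from the thread.
cache_url, log_values = split_log_params(
    "http://en.m.wikipedia.org/w/api.php?action=mobileview&page=Foo&l=mft2&l=mfstable"
)
```

Note that `urlencode` re-encodes the remaining parameters, so the rewritten URL may differ byte-for-byte from the original in how special characters are escaped; a real implementation would want to strip the parameter without touching the rest of the query string.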


Tyler Romeo

11:42 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

Considering that the query component of a URI is meant to identify the resource whereas HTTP headers are meant to tell the server additional information about the request, I think a header approach is much more appropriate than a no-op query parameter.

If the X- is removed, I'd have no problem with the addition of these headers, but what is the advantage of having two over one? Wouldn't a header like: MobileFrontend: 1/2 a/b/s work just as well?
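
For illustration, a combined header of that shape could be parsed on the analytics side roughly as follows. This is only a sketch: it assumes the value is the request type followed by the mode, separated by whitespace (for example `2 b`), which is one reading of the suggestion and not an agreed format.

```python
from typing import Tuple

REQ_TYPES = {"1": "primary", "2": "secondary"}
MODES = {"a": "alpha", "b": "beta", "s": "stable"}


def parse_mobilefrontend(value: str) -> Tuple[str, str]:
    """Split a value such as '2 b' into (request type, mode)."""
    parts = value.split()
    if len(parts) != 2 or parts[0] not in REQ_TYPES or parts[1] not in MODES:
        raise ValueError(f"unexpected MobileFrontend header value: {value!r}")
    return REQ_TYPES[parts[0]], MODES[parts[1]]
```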

--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com


Asher Feldman

12:12 p.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

That's not at all true in the real world. Look at the actual requests for google analytics on a high percentage of sites, etc.

Setting new request headers for mobile that map to new inflexible fields in the log stream that must be set on all non-mobile requests ("\t-\t-") equals gigabytes of unnecessary log data every day (that we want to save 100% of) for no good reason. Wanting to keep query params "pure" isn't a good reason.
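
The size argument can be checked with back-of-the-envelope arithmetic: each non-mobile log line grows by the four bytes of "\t-\t-". The sketch below computes the daily overhead for a given request rate; the rate used in the example is a hypothetical round number chosen only to show the order of magnitude, not a measured figure.

```python
EXTRA_FIELDS = "\t-\t-"  # two placeholder fields appended to every non-mobile log line


def extra_bytes_per_day(requests_per_second: float) -> float:
    """Bytes of placeholder data added per day at a given non-mobile request rate."""
    seconds_per_day = 24 * 60 * 60
    return len(EXTRA_FIELDS.encode("ascii")) * requests_per_second * seconds_per_day


# Hypothetical rate of 50,000 requests per second (an assumption, not a measurement).
overhead_gb = extra_bytes_per_day(50_000) / 1e9
```

At that assumed rate the placeholders alone come to roughly 17 GB per day before compression.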


Tyler Romeo

9:17 p.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

Remind me again why a production setup is logging every header of every request? Also, if you are logging every header, then the amount of data added by a single extra header would be insignificant compared to the rest of the request.

--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com


Asher Feldman

11:57 p.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Sunday, February 3, 2013, Tyler Romeo wrote:

> Remind me again why a production setup is logging every header of every request?

That's ludicrous. Please reread our udplog format documentation and this entire thread carefully, especially the first message before commenting any further.


Arthur Richards

5 Feb

2:24 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Sun, Feb 3, 2013 at 2:35 AM, Asher Feldman afeldman@wikimedia.org wrote:

> Regarding varnish cacheability of mobile API requests with a logging query param - it would probably be worth making frontend varnishes strip out all occurrences of that query param and its value from their backend requests so they're all the same to the caching instances. A generic param name that can take any value would allow for adding as many extra log values as needed, limited only by the uri log field length.
>
> &l=mft2&l=mfstable etc.
>
> So still an edge cache change but the result is more flexible while avoiding changing the fixed field length log format across unrelated systems like text squids or image caches.
>
> On Sunday, February 3, 2013, Asher Feldman wrote:
>
>> If you want to differentiate categories of API requests in logs, add descriptive noop query params to the requests, e.g. &mfmode=2. Doing this in request headers and altering edge config is unnecessary and a bad design pattern. On the analytics side, if parsing query params seems challenging vs. having a fixed field to parse, deal.

Asher, I understand your hesitation about using HTTP header fields, but there are a couple problems I'm seeing with using query string parameters. Perhaps you or others have some ideas how to get around these:
* We should keep user-facing URLs canonical as much as possible (primarily for link sharing)
** If we keep user-facing URLs canonical, we could potentially add query string params via javascript, but that would only work on devices that support javascript/have javascript enabled (this might not be a huge deal as we are planning changes such that users that do not support jQuery will get a simplified version of the stable site)
* How could this work for the first pageview request (eg a user clicking a link from Google or even just browsing to http://en.wikipedia.org)?

I may be missing other potential problems - it would be great if others from the mobile team could chime in.

-- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687

Brion Vibber

2:30 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Mon, Feb 4, 2013 at 4:24 PM, Arthur Richards arichards@wikimedia.org wrote:

> Asher, I understand your hesitation about using HTTP header fields, but there are a couple problems I'm seeing with using query string parameters. Perhaps you or others have some ideas how to get around these:
>
> * We should keep user-facing URLs canonical as much as possible (primarily for link sharing)
> ** If we keep user-facing URLs canonical, we could potentially add query string params via javascript, but that would only work on devices that support javascript/have javascript enabled (this might not be a huge deal as we are planning changes such that users that do not support jQuery will get a simplified version of the stable site)
> * How could this work for the first pageview request (eg a user clicking a link from Google or even just browsing to http://en.wikipedia.org)?

I think mainly we need the tracking on the API requests... that's all JavaScript-initiated, and all hidden from the user. The main problem with adding parameters would be for caching .... but none of the API hits are currently cacheable so that's not an immediate issue perhaps.

-- brion

Arthur Richards

2:38 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Mon, Feb 4, 2013 at 5:30 PM, Brion Vibber brion@pobox.com wrote:

> On Mon, Feb 4, 2013 at 4:24 PM, Arthur Richards arichards@wikimedia.org wrote:
>
>> Asher, I understand your hesitation about using HTTP header fields, but there are a couple problems I'm seeing with using query string parameters. Perhaps you or others have some ideas how to get around these:
>>
>> * We should keep user-facing URLs canonical as much as possible (primarily for link sharing)
>> ** If we keep user-facing URLs canonical, we could potentially add query string params via javascript, but that would only work on devices that support javascript/have javascript enabled (this might not be a huge deal as we are planning changes such that users that do not support jQuery will get a simplified version of the stable site)
>> * How could this work for the first pageview request (eg a user clicking a link from Google or even just browsing to http://en.wikipedia.org)?
>
> I think mainly we need the tracking on the API requests... that's all JavaScript-initiated, and all hidden from the user. The main problem with adding parameters would be for caching .... but none of the API hits are currently cacheable so that's not an immediate issue perhaps.

We also need to be able to differentiate between alpha/beta/stable versions of the mobile site, without having to parse the cookie header (I believe as a result of performance constraints around this? I think the analytics team had looked into this previously).

-- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687

Brion Vibber

2:49 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Mon, Feb 4, 2013 at 4:38 PM, Arthur Richards arichards@wikimedia.orgwrote:

...

On Mon, Feb 4, 2013 at 5:30 PM, Brion Vibber brion@pobox.com wrote:

...
On Mon, Feb 4, 2013 at 4:24 PM, Arthur Richards <arichards@wikimedia.org> wrote:

How could this work for the first pageview request (eg a user clicking a link from Google or even just browsing to http://en.wikipedia.org)?

I think mainly we need the tracking on the API requests... that's all JavaScript-initiated, and all hidden from the user. The main problem with adding parameters would be for caching .... but none of the API hits are currently cacheable so that's not an immediate issue perhaps.

We also need to be able to differentiate between alpha/beta/stable versions of the mobile site, without having to parse the cookie header (I believe as a result of performance constraints around this? I think the analytics team had looked into this previously).

Yeah that's.... probably not possible if you want to track that for initial page views. Cookie's the only thing guaranteed to have the data available, and we have no way to inject a header into mobile web browsers except for the XHR hits to the API.

-- brion

Arthur Richards

2:59 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Mon, Feb 4, 2013 at 5:49 PM, Brion Vibber brion@pobox.com wrote:

...


Yeah that's.... probably not possible if you want to track that for initial page views. Cookie's the only thing guaranteed to have the data available, and we have no way to inject a header into mobile web browsers except for the XHR hits to the API.

-- brion _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

In the case of the cookie, the header would actually get set by the backend response (from Apache) and I believe Dave cooked up or was planning on cooking some magic to somehow make that information discernable when results are cached.

-- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687

Asher Feldman

3:21 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Mon, Feb 4, 2013 at 4:59 PM, Arthur Richards arichards@wikimedia.org wrote:

...

In the case of the cookie, the header would actually get set by the backend response (from Apache) and I believe Dave cooked up or was planning on cooking some magic to somehow make that information discernable when results are cached.

Opting into the mobile beta as it is currently implemented bypasses varnish caching for all future mobile pageviews for the life of the cookie. So this probably isn't quite right (at least the "when results are cached" part.)
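As an illustration of the behaviour described above, here is a toy Python model (not Varnish configuration) of an edge cache that neither stores nor serves cached responses for requests carrying an opt-in cookie. The cookie name `optin` and the `fetch_from_backend` callable are hypothetical stand-ins; the thread does not name the real cookie.

```python
from typing import Callable, Dict, Optional

# Hypothetical cookie name; the real opt-in cookie name is not given in this thread.
OPTIN_COOKIE = "optin"


class ToyEdgeCache:
    """Toy model: requests with an opt-in cookie always go to the backend."""

    def __init__(self, fetch_from_backend: Callable[[str], str]) -> None:
        self._fetch = fetch_from_backend
        self._store: Dict[str, str] = {}
        self.backend_hits = 0

    def get(self, url: str, cookies: Optional[Dict[str, str]] = None) -> str:
        cookies = cookies or {}
        if OPTIN_COOKIE in cookies:
            # Opted-in users bypass caching: no lookup, no store.
            self.backend_hits += 1
            return self._fetch(url)
        if url not in self._store:
            self.backend_hits += 1
            self._store[url] = self._fetch(url)
        return self._store[url]


cache = ToyEdgeCache(lambda url: "body of " + url)
for _ in range(3):
    cache.get("/wiki/Otter")                       # one miss, then two hits
for _ in range(3):
    cache.get("/wiki/Otter", {OPTIN_COOKIE: "1"})  # three backend fetches
print(cache.backend_hits)
```

In this model every opted-in request reaches the backend, which is why a response header set there would be seen on all such requests.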

Asher Feldman

4:12 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Mon, Feb 4, 2013 at 5:21 PM, Asher Feldman afeldman@wikimedia.org wrote:

...

On Mon, Feb 4, 2013 at 4:59 PM, Arthur Richards arichards@wikimedia.org wrote:

...
In the case of the cookie, the header would actually get set by the backend response (from Apache) and I believe Dave cooked up or was planning on cooking some magic to somehow make that information discernable when results are cached.

Opting into the mobile beta as it is currently implemented bypasses varnish caching for all future mobile pageviews for the life of the cookie. So this probably isn't quite right (at least the "when results are cached" part.)

Thinking about this further: so long as all beta opt-ins bypass all caching and always have to hit an Apache, it would be fine for MobileFrontend to set a response header reflecting the version of the site the opt-in cookie triggers (but only if there's an opt-in; avoid setting it on standard). I'd just prefer this to be logged without adding a field to the entire udplog stream that will generally just be wasted space. Mobile already has one dedicated udplog field, currently intended for zero carriers, which is wasted log space for nearly every request. Make it a key/value field that can contain multiple keys, e.g. "zc:orn;v:b1" (zero carrier = orange whatever, version = beta1).
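A minimal sketch of how a log consumer might read such a field, assuming the `key:value;key:value` shape of the "zc:orn;v:b1" example. Treating a lone `-` as "empty" is an assumption based on the dash placeholder mentioned in the original RFC; the function name is made up for illustration.

```python
from typing import Dict


def parse_kv_field(field: str) -> Dict[str, str]:
    """Parse a 'k1:v1;k2:v2' log field; a lone '-' means empty (assumed)."""
    result: Dict[str, str] = {}
    if field in ("", "-"):
        return result
    for pair in field.split(";"):
        if not pair:
            continue
        key, sep, value = pair.partition(":")
        if sep:  # skip malformed pairs that lack a ':'
            result[key] = value
    return result


print(parse_kv_field("zc:orn;v:b1"))
```

Because the field stays in one fixed position, adding a new key later would not change the position of any other udplog field.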

If by some chance mobile beta gets implemented in a way that doesn't kill frontend caching for its users (maybe solely via different js behavior based on the presence of the optin cookie?) the above won't be applicable anymore, so using the event log facility / pixel service to note beta usage becomes more appropriate. If beta usage is going to be driven upwards, I hope this approach is seriously considered. Mobile currently only has around a 58% edge cache hitrate as it is and it sounds like upcoming features will place significant new demands on the apaches and for memcached space. If a non cache busting beta site is doable, go for the logging method now that will later be compatible with it to avoid having to change processing methods.

Arthur Richards

6 Feb

12:13 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Mon, Feb 4, 2013 at 7:12 PM, Asher Feldman afeldman@wikimedia.org wrote:

...


OK - this is all making a lot more sense to me now, thanks for your clarifications and suggestions, Asher.

So, from the mobile team's perspective, a straightforward implementation to get us to our goal might be to:

1) add a query parameter to identify 'secondary' API hits (e.g. an API request for page content made after an initial request for that page was made; all other requests stay the same)
2) use the header solution to identify beta/alpha cookies (HTTP header set by the backend response when the user is opted in).

One thing I'd like to double check though is that 'Opting into the mobile beta as it is currently implemented bypasses varnish caching for all future mobile pageviews for the life of the cookie' - I thought the Varnish cache was just varied by the optin cookies, not totally bypassed. I've looked at headers from some sample requests I've made with the beta opt-in and I'm not seeing any cache hits, so I gather you are correct. Can you please confirm this?

Analytics folks, is this workable from your perspective?

-- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687

Diederik van Liere

12:36 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

...

Analytics folks, is this workable from your perspective?

Yes, this works fine for us, and it's also no problem to set multiple key/value pairs in the HTTP header that we are now using for the X-CS header.

Diederik

David Schoonover

10:32 p.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

Just want to summarize and make sure I've got the right conclusions, as this thread has wandered a bit.

*1. X-MF-Mode: Alpha/Beta Site Usage*

We'll roll this into the X-CS header, which will now be KV-pairs (using normal URL encoding), and set by Varnish. This will avoid an explosion of cryptic headers for analytic purposes.

Questions:

- It seems there's some confusion around "bypassing Varnish". If I understand correctly, it's not that Varnish is ever bypassed, just that the upstream response is not cached if cookies are present. Is that right?
- Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about?

*2. X-MF-Req: Primary vs Secondary API Requests*

This header will be replaced with a query parameter set by the client-side JS code making the request. Analytics will parse it out at processing time and Do The Right Thing.
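For illustration only, the processing-time step could look roughly like the Python sketch below. The parameter name `secondary` is hypothetical (the thread does not settle on a name), and the URLs are made-up examples of `action=mobileview` requests.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical parameter name; the thread does not settle on one.
SECONDARY_PARAM = "secondary"


def is_pageview(url: str) -> bool:
    """Count an API request as a pageview unless it is flagged as secondary."""
    query = parse_qs(urlsplit(url).query)
    return query.get(SECONDARY_PARAM, ["0"])[0] != "1"


urls = [
    "/w/api.php?action=mobileview&page=Otter&sections=0",
    "/w/api.php?action=mobileview&page=Otter&sections=3&secondary=1",
]
print([is_pageview(u) for u in urls])
```

Requests without the flag are counted as before, so only the client-side code that issues secondary requests would need to change.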

Kindly correct me if I've gotten anything wrong.

-- David Schoonover dsc@wikimedia.org


Asher Feldman

10:59 p.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Wednesday, February 6, 2013, David Schoonover wrote:

...

Just want to summarize and make sure I've got the right conclusions, as this thread has wandered a bit.

*1. X-MF-Mode: Alpha/Beta Site Usage*

We'll roll this into the X-CS header, which will now be KV-pairs (using normal URL encoding), and set by Varnish.

Nope. There will be a header denoting non-standard MobileFrontend views if the mobile team wants to leave the caching situation as is. It will be a response header set by mediawiki, not varnish. The header will have a unique name, it will not share the name of the zero carrier header. The udplog field that currently only ever contains carrier information on zero requests will become a key value field. Udplog fields are not named, they are positional.

...

This will avoid an explosion of cryptic headers for analytic purposes.

Questions:

It seems there's some confusion around "bypassing Varnish". If I understand correctly, it's not that Varnish is ever bypassed, just that the upstream response is not cached if cookies are present. Is that right?

"Bypasses varnish caching" != "bypassing varnish." I don't see any use of the latter in this thread, but if there has been confusion, know that all m.wikipedia.org requests are served via varnish.

...

Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about?

Nope.. We're repurposing the fixed position udplog field, not the zero carrier code header.

...

*2. X-MF-Req: Primary vs Secondary API Requests*

This header will be replaced with a query parameter set by the client-side JS code making the request. Analytics will parse it out at processing time and Do The Right Thing.

Kindly correct me if I've gotten anything wrong.


David Schoonover

11:01 p.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

That all sounds fine to me so long as we're all agreed.

-- David Schoonover dsc@wikimedia.org


Asher Feldman

11:05 p.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Wednesday, February 6, 2013, David Schoonover wrote:

...

That all sounds fine to me so long as we're all agreed.

Lol. RFC closed.

...


Mark Bergsma

7 Feb

2:32 p.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Feb 6, 2013, at 9:32 PM, David Schoonover dsc@wikimedia.org wrote:

...

Just want to summarize and make sure I've got the right conclusions, as this thread has wandered a bit.

*1. X-MF-Mode: Alpha/Beta Site Usage*

We'll roll this into the X-CS header, which will now be KV-pairs (using normal URL encoding), and set by Varnish. This will avoid an explosion of cryptic headers for analytic purposes.

Questions:

It seems there's some confusion around "bypassing Varnish". If I understand correctly, it's not that Varnish is ever bypassed, just that the upstream response is not cached if cookies are present. Is that right?

Yes

...

Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about?

I'd like to propose to define *one* request header to be used for all analytics purposes. It can be key/value pairs, and be set client side where applicable. Varnish can append to it where needed, later keys overriding earlier ones. Then we can log that one header across all HTTP/caching clusters without having to change the log stream all the time, and without wasting much space, and caching edge configuration changes are kept to a minimum as well.

And we might as well be transparent in its naming. Header name "Log-Parameters:", perhaps?
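The "later keys overriding earlier ones" rule can be sketched in a few lines of Python. The `key=value;key=value` serialization and the key names (`mode`, `req`, `zc`) are assumptions for illustration; the proposal above does not fix a format.

```python
from typing import Dict, Iterable


def merge_log_params(values: Iterable[str]) -> Dict[str, str]:
    """Merge 'k=v;k=v' strings in order; later keys override earlier ones."""
    merged: Dict[str, str] = {}
    for value in values:
        for pair in value.split(";"):
            key, sep, val = pair.strip().partition("=")
            if sep and key:
                merged[key] = val
    return merged


client_side = "mode=b;req=2"   # set by client-side JS (hypothetical keys)
edge_side = "zc=orn;mode=a"    # appended by the cache layer (hypothetical keys)
print(merge_log_params([client_side, edge_side]))
```

With this rule the cache layer can correct or extend what the client sent without having to parse and rewrite the existing value.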

...

*2. X-MF-Req: Primary vs Secondary API Requests*

This header will be replaced with a query parameter set by the client-side JS code making the request. Analytics will parse it out at processing time and Do The Right Thing.

I think the question of using a URL param vs a request header should mainly take into account whether the response varies on the value of the parameter. If the responses are otherwise identical, and the value is only used for analytics purposes, I would prefer to put that into the above header instead, as it will impair cacheability / cache size otherwise (even if those requests are currently not cacheable for other reasons). If the responses are actually different based on this parameter, I would prefer to have it in the URL where possible.
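To make the cacheability concern concrete, the Python sketch below shows that an analytics-only query parameter produces a second URL for the same content, and therefore a second cache entry in any cache keyed on the full URL. The `cache_key` normalization is an illustration and was not proposed in this thread; the parameter name `secondary` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical analytics-only parameter name.
ANALYTICS_PARAMS = {"secondary"}


def cache_key(url: str) -> str:
    """Build a cache key that ignores analytics-only query parameters."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in ANALYTICS_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


a = "/w/api.php?action=mobileview&page=Otter&sections=3"
b = "/w/api.php?action=mobileview&page=Otter&sections=3&secondary=1"
print(a == b, cache_key(a) == cache_key(b))
```

Carrying the value in a header avoids the problem entirely, since the URL (and hence the cache key) stays the same.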

-- Mark Bergsma mark@wikimedia.org Lead Operations Architect Wikimedia Foundation

David Schoonover

3:21 p.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

...

I'd like to propose to define *one* request header to be used for all analytics purposes. It can be key/value pairs, and be set client side where applicable. Varnish can append to it where needed, later keys overriding earlier ones. Then we can log that one header across all HTTP/caching clusters without having to change the log stream all the time, and without wasting much space, and caching edge configuration changes are kept to a minimum as well.

Agreed. Instrumentation should ideally never get in the way of production performance, so if we can cut or optimize header use for logging without being too onerous, we'll happily do so. afaik, the reasons that custom HTTP headers are used at all are:

- They're accessible from varnishncsa without code modifications;
- Varnish and/or other parties in the request chain can munge the values prior to logging to save bytes (examples being X-CS, which replaces the semantic carrier name with a [vastly shorter] numeric code, and the proposed X-MF-Mode header, which prevents the need to log the whole cookies header for post-processing).

Ideally, none of this should need to make a trip to the client. I don't recall seeing anything in the Varnish docs providing a way to send values exclusively to the loggers, but if there is, that's an easy win, and it wouldn't require any changes to our parsing pipeline.

If that's not possible, it makes sense to collapse various headers into a KV field; that would require changes on our side, including all downstream consumers of the log stream (which is surprisingly large), so it's not a trivial move.

-- David Schoonover dsc@wikimedia.org

Asher Feldman

10 Feb

12:21 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Thu, Feb 7, 2013 at 4:32 AM, Mark Bergsma mark@wikimedia.org wrote:

...

...

Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about?

I'd like to propose to define *one* request header to be used for all analytics purposes. It can be key/value pairs, and be set client side where applicable.

There's been some confusion in this thread between headers used by mediawiki in determining content generation or for cache variance, and those intended only for logging. The zero carrier header is used by the zero extension to return specific content banners and set different default behaviors (i.e. hide all images) as negotiated with individual mobile carriers. A reader familiar with this might note that there are separate X-CS and X-Carrier headers, but X-Carrier is supposed to go away now.

Agreed that there should be a single header for content that's strictly for analytics purposes. All changes to the udplog format in the last year or so could likely be reverted except for the delimiter change, with a multipurpose analytics key/value field added for all else.

...

I think the question of using a URL param vs a request header should mainly take into account whether the response varies on the value of the parameter. If the responses are otherwise identical, and the value is only used for analytics purposes, I would prefer to put that into the above header instead, as it will impair cacheability / cache size otherwise (even if those requests are currently not cacheable for other reasons). If the responses are actually different based on this parameter, I would prefer to have it in the URL where possible.

For this particular case, the API requests are for getting specific sections of an article, as opposed to either the whole thing or the first section as part of an initial pageview. I might not have grokked the original RFC email well, but I don't understand why this was being discussed as a logging challenge or necessitating a request header. A mobile api request to just get section 3 of the article on otters should already utilize a query param denoting that section 3 is being fetched, and is already clearly not a "primary" request.

Whether or not it makes sense for mobile to move in the direction of splitting up article views into many api requests is something I'd love to see backed up by data. I'm skeptical for multiple reasons.

Mark Bergsma

11 Feb

4:28 p.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Feb 9, 2013, at 11:21 PM, Asher Feldman afeldman@wikimedia.org wrote:

...

For this particular case, the API requests are for either getting specific sections of an article as opposed to either the whole thing, or the first section as part of an initial pageview. I might not have grokked the original RFC email well, but I don't understand why this was being discussed as a logging challenge or necessitating a request header. A mobile api request to just get section 3 of the article on otters should already utilize a query param denoting that section 3 is being fetched, and is already clearly not a "primary" request.

Yes, that part remains a bit unclear to me as well - some more details would be welcome.

...

Whether or not it makes sense for mobile to move in the direction of splitting up article views into many api requests is something I'd love to see backed up by data. I'm skeptical for multiple reasons.

What is the main motivation used here? Reducing article sizes/transfers at the expense of more latency?

-- Mark Bergsma mark@wikimedia.org Lead Operations Architect Wikimedia Foundation

Asher Feldman

8:11 p.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Monday, February 11, 2013, Mark Bergsma wrote:

...

On Feb 9, 2013, at 11:21 PM, Asher Feldman <afeldman@wikimedia.org> wrote:

...
Whether or not it makes sense for mobile to move in the direction of splitting up article views into many api requests is something I'd love to see backed up by data. I'm skeptical for multiple reasons.

What is the main motivation used here? Reducing article sizes/transfers at the expense of more latency?

In cases where most sections (probably not even all) are loaded, I'd expect it to increase the amount of data transferred beyond just the overhead of the additional requests. gzip might take a 30k article down to 4k but will be less efficient on individual sections. Text compresses really well, and roundtrip latency is high on many cell networks.
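The compression point can be checked with a small Python experiment. The text here is synthetic (random words from a tiny made-up vocabulary), so the absolute numbers say nothing about real articles; the sketch only shows the direction of the effect when every section is fetched and compressed separately.

```python
import gzip
import random

rng = random.Random(0)  # fixed seed so the synthetic text is reproducible
vocabulary = ["otter", "river", "habitat", "species", "the", "of", "and",
              "fur", "aquatic", "mammal", "in", "is", "a", "fish", "water"]

# Ten synthetic "sections" of 500 words each (made-up text, not a real article).
sections = [" ".join(rng.choice(vocabulary) for _ in range(500)).encode("utf-8")
            for _ in range(10)]
article = b"\n".join(sections)

whole = len(gzip.compress(article))
separate = sum(len(gzip.compress(s)) for s in sections)
print(len(article), whole, separate)
```

Each separately compressed section pays its own gzip framing and cannot reuse matches from earlier sections, so the per-section total comes out larger than compressing the article once.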

And then I'd wonder about the server side implementation. How will frontend cache invalidation work? Are we going to need to purge every individual article section relative to /w/api.php on edit? Article HTML in memcached (parser cache), mobile processed HTML in memcached.. Now individual sections in memcached? If so, should we calculate memcached space needs for article text as 3x the current parser cache utilization? More memcached usage is great, not asking to dissuade its use but because it's better to capacity plan than to react.

Max Semenik

10:21 p.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On 11.02.2013, 22:11 Asher wrote:

...

And then I'd wonder about the server side implementation. How will frontend cache invalidation work? Are we going to need to purge every individual article section relative to /w/api.php on edit?

Since the API doesn't require pretty URLs, we could simply append the current revision ID to the mobileview URLs.
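A rough Python sketch of that idea: if the revision ID is part of the URL, an edit changes the URL, so stale cached sections are simply never requested again and need no explicit purge. The parameter name `revision` is hypothetical, and the helper is illustrative rather than actual MobileFrontend code.

```python
from urllib.parse import urlencode


def mobileview_url(page: str, section: int, revision_id: int) -> str:
    """Build a section URL whose cache key changes whenever the page is edited."""
    params = {
        "action": "mobileview",
        "page": page,
        "sections": section,
        "revision": revision_id,  # hypothetical parameter name
    }
    return "/w/api.php?" + urlencode(params)


before_edit = mobileview_url("Otter", 3, 1001)
after_edit = mobileview_url("Otter", 3, 1002)
print(before_edit)
print(after_edit)
```

The trade-off is that old entries linger in the cache until they expire or are evicted, rather than being removed at edit time.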

...

Article HTML in memcached (parser cache), mobile processed HTML in memcached.. Now individual sections in memcached? If so, should we calculate memcached space needs for article text as 3x the current parser cache utilization? More memcached usage is great, not asking to dissuade its use but because its better to capacity plan than to react.

action=mobileview caches pages only in full and serves only sections requested, so no changes in request patterns will result in increased memcached usage.
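As a sketch of why the request pattern does not multiply cache entries under that design, consider the toy Python model below. The class and the `render_page` callable are made up for illustration; this is not the actual `action=mobileview` implementation.

```python
from typing import Callable, Dict, List


class SectionServer:
    """Cache each page once, in full, and slice out requested sections."""

    def __init__(self, render_page: Callable[[str], List[str]]) -> None:
        self._render = render_page          # returns the page as a list of sections
        self._cache: Dict[str, List[str]] = {}
        self.renders = 0

    def get_sections(self, page: str, wanted: List[int]) -> List[str]:
        if page not in self._cache:
            self.renders += 1
            self._cache[page] = self._render(page)
        full = self._cache[page]
        return [full[i] for i in wanted]


server = SectionServer(lambda page: [f"{page} section {i}" for i in range(5)])
print(server.get_sections("Otter", [0]))
print(server.get_sections("Otter", [3]))
print(server.renders, len(server._cache))
```

However many distinct section requests arrive for a page, the model holds one cache entry and performs one render for it.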

-- Best regards, Max Semenik ([[User:MaxSem]])

Asher Feldman

10:50 p.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

Max - good answers re: caching concerns. That leaves studying whether the bytes transferred on an average mobile article view increase or decrease with lazy section loading. If they increase, I'd say this isn't a positive direction to go in and stop there. If they decrease, then we should look at the effect on total latency, number of requests required per pageview, and the impact on backend apache utilization, which I'd expect to be > 0.

Does the mobile team have specific goals that this project aims to accomplish? If so, we can use those as the measure against which to compare an impact analysis.


Jon Robson

12 Feb 2013

3:42 a.m.


I'm a bit worried that we are now asking why pages are lazy loaded rather than focusing on the fact that they currently _are_ doing this and on how we can log these requests. (If we want to discuss this further, let's start another thread, as I'm getting extremely confused doing so on this one.)

Lazy loading sections
################
The motivation behind moving MobileFrontend in the direction of lazy loading section content and subsequent pages can be found here [1]; I just gave it a refresh as it was a little out of date.

In summary, the reasons are to 1) make the app feel more responsive by simply loading content rather than reloading the entire interface, and 2) reduce the payload sent to a device.

Session Tracking
################

Going back to the discussion of tracking mobile page views, it sounds like a header stating whether a page is being viewed in alpha, beta or stable works fine for standard page views.

As for the situations where an entire page is loaded via the API, it makes no difference to us whether we 1) send the same header (set via JavaScript) or 2) add a query string parameter.

The only advantage I can see of using a header is that an initial page load of the article San Francisco currently uses the same API URL as a page load of the article San Francisco via JavaScript (e.g. I click a link to 'San Francisco' on the California article).

In this new method they would use different URLs (as the data sent is different). I'm not sure how that would affect caching.

Let us know which method is preferred. From my perspective implementation of either is easy.
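To make the caching question concrete, here is a rough sketch of the two options. The header name `X-MF-Req` is borrowed from the RFC, but sending it as a request header is an assumption, and the `mfreq` query parameter is purely hypothetical. Nothing is sent over the network; the function only builds the URL and headers.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

API = "http://en.m.wikipedia.org/w/api.php"


def build_request(page: str, via_header: bool) -> Tuple[str, Dict[str, str]]:
    """Build (url, headers) for a mobileview request marked as a pageview."""
    params = {"format": "json", "action": "mobileview", "page": page}
    headers: Dict[str, str] = {}
    if via_header:
        # Option 1: the marker travels in a header, so the URL is unchanged.
        headers["X-MF-Req"] = "1"
    else:
        # Option 2: hypothetical query parameter; this changes the URL.
        params["mfreq"] = "1"
    return API + "?" + urlencode(params), headers


url_header, headers_header = build_request("San Francisco", via_header=True)
url_query, headers_query = build_request("San Francisco", via_header=False)
```

With the header option both kinds of request keep the same URL; with the query string option the URLs differ, which is why a cache keyed on the URL would treat them as separate objects.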

[1] http://www.mediawiki.org/wiki/MobileFrontend/Dynamic_Sections


-- Jon Robson http://jonrobson.me.uk @rakugojon

Arthur Richards

5:11 a.m.


Thanks, Jon. To try and clarify a bit more about the API requests... they are not made on a per-section basis. As I mentioned earlier, there are two cases in which article content gets loaded by the API:

1) Going directly to a page (e.g. clicking a link from a Google search) will result in the backend serving a page with ONLY summary section content and section headers. The rest of the page is lazily loaded via an API request once the JS for the page gets loaded. The idea is to increase responsiveness by reducing the delay for an article to load (further details in the article Jon previously linked to). The API request looks like: http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&pa...

2) Loading an article entirely via JavaScript - like when a link is clicked in an article to another article, or an article is loaded via search. This will make ONE call to the API to load article content. The API request looks like: http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&pa...

These API requests are identical, but only #2 should be counted as a 'pageview' - #1 is a secondary API request and should not be counted as a 'pageview'. You could make the argument that we just count all of these API requests as pageviews, but there are cases when we can't load article content from the API (like devices that do not support JS), so we need to be able to count the traditional page request as a pageview - thus we need a way to differentiate the types of API requests being made when they otherwise share the same URL.
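The counting rule described above could be sketched roughly as follows. The record layout (`kind`, `mf_req`) is a hypothetical parsed log format, not the real log line; the values 1/2 for primary/secondary follow the X-MF-Req proposal in the RFC.

```python
from typing import Dict, Iterable


def is_pageview(record: Dict[str, str]) -> bool:
    """Decide whether one (hypothetical) parsed log record is a pageview."""
    if record["kind"] == "page":
        # Traditional page request, e.g. from a device without JS support.
        return True
    # API request: only a primary request (X-MF-Req: 1) is a pageview.
    return record.get("mf_req") == "1"


def count_pageviews(records: Iterable[Dict[str, str]]) -> int:
    """Count the records that qualify as pageviews."""
    return sum(1 for record in records if is_pageview(record))


log = [
    {"kind": "page", "mf_req": "-"},  # case 1: direct visit, summary served
    {"kind": "api", "mf_req": "2"},   # case 1: rest of the page, lazily loaded
    {"kind": "api", "mf_req": "1"},   # case 2: article loaded via JavaScript
]
print(count_pageviews(log))  # prints 2
```

The direct visit and its follow-up API request together count once, and the JavaScript-driven article load counts once.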


-- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687

Asher Feldman

5:32 a.m.


Thanks for the clarification Arthur, that clears up some misconceptions I had. I saw a demo around the allstaff where individual sections were lazy loaded, so I think I had that in my head.

It does still seem to me that the data to determine secondary api requests should already be present in the existing log line. If the value of the page param in an action=mobileview api request matches the page in the referrer (perhaps with normalization), it's a secondary request as per case 1 below. Otherwise, it's a pageview as per case 2. Difficult or expensive to reconcile? Not when you're doing distributed log analysis via hadoop.
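As a rough illustration of that comparison, here is a sketch that assumes referrers look like `/wiki/Title` or carry a `title=` query parameter, and that normalization means swapping underscores for spaces and upper-casing the first letter. Both are assumptions for the example, not a description of the actual log pipeline.

```python
from typing import Optional
from urllib.parse import parse_qs, unquote, urlsplit


def normalize(title: str) -> str:
    """Assumed normalization: underscores to spaces, first letter upper-cased."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def page_from_referrer(referrer: str) -> Optional[str]:
    """Extract the page title from a referrer URL, if it carries one."""
    parts = urlsplit(referrer)
    if parts.path.startswith("/wiki/"):
        return unquote(parts.path[len("/wiki/"):])
    titles = parse_qs(parts.query).get("title")
    return titles[0] if titles else None


def is_secondary(request_url: str, referrer: str) -> bool:
    """True if a mobileview request asks for the page named in the referrer."""
    query = parse_qs(urlsplit(request_url).query)
    if query.get("action") != ["mobileview"]:
        return False
    requested = query.get("page", [None])[0]
    referred = page_from_referrer(referrer)
    if requested is None or referred is None:
        # No title to compare against; this sketch then treats it as a pageview.
        return False
    return normalize(requested) == normalize(referred)


api = "http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&page=San+Francisco"
same_page = is_secondary(api, "http://en.m.wikipedia.org/wiki/San_Francisco")
other_page = is_secondary(api, "http://en.m.wikipedia.org/wiki/California")
```

Here `same_page` corresponds to case 1 (a secondary request) and `other_page` to case 2 (a pageview).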


Diederik van Liere

7:14 p.m.


...

It does still seem to me that the data to determine secondary api requests should already be present in the existing log line. If the value of the page param in an action=mobileview api request matches the page in the referrer (perhaps with normalization), it's a secondary request as per case 1 below. Otherwise, it's a pageview as per case 2. Difficult or expensive to reconcile? Not when you're doing distributed log analysis via hadoop.

So I did look into this prior to writing the RFC, and the issue is that a lot of API referrers don't contain the query string. I don't know what triggers this, so if we can fix it, then we can definitely derive the secondary pageview request from the referrer field.

D


Asher Feldman

7:56 p.m.


On Tuesday, February 12, 2013, Diederik van Liere wrote:


So I did look into this prior to writing the RFC, and the issue is that a lot of API referrers don't contain the query string. I don't know what triggers this, so if we can fix it, then we can definitely derive the secondary pageview request from the referrer field. D

If you can point me to some examples, I'll see if I can find any insights into the behavior.


Asher Feldman

15 Feb 2013

9:16 p.m.


Just to tie this thread up - the issue of how to count ajax driven pageviews loaded from the api and of how to differentiate those requests from secondary api page requests has been resolved without the need for code or logging changes.

Tagging of the mobile beta site will be accomplished via a new generic MediaWiki HTTP response header, dedicated to logging, that contains key-value pairs.
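For illustration only, a consumer of such a header might parse it along these lines. The wire format (`key=value` pairs separated by semicolons) and the key names in the example are assumptions; the thread does not specify them.

```python
from typing import Dict


def parse_logging_header(value: str) -> Dict[str, str]:
    """Parse an assumed 'key=value;key=value' header value into a dict."""
    pairs: Dict[str, str] = {}
    for item in value.split(";"):
        key, sep, val = item.strip().partition("=")
        if sep and key:
            # Skip fragments without '=' (such as a bare '-' placeholder).
            pairs[key] = val
    return pairs


# Hypothetical header value tagging a request from the beta mobile site.
tags = parse_logging_header("mf-m=b;example=1")
```

A generic key-value header means new tags can be added later without changing the log format again.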

-Asher


Diederik van Liere

16 Feb 2013

12:26 a.m.


Thanks Asher for tying this up! I was about to write a similar email :) One final question, just to make sure we are all on the same page: is the X-CS field becoming a generic key/value pair for tracking purposes?

On Fri, Feb 15, 2013 at 11:16 AM, Asher Feldman afeldman@wikimedia.orgwrote:

...

Just to tie this thread up - the issue of how to count ajax driven pageviews loaded from the api and of how to differentiate those requests from secondary api page requests has been resolved without the need for code or logging changes.

Tagging of the mobile beta site will be accomplished via a new generic mediawiki http response header dedicated to logging containing key value pairs.

-Asher
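
A log consumer could read such a key/value header roughly as sketched below. The thread does not specify the header's wire format, so the `key=value;key=value` layout, the example keys and the function name `parse_tracking_header` are assumptions made purely for illustration.

```python
def parse_tracking_header(value):
    """Parse a 'key=value;key=value' string into a dict.

    The format is an assumption; the thread only says the header carries
    key/value pairs.
    """
    pairs = {}
    for item in value.split(";"):
        item = item.strip()
        if not item or "=" not in item:
            # Skip empty or malformed items, e.g. a bare '-' placeholder
            # logged by servers that do not set the header.
            continue
        key, _, val = item.partition("=")
        pairs[key.strip()] = val.strip()
    return pairs
```

Treating a bare dash as "no pairs" matches the original proposal, in which the Squids/Nginx servers log '-' for fields they do not set.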

On Tue, Feb 12, 2013 at 9:56 AM, Asher Feldman <afeldman@wikimedia.org> wrote:

On Tuesday, February 12, 2013, Diederik van Liere wrote:

It does still seem to me that the data to determine secondary api requests should already be present in the existing log line. If the value of the page param in an action=mobileview api request matches the page in the referrer (perhaps with normalization), it's a secondary request as per case 1 below. Otherwise, it's a pageview as per case 2. Difficult or expensive to reconcile? Not when you're doing distributed log analysis via hadoop.

So I did look into this prior to writing the RFC and the issue is that a lot of API referrers don't contain the querystring. I don't know what triggers this so if we can fix this then we can definitely derive the secondary pageview request from the referrer field. D

If you can point me to some examples, I'll see if I can find any insights into the behavior.
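
The referrer comparison discussed above could look roughly like the following sketch. It is not the analytics team's actual code: the function names, the `/wiki/` and `title=` referrer layouts, and the underscore/space normalization are assumptions. Only the `action=mobileview` and `page` parameters come from the thread.

```python
from urllib.parse import urlsplit, parse_qs, unquote


def normalize_title(title):
    # Assumed normalization: underscores and spaces are equivalent.
    return title.replace("_", " ").strip()


def referrer_title(referrer):
    """Extract an article title from a referrer URL, or None if absent."""
    parts = urlsplit(referrer)
    query = parse_qs(parts.query)
    if "title" in query:                    # assumed form: /w/index.php?title=Foo
        return normalize_title(query["title"][0])
    if parts.path.startswith("/wiki/"):     # assumed form: /wiki/Foo
        return normalize_title(unquote(parts.path[len("/wiki/"):]))
    return None


def classify_api_request(request_url, referrer):
    """Return 'secondary', 'pageview' or 'unknown' for an API request."""
    query = parse_qs(urlsplit(request_url).query)
    if query.get("action", [""])[0] != "mobileview" or "page" not in query:
        return "unknown"
    referred = referrer_title(referrer)
    if referred is None:
        return "unknown"  # referrer missing or stripped of its title
    requested = normalize_title(query["page"][0])
    # Same page as the referrer: lazy-loading the rest of it (case 1).
    # Different page: an article loaded via Javascript (case 2).
    return "secondary" if requested == referred else "pageview"
```

The 'unknown' result is where the missing-querystring problem shows up: a referrer that has lost its `title=` parameter cannot be matched, so such requests cannot be classified from the log line alone.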

On Mon, Feb 11, 2013 at 7:11 PM, Arthur Richards <arichards@wikimedia.org> wrote:

Thanks, Jon. To try and clarify a bit more about the API requests... they are not made on a per-section basis. As I mentioned earlier, there are two cases in which article content gets loaded by the API:

1) Going directly to a page (eg clicking a link from a Google search) will result in the backend serving a page with ONLY summary section content and section headers. The rest of the page is lazily loaded via API request once the JS for the page gets loaded. The idea is to increase responsiveness by reducing the delay for an article to load (further details in the article Jon previously linked to). The API request looks like:

http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&pa...

2) Loading an article entirely via Javascript - like when a link is clicked in an article to another article, or an article is loaded via search. This will make ONE call to the API to load article content. API request looks like:

http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&pa...

These API requests are identical, but only #2 should be counted as a 'pageview' - #1 is a secondary API request and should not be counted as a 'pageview'. You could make the argument that we just count all of these API requests as pageviews, but there are cases when we can't load article content from the API (like devices that do not support JS), so we need to be able to count the traditional page request as a pageview - thus we need a way to differentiate the types of API requests being made when they otherwise share the same URL.


Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Asher Feldman

5 Feb 5 Feb

3:02 a.m.

New subject: [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews

On Mon, Feb 4, 2013 at 4:24 PM, Arthur Richards arichards@wikimedia.org wrote:

Asher, I understand your hesitation about using HTTP header fields, but there are a couple problems I'm seeing with using query string parameters. Perhaps you or others have some ideas how to get around these:

* We should keep user-facing URLs canonical as much as possible (primarily for link sharing)
** If we keep user-facing URLs canonical, we could potentially add query string params via javascript, but that would only work on devices that support javascript/have javascript enabled (this might not be a huge deal as we are planning changes such that users that do not support jQuery will get a simplified version of the stable site)

I was thinking of this as a solution for the X-MF-Req header, based on your explanation of it earlier in the thread: "Almost correct - I realize I didn't actually explain it correctly. This would be a request HTTP header set by the client in API requests made by Javascript provided by MobileFrontend."

I only meant to apply the query string idea to API requests, which can also be marked to indicate non-standard versions of the site. I completely missed the case of non-api requests about which beta/alpha usage data needs to be collected. What about doing so via the eventlog service? Only for users actually opted into one of these programs, no need to log anything special for the majority of users getting the standard site.

* How could this work for the first pageview request (eg a user clicking a link from Google or even just browsing to http://en.wikipedia.org)?

I think this is covered by the above, in that the data intended to go into x-mf-req doesn't apply to this sort of page view, and first views from users opted into a trial can eventlog the trial usage.

rupert THURNER

3 Feb 3 Feb

11:29 a.m.

New subject: Fwd: RFC: Introducing two new HTTP headers to track mobile pageviews

Is the reason for using two fields instead of one that it is much easier to implement, or that it is more performant?

On Sun, Feb 3, 2013 at 1:08 AM, David Schoonover dsc@wikimedia.org wrote:

Huh! News to me as well. I definitely agree with that decision. Thanks, Ori!

I've already written the Varnish code for setting X-MF-Mode so it can be captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at least, MF-Mode?

Looking especially to hear from Arthur and Matt.

-- David Schoonover dsc@wikimedia.org

On Sat, Feb 2, 2013 at 2:16 PM, Diederik van Liere dvanliere@wikimedia.org wrote:

Thanks Ori, I was not aware of this D

Sent from my iPhone

On 2013-02-02, at 16:55, Ori Livneh ori@wikimedia.org wrote:

On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote:

I don't like its cryptic nature.

Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b».

Instead something like this would be much more descriptive:
X-Mobile-Mode: stable
X-Mobile-Request: secondary

But that also means sending more bytes through the wire :S

Well, you can (and should) drop the 'X-' :-)

See http://tools.ietf.org/html/rfc6648: Deprecating the "X-" Prefix and Similar Constructs in Application Protocols
-- Ori Livneh
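
To put the "more bytes through the wire" concern in numbers, here is a small back-of-the-envelope sketch. It assumes uncompressed HTTP/1.1 framing, where each header line is `Name: value` followed by CRLF. The helper name `header_line_bytes` is made up for this illustration, and `Mobile-Mode` is the un-prefixed name floated earlier in the thread.

```python
def header_line_bytes(name, value):
    # One header line on the wire: "Name: value" plus the trailing CRLF.
    return len(f"{name}: {value}\r\n".encode("ascii"))


cryptic = header_line_bytes("X-MF-Mode", "b")                  # 14 bytes
descriptive = header_line_bytes("X-Mobile-Mode", "stable")     # 23 bytes
without_prefix = header_line_bytes("Mobile-Mode", "stable")    # 21 bytes

print(descriptive - cryptic)      # extra bytes for the descriptive form
print(without_prefix - cryptic)   # extra bytes once the 'X-' is dropped
```

Under these assumptions the descriptive header costs 9 extra bytes per response, or 7 once the 'X-' prefix is dropped.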

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l


4302

Age (days ago)

4317

Last active (days ago)

wikitech-l@lists.wikimedia.org

39 comments

12 participants


participants (12)

Arthur Richards
Asher Feldman
Brion Vibber
David Schoonover
Diederik van Liere
Jon Robson
Mark Bergsma
Max Semenik
Ori Livneh
Platonides
rupert THURNER
Tyler Romeo