(Apologies for cross-posting)
Heya,
The mobile team needs accurate pageviews for the alpha and beta mobile site. Currently, this information is only stored in a cookie, but we don't want to go the route of starting to store this cookie because of cache server performance, network performance and privacy policy issues. The mobile team also needs to be able to differentiate between initial and secondary API requests - pages in the beta version of MobileFrontend are dynamically loaded via the API, meaning that MobileFrontend might make multiple API requests to load sections of an article when they are toggled open by the user. At the moment, we have no way of differentiating between API requests to determine which one should count as a 'pageview'.
We propose that we set two additional custom HTTP headers - one to identify the alpha/beta/stable version of MobileFrontend, the other to differentiate between initial and secondary API requests. This would make logging the necessary information trivial, and we believe it would be fairly lightweight to implement.
We propose the following two headers with their possible values:
X-MF-Mode: a/b/s (alpha/beta/stable)
X-MF-Req: 1/2 (primary/secondary)
X-MF-Mode would be determined by Varnish based off the existence of the alpha/beta identifying cookies while X-MF-Req would be set by MobileFrontend in the backend response.
These headers would only be set on the Varnish servers; on the Squids/Nginx we will just set a dash ('-') in the log fields.
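To make the proposal concrete, here is a rough sketch of the logic in Python. The real implementation would live in Varnish VCL and MobileFrontend, not Python. The cookie names `mf_alpha` and `mf_beta` are hypothetical placeholders, and logging a dash when no X-MF-Req value is present is an assumption; only the header names, their values, and the '-' fallback on non-Varnish servers come from the proposal above.

```python
def mf_mode_from_cookies(cookie_header):
    """Derive the X-MF-Mode value (a/b/s) from a raw Cookie header.

    The cookie names are hypothetical; the proposal only says that
    alpha/beta identifying cookies exist.
    """
    names = set()
    for part in cookie_header.split(";"):
        name = part.split("=", 1)[0].strip()
        if name:
            names.add(name)
    if "mf_alpha" in names:
        return "a"
    if "mf_beta" in names:
        return "b"
    return "s"


def log_fields(server, cookie_header="", mf_req=None):
    """Return the two extra log fields as a (mode, req) tuple.

    Only Varnish sets real values; every other server logs a dash
    for both fields.
    """
    if server != "varnish":
        return ("-", "-")
    # Assumption: a missing X-MF-Req value is logged as a dash.
    return (mf_mode_from_cookies(cookie_header), mf_req or "-")
```

For instance, `log_fields("varnish", "mf_beta=1", "2")` would describe a secondary API request from a beta user, while any Squid or Nginx request yields two dashes.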
Questions:
1) Are there objections to the introduction of these two HTTP headers?
2) We would like to aim for a late February deployment; is that an okay period? (We will announce the real deployment date as well.)
3) Are we missing anything important?
Thanks for your feedback!
Best,
Arthur & Diederik
I don't like its cryptic nature.
Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b».
Instead something like this would be much more descriptive:
X-Mobile-Mode: stable
X-Mobile-Request: secondary
But that also means sending more bytes through the wire :S
On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote:
> I don't like its cryptic nature.
>
> Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b».
>
> Instead something like this would be much more descriptive:
> X-Mobile-Mode: stable
> X-Mobile-Request: secondary
>
> But that also means sending more bytes through the wire :S
Well, you can (and should) drop the 'X-' :-)
See http://tools.ietf.org/html/rfc6648: Deprecating the "X-" Prefix and Similar Constructs in Application Protocols
-- Ori Livneh
Thanks Ori, I was not aware of this.
D
Sent from my iPhone
On 2013-02-02, at 16:55, Ori Livneh <ori@wikimedia.org> wrote:
> Well, you can (and should) drop the 'X-' :-)
>
> See http://tools.ietf.org/html/rfc6648: Deprecating the "X-" Prefix and Similar Constructs in Application Protocols
>
> -- Ori Livneh
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Huh! News to me as well. I definitely agree with that decision. Thanks, Ori!
I've already written the Varnish code for setting X-MF-Mode so it can be captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at least, MF-Mode?
Looking especially to hear from Arthur and Matt.
-- David Schoonover dsc@wikimedia.org
On Sat, Feb 2, 2013 at 2:16 PM, Diederik van Liere <dvanliere@wikimedia.org> wrote:
> Thanks Ori, I was not aware of this.
> D
>
> Sent from my iPhone
If you want to differentiate categories of API requests in logs, add descriptive noop query params to the requests, e.g. &mfmode=2. Doing this in request headers and altering edge config is unnecessary and a bad design pattern. On the analytics side, if parsing query params seems challenging vs. having a fixed field to parse, deal.
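As a rough illustration of what reading such a param on the analytics side could look like, here is a minimal Python sketch. The parameter name `mfmode` comes from the message above; the function name and the URLs used with it are made up for the example.

```python
from urllib.parse import parse_qs, urlsplit


def request_category(url, param="mfmode"):
    """Return the value of the no-op logging parameter, or None if absent."""
    query = urlsplit(url).query
    values = parse_qs(query).get(param)
    # parse_qs returns a list per key; the first occurrence is used here.
    return values[0] if values else None
```

A logged URI ending in `&mfmode=2` would give `"2"`, and a URI without the param gives `None`, so requests that never carried the param need no placeholder.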
On Sunday, February 3, 2013, David Schoonover wrote:
> Huh! News to me as well. I definitely agree with that decision. Thanks, Ori!
>
> I've already written the Varnish code for setting X-MF-Mode so it can be captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at least, MF-Mode?
>
> Looking especially to hear from Arthur and Matt.
>
> -- David Schoonover dsc@wikimedia.org
Regarding varnish cacheability of mobile API requests with a logging query param - it would probably be worth making frontend varnishes strip out all occurrences of that query param and its value from their backend requests so they're all the same to the caching instances. A generic param name that can take any value would allow for adding as many extra log values as needed, limited only by the uri log field length.
&l=mft2&l=mfstable etc.
So still an edge cache change but the result is more flexible while avoiding changing the fixed field length log format across unrelated systems like text squids or image caches.
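A minimal Python sketch of the stripping idea follows. The real change would be Varnish configuration, not Python. The param name `l` and the example values come from the message above; the function name and URL are illustrative only.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def split_log_params(url, param="l"):
    """Return (url_without_param, logged_values) for one request URL.

    Every occurrence of the logging param is removed so that the cache
    sees the same URL regardless of the values being logged.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    logged = [value for key, value in pairs if key == param]
    kept = [(key, value) for key, value in pairs if key != param]
    # Note: urlencode re-encodes the remaining params, so unusual
    # escaping in the original query may be normalized.
    cleaned = urlunsplit(parts._replace(query=urlencode(kept)))
    return cleaned, logged
```

With a URL carrying `&l=mft2` and `&l=mfstable`, the first element is the URL without either occurrence and the second is `["mft2", "mfstable"]`.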
On Sunday, February 3, 2013, Asher Feldman wrote:
> If you want to differentiate categories of API requests in logs, add descriptive noop query params to the requests, e.g. &mfmode=2. Doing this in request headers and altering edge config is unnecessary and a bad design pattern. On the analytics side, if parsing query params seems challenging vs. having a fixed field to parse, deal.
Considering that the query component of a URI is meant to identify the resource whereas HTTP headers are meant to tell the server additional information about the request, I think a header approach is much more appropriate than a no-op query parameter.
If the X- is removed, I'd have no problem with the addition of these headers, but what is the advantage of having two over one? Wouldn't a header like "MobileFrontend: 1/2 a/b/s" work just as well?
--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
On Sun, Feb 3, 2013 at 4:35 AM, Asher Feldman <afeldman@wikimedia.org> wrote:
> Regarding varnish cacheability of mobile API requests with a logging query param - it would probably be worth making frontend varnishes strip out all occurrences of that query param and its value from their backend requests so they're all the same to the caching instances. A generic param name that can take any value would allow for adding as many extra log values as needed, limited only by the uri log field length.
>
> &l=mft2&l=mfstable etc.
>
> So still an edge cache change but the result is more flexible while avoiding changing the fixed field length log format across unrelated systems like text squids or image caches.
That's not at all true in the real world. Look at the actual requests for Google Analytics on a high percentage of sites, etc.
Setting new request headers for mobile that map to new inflexible fields in the log stream that must be set on all non-mobile requests ("\t-\t-") equals gigabytes of unnecessary log data every day (that we want to save 100% of) for no good reason. Wanting to keep query params "pure" isn't a good reason.
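The size argument is simple arithmetic: two empty fields cost four bytes ("\t-\t-") on every log line. The Python sketch below works that out; the request volume in the usage note is a made-up round number for illustration, not a figure from this thread.

```python
def extra_log_bytes_per_day(requests_per_day, filler="\t-\t-"):
    """Bytes added per day when every log line carries the filler fields."""
    return requests_per_day * len(filler.encode("utf-8"))
```

Under the hypothetical assumption of one billion logged requests per day, `extra_log_bytes_per_day(1_000_000_000)` comes to 4,000,000,000 bytes, i.e. about 4 GB of dashes and tabs per day.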
On Sunday, February 3, 2013, Tyler Romeo wrote:
> Considering that the query component of a URI is meant to identify the resource whereas HTTP headers are meant to tell the server additional information about the request, I think a header approach is much more appropriate than a no-op query parameter.
>
> If the X- is removed, I'd have no problem with the addition of these headers, but what is the advantage of having two over one? Wouldn't a header like "MobileFrontend: 1/2 a/b/s" work just as well?
>
> --
> Tyler Romeo
> Stevens Institute of Technology, Class of 2015
> Major in Computer Science
> www.whizkidztech.com | tylerromeo@gmail.com
Remind me again why a production setup is logging every header of every request? Also, if you are logging every header, then the amount of data added by a single extra header would be insignificant compared to the rest of the request.
--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
On Sun, Feb 3, 2013 at 5:12 AM, Asher Feldman <afeldman@wikimedia.org> wrote:
> That's not at all true in the real world. Look at the actual requests for Google Analytics on a high percentage of sites, etc.
>
> Setting new request headers for mobile that map to new inflexible fields in the log stream that must be set on all non-mobile requests ("\t-\t-") equals gigabytes of unnecessary log data every day (that we want to save 100% of) for no good reason. Wanting to keep query params "pure" isn't a good reason.
On Sunday, February 3, 2013, Tyler Romeo wrote:
> Remind me again why a production setup is logging every header of every request?

That's ludicrous. Please reread our udplog format documentation and this entire thread carefully, especially the first message, before commenting any further.

> Also, if you are logging every header, then the amount of data added by a single extra header would be insignificant compared to the rest of the request.
>
> --
> Tyler Romeo
> Stevens Institute of Technology, Class of 2015
> Major in Computer Science
> www.whizkidztech.com | tylerromeo@gmail.com
On Sun, Feb 3, 2013 at 2:35 AM, Asher Feldman <afeldman@wikimedia.org> wrote:
> Regarding varnish cacheability of mobile API requests with a logging query param - it would probably be worth making frontend varnishes strip out all occurrences of that query param and its value from their backend requests so they're all the same to the caching instances. A generic param name that can take any value would allow for adding as many extra log values as needed, limited only by the uri log field length.
>
> &l=mft2&l=mfstable etc.
>
> So still an edge cache change but the result is more flexible while avoiding changing the fixed field length log format across unrelated systems like text squids or image caches.
>
> On Sunday, February 3, 2013, Asher Feldman wrote:
>> If you want to differentiate categories of API requests in logs, add descriptive noop query params to the requests, e.g. &mfmode=2. Doing this in request headers and altering edge config is unnecessary and a bad design pattern. On the analytics side, if parsing query params seems challenging vs. having a fixed field to parse, deal.
Asher, I understand your hesitation about using HTTP header fields, but there are a couple of problems I'm seeing with using query string parameters. Perhaps you or others have some ideas how to get around these:
* We should keep user-facing URLs canonical as much as possible (primarily for link sharing)
** If we keep user-facing URLs canonical, we could potentially add query string params via JavaScript, but that would only work on devices that support JavaScript/have JavaScript enabled (this might not be a huge deal as we are planning changes such that users that do not support jQuery will get a simplified version of the stable site)
* How could this work for the first pageview request (e.g. a user clicking a link from Google or even just browsing to http://en.wikipedia.org)?
I may be missing other potential problems - it would be great if others from the mobile team could chime in.
On Mon, Feb 4, 2013 at 4:24 PM, Arthur Richards <arichards@wikimedia.org> wrote:
> Asher, I understand your hesitation about using HTTP header fields, but there are a couple of problems I'm seeing with using query string parameters. Perhaps you or others have some ideas how to get around these:
> * We should keep user-facing URLs canonical as much as possible (primarily for link sharing)
> ** If we keep user-facing URLs canonical, we could potentially add query string params via JavaScript, but that would only work on devices that support JavaScript/have JavaScript enabled (this might not be a huge deal as we are planning changes such that users that do not support jQuery will get a simplified version of the stable site)
> * How could this work for the first pageview request (e.g. a user clicking a link from Google or even just browsing to http://en.wikipedia.org)?
I think mainly we need the tracking on the API requests... that's all JavaScript-initiated, and all hidden from the user. The main problem with adding parameters would be for caching .... but none of the API hits are currently cacheable so that's not an immediate issue perhaps.
-- brion
On Mon, Feb 4, 2013 at 5:30 PM, Brion Vibber <brion@pobox.com> wrote:
> I think mainly we need the tracking on the API requests... that's all JavaScript-initiated, and all hidden from the user. The main problem with adding parameters would be for caching .... but none of the API hits are currently cacheable so that's not an immediate issue perhaps.
We also need to be able to differentiate between alpha/beta/stable versions of the mobile site, without having to parse the cookie header (I believe as a result of performance constraints around this? I think the analytics team had looked into this previously).
On Mon, Feb 4, 2013 at 4:38 PM, Arthur Richards <arichards@wikimedia.org> wrote:
> We also need to be able to differentiate between alpha/beta/stable versions of the mobile site, without having to parse the cookie header (I believe as a result of performance constraints around this? I think the analytics team had looked into this previously).
Yeah that's.... probably not possible if you want to track that for initial page views. Cookie's the only thing guaranteed to have the data available, and we have no way to inject a header into mobile web browsers except for the XHR hits to the API.
-- brion
On Mon, Feb 4, 2013 at 5:49 PM, Brion Vibber <brion@pobox.com> wrote:
> Yeah that's.... probably not possible if you want to track that for initial page views. Cookie's the only thing guaranteed to have the data available, and we have no way to inject a header into mobile web browsers except for the XHR hits to the API.
>
> -- brion
In the case of the cookie, the header would actually get set by the backend response (from Apache) and I believe Dave cooked up or was planning on cooking some magic to somehow make that information discernible when results are cached.
On Mon, Feb 4, 2013 at 4:59 PM, Arthur Richards <arichards@wikimedia.org> wrote:
> In the case of the cookie, the header would actually get set by the backend response (from Apache) and I believe Dave cooked up or was planning on cooking some magic to somehow make that information discernible when results are cached.
Opting into the mobile beta as it is currently implemented bypasses varnish caching for all future mobile pageviews for the life of the cookie. So this probably isn't quite right (at least the "when results are cached" part.)
Thinking about this further... So long as all beta optins bypass all caching and always have to hit an apache, it would be fine for mf to set a response header reflecting the version of the site the optin cookie triggers (but only if there's an optin; avoid setting it on standard). I'd just prefer this to be logged without adding a field to the entire udplog stream that will generally just be wasted space. Mobile already has one dedicated udplog field, currently intended for zero carriers, which is wasted log space for nearly every request. Make it a key/value field that can contain multiple keys, e.g. "zc:orn;v:b1" (zero carrier = orange whatever, version = beta1).
If by some chance mobile beta gets implemented in a way that doesn't kill frontend caching for its users (maybe solely via different js behavior based on the presence of the optin cookie?) the above won't be applicable anymore, so using the event log facility / pixel service to note beta usage becomes more appropriate. If beta usage is going to be driven upwards, I hope this approach is seriously considered. Mobile currently only has around a 58% edge cache hit rate as it is, and it sounds like upcoming features will place significant new demands on the apaches and on memcached space. If a non-cache-busting beta site is doable, go now for the logging method that will later be compatible with it, to avoid having to change processing methods.
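To make the key/value idea concrete, here is a minimal sketch of how a log processor might read such a field. It assumes the "key:value;key:value" layout from the example above and treats a lone dash as "unset"; the function name is made up for illustration and is not part of any existing tool.

```python
def parse_kv_field(field):
    """Parse a 'key:value;key:value' log field into a dict.

    Assumption: ';' separates pairs and ':' separates key from value,
    as in the "zc:orn;v:b1" example. A lone '-' means the field is unset.
    """
    if field in ("", "-"):
        return {}
    pairs = {}
    for item in field.split(";"):
        if not item:
            continue  # tolerate a trailing ';'
        key, sep, value = item.partition(":")
        if not sep:
            continue  # skip malformed items that have no ':' separator
        pairs[key] = value
    return pairs


print(parse_kv_field("zc:orn;v:b1"))  # {'zc': 'orn', 'v': 'b1'}
print(parse_kv_field("-"))            # {}
```

Since udplog fields are positional, a consumer would only need to know which position holds this field; new keys could then be added without changing the number of fields.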
OK - this is all making a lot more sense to me now, thanks for your clarifications and suggestions, Asher.
So, from the mobile team's perspective a straightforward implementation to get us to our goal might be to:
1) add a query parameter to identify 'secondary' API hits (eg an API request for page content made after an initial request for that page was made; all other requests stay the same)
2) use the header solution to identify beta/alpha cookies (HTTP header set by backend response when user is opted in).
One thing I'd like to double check though is that 'Opting into the mobile beta as it is currently implemented bypasses varnish caching for all future mobile pageviews for the life of the cookie' - I thought the Varnish cache was just varied by the optin cookies, not totally bypassed. I've looked at headers from some sample requests I've made with the beta opt-in and I'm not seeing any cache hits, so I gather you are correct. Can you please confirm this?
Analytics folks, is this workable from your perspective?
Analytics folks, is this workable from your perspective?
Yes, this works fine for us, and it's also no problem to set multiple key/value pairs in the HTTP header that we are now using for the X-CS header.
Diederik
Just want to summarize and make sure I've got the right conclusions, as this thread has wandered a bit.
*1. X-MF-Mode: Alpha/Beta Site Usage*
We'll roll this into the X-CS header, which will now be KV-pairs (using normal URL encoding), and set by Varnish. This will avoid an explosion of cryptic headers for analytic purposes.
Questions:
- It seems there's some confusion around "bypassing Varnish". If I understand correctly, it's not that Varnish is ever bypassed, just that the upstream response is not cached if cookies are present. Is that right?
- Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about?
*2. X-MF-Req: Primary vs Secondary API Requests*
This header will be replaced with a query parameter set by the client-side JS code making the request. Analytics will parse it out at processing time and Do The Right Thing.
Kindly correct me if I've gotten anything wrong.
-- David Schoonover dsc@wikimedia.org
On Wednesday, February 6, 2013, David Schoonover wrote:
Just want to summarize and make sure I've got the right conclusions, as this thread has wandered a bit.
*1. X-MF-Mode: Alpha/Beta Site Usage*
We'll roll this into the X-CS header, which will now be KV-pairs (using normal URL encoding), and set by Varnish.
Nope. There will be a header denoting non-standard MobileFrontend views if the mobile team wants to leave the caching situation as is. It will be a response header set by mediawiki, not varnish. The header will have a unique name, it will not share the name of the zero carrier header. The udplog field that currently only ever contains carrier information on zero requests will become a key value field. Udplog fields are not named, they are positional.
This will avoid an explosion of cryptic headers for analytic purposes.
Questions:
- It seems there's some confusion around "bypassing Varnish". If I understand correctly, it's not that Varnish is ever bypassed, just that the upstream response is not cached if cookies are present. Is that right?
"Bypasses varnish caching" != "bypassing varnish." I don't see any use of the latter in this thread, but if there has been confusion, know that all m.wikipedia.org requests are served via varnish.
- Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about?
Nope.. We're repurposing the fixed position udplog field, not the zero carrier code header.
That all sounds fine to me so long as we're all agreed.
-- David Schoonover dsc@wikimedia.org
On Wednesday, February 6, 2013, David Schoonover wrote:
That all sounds fine to me so long as we're all agreed.
Lol. RFC closed.
On Feb 6, 2013, at 9:32 PM, David Schoonover dsc@wikimedia.org wrote:
Just want to summarize and make sure I've got the right conclusions, as this thread has wandered a bit.
*1. X-MF-Mode: Alpha/Beta Site Usage*
We'll roll this into the X-CS header, which will now be KV-pairs (using normal URL encoding), and set by Varnish. This will avoid an explosion of cryptic headers for analytic purposes.
Questions:
- It seems there's some confusion around "bypassing Varnish". If I understand correctly, it's not that Varnish is ever bypassed, just that the upstream response is not cached if cookies are present. Is that right?
Yes
- Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about?
I'd like to propose to define *one* request header to be used for all analytics purposes. It can be key/value pairs, and be set client side where applicable. Varnish can append to it where needed, later keys overriding earlier ones. Then we can log that one header across all HTTP/caching clusters without having to change the log stream all the time, and without wasting much space, and caching edge configuration changes are kept to a minimum as well.
And we might as well be transparent in its naming. header name "Log-Parameters:"?
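As an illustration of the "later keys overriding earlier ones" rule, here is a small sketch. The "key=value;key=value" encoding and the key names are assumptions made up for this example; the proposal itself does not fix a format.

```python
def append_param(header_value, key, value):
    """Return the header value with 'key=value' appended at the end."""
    pair = f"{key}={value}"
    return f"{header_value};{pair}" if header_value else pair


def resolve_params(header_value):
    """Collapse the header into a dict; a later key overrides an earlier one."""
    resolved = {}
    for item in header_value.split(";"):
        key, sep, value = item.partition("=")
        if sep:  # ignore empty or malformed items
            resolved[key] = value
    return resolved


# Hypothetical values: the client sets two keys, the edge appends one.
client_value = "mode=beta;req=secondary"
edge_value = append_param(client_value, "mode", "stable")

print(edge_value)                  # mode=beta;req=secondary;mode=stable
print(resolve_params(edge_value))  # {'mode': 'stable', 'req': 'secondary'}
```

Appending rather than rewriting keeps the edge configuration simple: the cache only ever adds to the end of the value, and the override is resolved once, at processing time.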
*2. X-MF-Req: Primary vs Secondary API Requests*
This header will be replaced with a query parameter set by the client-side JS code making the request. Analytics will parse it out at processing time and Do The Right Thing.
I think the question of using a URL param vs a request header should mainly take into account whether the response varies on the value of the parameter. If the responses are otherwise identical, and the value is only used for analytics purposes, I would prefer to put that into the above header instead, as it will impair cacheability / cache size otherwise (even if those requests are currently not cacheable for other reasons). If the responses are actually different based on this parameter, I would prefer to have it in the URL where possible.
I'd like to propose to define *one* request header to be used for all analytics purposes. It can be key/value pairs, and be set client side where applicable. Varnish can append to it where needed, later keys overriding earlier ones. Then we can log that one header across all HTTP/caching clusters without having to change the log stream all the time, and without wasting much space, and caching edge configuration changes are kept to a minimum as well.
Agreed. Instrumentation should ideally never get in the way of production performance, so if we can cut or optimize header use for logging without being too onerous, we'll happily do so. afaik, the reasons that custom HTTP headers are used at all are:
- They're accessible from varnishncsa without code modifications;
- Varnish and/or other parties in the request chain can munge the values prior to logging to save bytes (examples being X-CS, which replaces the semantic carrier name with a [vastly shorter] numeric code, and the proposed X-MF-Mode header, which prevents the need to log the whole cookies header for post-processing).
Ideally, none of this should need to make a trip to the client. I don't recall seeing anything in the Varnish docs providing a way to send values exclusively to the loggers, but if there is, that's an easy win, and it wouldn't require any changes to our parsing pipeline.
If that's not possible, it makes sense to collapse various headers into a KV field; that would require changes on our side, including all downstream consumers of the log stream (which is surprisingly large), so it's not a trivial move.
-- David Schoonover dsc@wikimedia.org
On Thu, Feb 7, 2013 at 4:32 AM, Mark Bergsma mark@wikimedia.org wrote:
- Since we're repurposing X-CS, should we perhaps rename it to something more apt to address concerns about cryptic non-standard headers flying about?
I'd like to propose to define *one* request header to be used for all analytics purposes. It can be key/value pairs, and be set client side where applicable.
There's been some confusion in this thread between headers used by mediawiki in determining content generation or for cache variance, and those intended only for logging. The zero carrier header is used by the zero extension to return specific content banners and set different default behaviors (i.e. hide all images) as negotiated with individual mobile carriers. A reader familiar with this might note that there are separate X-CS and X-Carrier headers, but X-Carrier is supposed to go away now.
Agreed that there should be a single header for content that's strictly for analytics purposes. All changes to the udplog format in the last year or so could likely be reverted except for the delimiter change, with a multipurpose analytics key/value field added for all else.
I think the question of using a URL param vs a request header should mainly take into account whether the response varies on the value of the parameter. If the responses are otherwise identical, and the value is only used for analytics purposes, I would prefer to put that into the above header instead, as it will impair cacheability / cache size otherwise (even if those requests are currently not cacheable for other reasons). If the responses are actually different based on this parameter, I would prefer to have it in the URL where possible.
For this particular case, the API requests are for either getting specific sections of an article as opposed to either the whole thing, or the first section as part of an initial pageview. I might not have grokked the original RFC email well, but I don't understand why this was being discussed as a logging challenge or necessitating a request header. A mobile api request to just get section 3 of the article on otters should already utilize a query param denoting that section 3 is being fetched, and is already clearly not a "primary" request.
Whether or not it makes sense for mobile to move in the direction of splitting up article views into many api requests is something I'd love to see backed up by data. I'm skeptical for multiple reasons.
On Feb 9, 2013, at 11:21 PM, Asher Feldman afeldman@wikimedia.org wrote:
For this particular case, the API requests are for either getting specific sections of an article as opposed to either the whole thing, or the first section as part of an initial pageview. I might not have grokked the original RFC email well, but I don't understand why this was being discussed as a logging challenge or necessitating a request header. A mobile api request to just get section 3 of the article on otters should already utilize a query param denoting that section 3 is being fetched, and is already clearly not a "primary" request.
Yes, that part remains a bit unclear to me as well - some more details would be welcome.
Whether or not it makes sense for mobile to move in the direction of splitting up article views into many api requests is something I'd love to see backed up by data. I'm skeptical for multiple reasons.
What is the main motivation used here? Reducing article sizes/transfers at the expense of more latency?
On Monday, February 11, 2013, Mark Bergsma wrote:
On Feb 9, 2013, at 11:21 PM, Asher Feldman afeldman@wikimedia.org wrote:
Whether or not it makes sense for mobile to move in the direction of splitting up article views into many api requests is something I'd love to see backed up by data. I'm skeptical for multiple reasons.
What is the main motivation used here? Reducing article sizes/transfers at the expense of more latency?
In cases where most sections (probably not even all) are loaded, I'd expect it to increase the amount of data transferred beyond just the overhead of the additional requests. gzip might take a 30k article down to 4k but will be less efficient on individual sections. Text compresses really well, and roundtrip latency is high on many cell networks.
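A rough way to see the compression effect is to gzip one body of text whole and then section by section. This is only a sketch on synthetic text built from a made-up word list, not real article HTML, so the absolute numbers say nothing about actual pages; it only shows the direction of the effect.

```python
import gzip
import random

# Made-up vocabulary standing in for article text (an assumption).
VOCAB = ("otter river water fur den fish swim play species habitat coast "
         "diet mammal tail web feet hunt night family group").split()


def make_section(rng, words=400):
    """Build one synthetic 'section' of space-separated words."""
    return " ".join(rng.choice(VOCAB) for _ in range(words)).encode("utf-8")


rng = random.Random(0)  # fixed seed so repeated runs use the same text
sections = [make_section(rng) for _ in range(8)]
article = b"\n".join(sections)

# Compress the article in one go, then each section on its own.
whole = len(gzip.compress(article))
piecewise = sum(len(gzip.compress(s)) for s in sections)

print(f"raw={len(article)} whole={whole} piecewise={piecewise}")
```

Each separately compressed section pays its own gzip header and trailer and starts with an empty history window, which is why the piecewise total comes out larger than the single compressed article.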
And then I'd wonder about the server side implementation. How will frontend cache invalidation work? Are we going to need to purge every individual article section relative to /w/api.php on edit? Article HTML in memcached (parser cache), mobile processed HTML in memcached.. Now individual sections in memcached? If so, should we calculate memcached space needs for article text as 3x the current parser cache utilization? More memcached usage is great; I'm not asking to dissuade its use but because it's better to capacity plan than to react.
On 11.02.2013, 22:11 Asher wrote:
And then I'd wonder about the server side implementation. How will frontend cache invalidation work? Are we going to need to purge every individual article section relative to /w/api.php on edit?
Since the API doesn't require pretty URLs, we could simply append the current revision ID to the mobileview URLs.
Article HTML in memcached (parser cache), mobile processed HTML in memcached.. Now individual sections in memcached? If so, should we calculate memcached space needs for article text as 3x the current parser cache utilization? More memcached usage is great; I'm not asking to dissuade its use but because it's better to capacity plan than to react.
action=mobileview caches pages only in full and serves only sections requested, so no changes in request patterns will result in increased memcached usage.
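The revision ID idea can be sketched as follows. The "title" and "revid" parameter names are placeholders chosen for this example and are not claimed to be the real mobileview parameters; only the endpoint, format=json and action=mobileview come from this thread.

```python
from urllib.parse import urlencode

API_ENDPOINT = "http://en.m.wikipedia.org/w/api.php"


def versioned_mobileview_url(title, revision_id):
    """Build an API URL that changes whenever the page is edited.

    'title' and 'revid' are hypothetical parameter names. Because the
    revision ID is part of the URL, an edit produces a new cache key and
    the old cached object simply stops being requested, so no per-section
    purge is needed.
    """
    params = {
        "format": "json",
        "action": "mobileview",
        "title": title,        # placeholder parameter name
        "revid": revision_id,  # placeholder parameter name
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


before_edit = versioned_mobileview_url("Otter", 1001)
after_edit = versioned_mobileview_url("Otter", 1002)
print(before_edit)
print(after_edit)
```

The trade-off is that stale objects are not removed right away; they stay in the cache until they expire or are evicted.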
Max - good answers re: caching concerns. That leaves studying if the bytes transferred on average mobile article view increases or decreases with lazy section loading. If it increases, I'd say this isn't a positive direction to go in and stop there. If it decreases, then we should look at the effect on total latency, number of requests required per pageview, and the impact on backend apache utilization which I'd expect to be > 0.
Does the mobile team have specific goals that this project aims to accomplish? If so, we can use those as the measure against which to compare an impact analysis.
I'm a bit worried that now we are asking why pages are lazy loaded rather than focusing on the fact that they currently __are doing this___ and how we can log these (if we want to discuss this further let's start another thread as I'm getting extremely confused doing so on this one).
Lazy loading sections
################
The motivation behind moving MobileFrontend in the direction of lazy loading section content and subsequent pages can be found here [1]; I just gave it a refresh as it was a little out of date.
In summary the reasons are to 1) make the app feel more responsive by simply loading content rather than reloading the entire interface and 2) reduce the payload sent to a device.
Session Tracking
################
Going back to the discussion of tracking mobile page views, it sounds like a header stating whether a page is being viewed in alpha, beta or stable works fine for standard page views.
As for the situations where an entire page is loaded via the api, it makes no difference to us whether we 1) send the same header (set via javascript) or 2) add a query string parameter.
The only advantage I can see of using a header is that an initial page load of the article San Francisco currently uses the same api url as a page load of the article San Francisco via javascript (e.g. I click a link to 'San Francisco' on the California article).
In this new method they would use different urls (as the data sent is different). I'm not sure how that would affect caching.
Let us know which method is preferred. From my perspective implementation of either is easy.
[1] http://www.mediawiki.org/wiki/MobileFrontend/Dynamic_Sections
Thanks, Jon. To try and clarify a bit more about the API requests... they are not made on a per-section basis. As I mentioned earlier, there are two cases in which article content gets loaded by the API:
1) Going directly to a page (eg clicking a link from a Google search) will result in the backend serving a page with ONLY summary section content and section headers. The rest of the page is lazily loaded via API request once the JS for the page gets loaded. The idea is to increase responsiveness by reducing the delay for an article to load (further details in the article Jon previously linked to). The API request looks like: http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&pa...
2) Loading an article entirely via Javascript - like when a link is clicked in an article to another article, or an article is loaded via search. This will make ONE call to the API to load article content. API request looks like: http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&pa...
These API requests are identical, but only #2 should be counted as a 'pageview' - #1 is a secondary API request and should not be counted as a 'pageview'. You could make the argument that we just count all of these API requests as pageviews, but there are cases when we can't load article content from the API (like devices that do not support JS), so we need to be able to count the traditional page request as a pageview - thus we need a way to differentiate the types of API requests being made when they otherwise share the same URL.
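If the query parameter route is taken, the counting rule could look roughly like this at processing time. This is a sketch only: the "secondary" parameter name is hypothetical, and treating every non-API URL as a traditional page request is a simplification.

```python
from urllib.parse import parse_qs, urlsplit

SECONDARY_MARKER = "secondary"  # hypothetical query parameter name


def is_pageview(url):
    """Decide whether a logged request URL should count as a pageview."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if parts.path.endswith("/api.php"):
        if query.get("action") != ["mobileview"]:
            return False  # other API traffic is not an article view
        # Case #2 (article loaded entirely via JS) carries no marker and
        # counts; case #1 (lazy load after a page request) carries it.
        return SECONDARY_MARKER not in query
    return True  # simplification: a traditional page request


base = "http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview"
print(is_pageview("http://en.m.wikipedia.org/wiki/Otter"))  # True
print(is_pageview(base))                                     # True
print(is_pageview(base + "&secondary=1"))                    # False
```

With a rule like this, a page opened from a search result is counted once (the traditional request), and the lazy-load API hit that follows it is ignored.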
On Mon, Feb 11, 2013 at 6:42 PM, Jon Robson jdlrobson@gmail.com wrote:
I'm a bit worried that now we are asking why pages are lazy loaded rather than focusing on the fact that they currently __are doing this___ and how we can log these (if we want to discuss this further let's start another thread as I'm getting extremely confused doing so on this one).
Lazy loading sections ################ For motivation behind moving MobileFrontend into the direction of lazy loading section content and subsequent pages can be found here [1], I just gave it a refresh as it was a little out of date.
In summary the reason is to
- make the app feel more responsive by simply loading content rather
than reloading the entire interface 2) reducing the payload sent to a device.
Session Tracking ################
Going back to the discussion of tracking mobile page views, it sounds like a header stating whether a page is being viewed in alpha, beta or stable works fine for standard page views.
As for the situations where an entire page is loaded via the API, it makes no difference to us whether we
- send the same header (set via javascript) or
- add a query string parameter.
The only advantage I can see of using a header is that an initial page load of the article San Francisco currently uses the same api url as a page load of the article San Francisco via javascript (e.g. I click a link to 'San Francisco' on the California article).
In this new method they would use different URLs (as the data sent is different). I'm not sure how that would affect caching.
Let us know which method is preferred. From my perspective implementation of either is easy.
[1] http://www.mediawiki.org/wiki/MobileFrontend/Dynamic_Sections
On Mon, Feb 11, 2013 at 12:50 PM, Asher Feldman afeldman@wikimedia.org wrote:
Max - good answers re: caching concerns. That leaves studying if the bytes transferred on average mobile article view increases or decreases with lazy section loading. If it increases, I'd say this isn't a positive direction to go in and stop there. If it decreases, then we should look at the effect on total latency, number of requests required per pageview, and the impact on backend apache utilization which I'd expect to be > 0.
Does the mobile team have specific goals that this project aims to accomplish? If so, we can use those as the measure against which to compare an impact analysis.
On Mon, Feb 11, 2013 at 12:21 PM, Max Semenik maxsem.wiki@gmail.com wrote:
On 11.02.2013, 22:11 Asher wrote:
And then I'd wonder about the server side implementation. How will frontend cache invalidation work? Are we going to need to purge every individual article section relative to /w/api.php on edit?
Since the API doesn't require pretty URLs, we could simply append the current revision ID to the mobileview URLs.
Article HTML in memcached (parser cache), mobile processed HTML in memcached.. Now individual sections in memcached? If so, should we calculate memcached space needs for article text as 3x the current parser cache utilization? More memcached usage is great, not asking to dissuade its use but because it's better to capacity plan than to react.
action=mobileview caches pages only in full and serves only sections requested, so no changes in request patterns will result in increased memcached usage.
-- Best regards, Max Semenik ([[User:MaxSem]])
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
-- Jon Robson http://jonrobson.me.uk @rakugojon
Thanks for the clarification Arthur, that clears up some misconceptions I had. I saw a demo around the allstaff where individual sections were lazy loaded, so I think I had that in my head.
It does still seem to me that the data to determine secondary api requests should already be present in the existing log line. If the value of the page param in an action=mobileview api request matches the page in the referrer (perhaps with normalization), it's a secondary request as per case 1 below. Otherwise, it's a pageview as per case 2. Difficult or expensive to reconcile? Not when you're doing distributed log analysis via hadoop.
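As a rough illustration of the referrer comparison described above, the sketch below classifies one log record. It assumes the API request carries a `page` parameter (the example URLs in this thread are truncated, so the full parameter list is not known), that article referrers look like /wiki/Title or carry a `title` query parameter, and that normalization means treating underscores as spaces and uppercasing the first letter. All function names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs, unquote


def normalize_title(title):
    """Assumed normalization: underscores become spaces, first letter uppercased."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def referrer_title(referrer):
    """Extract the page title from a referrer URL, or None if it has none."""
    parts = urlsplit(referrer)
    if parts.path.startswith("/wiki/"):
        return normalize_title(unquote(parts.path[len("/wiki/"):]))
    query = parse_qs(parts.query)
    if "title" in query:
        return normalize_title(query["title"][0])
    return None


def classify_mobileview(url, referrer):
    """Return 'secondary', 'pageview', or None for non-mobileview requests."""
    query = parse_qs(urlsplit(url).query)
    if query.get("action") != ["mobileview"] or "page" not in query:
        return None
    page = normalize_title(query["page"][0])
    # Same page as the referrer: the lazy load that follows a direct visit.
    if referrer_title(referrer) == page:
        return "secondary"
    # Different (or missing) referrer page: an article loaded via JavaScript.
    return "pageview"
```

Note that a referrer with no recoverable title falls through to 'pageview' here, so missing or stripped referrers would inflate the count.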
On Mon, Feb 11, 2013 at 7:11 PM, Arthur Richards arichards@wikimedia.org wrote:
Thanks, Jon. To try and clarify a bit more about the API requests... they are not made on a per-section basis. As I mentioned earlier, there are two cases in which article content gets loaded by the API:
- Going directly to a page (eg clicking a link from a Google search) will
result in the backend serving a page with ONLY summary section content and section headers. The rest of the page is lazily loaded via API request once the JS for the page gets loaded. The idea is to increase responsiveness by reducing the delay for an article to load (further details in the article Jon previously linked to). The API request looks like:
http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&pa...
- Loading an article entirely via Javascript - like when a link is clicked
in an article to another article, or an article is loaded via search. This will make ONE call to the API to load article content. API request looks like:
http://en.m.wikipedia.org/w/api.php?format=json&action=mobileview&pa...
These API requests are identical, but only #2 should be counted as a 'pageview' - #1 is a secondary API request and should not be counted as a 'pageview'. You could make the argument that we just count all of these API requests as pageviews, but there are cases when we can't load article content from the API (like devices that do not support JS), so we need to be able to count the traditional page request as a pageview - thus we need a way to differentiate the types of API requests being made when they otherwise share the same URL.
-- Arthur Richards Software Engineer, Mobile [[User:Awjrichards]] IRC: awjr +1-415-839-6885 x6687
It does still seem to me that the data to determine secondary api requests should already be present in the existing log line. If the value of the page param in an action=mobileview api request matches the page in the referrer (perhaps with normalization), it's a secondary request as per case 1 below. Otherwise, it's a pageview as per case 2. Difficult or expensive to reconcile? Not when you're doing distributed log analysis via hadoop.
So I did look into this prior to writing the RFC and the issue is that a lot of API referrers don't contain the querystring. I don't know what triggers this so if we can fix this then we can definitely derive the secondary pageview request from the referrer field. D
On Tuesday, February 12, 2013, Diederik van Liere wrote:
So I did look into this prior to writing the RFC and the issue is that a lot of API referrers don't contain the querystring. I don't know what triggers this so if we can fix this then we can definitely derive the secondary pageview request from the referrer field. D
If you can point me to some examples, I'll see if I can find any insights into the behavior.
Just to tie this thread up - the issue of how to count ajax driven pageviews loaded from the api and of how to differentiate those requests from secondary api page requests has been resolved without the need for code or logging changes.
Tagging of the mobile beta site will be accomplished via a new generic MediaWiki HTTP response header, dedicated to logging, that contains key-value pairs.
-Asher
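The exact format of that logging header is not given in this thread. Purely as an illustration, assuming a hypothetical header value made of semicolon-separated key=value pairs, a log consumer might split it like this (the function name and the example keys are made up):

```python
def parse_logging_header(value):
    """Split a 'key=value;key=value' header value into a dict.

    The separator characters are an assumption; this thread does not
    specify the real format of the header.
    """
    pairs = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing ';'
        key, sep, val = item.partition("=")
        if not sep:
            continue  # skip malformed segments that have no '='
        pairs[key.strip()] = val.strip()
    return pairs
```

For example, `parse_logging_header("mode=beta; req=secondary")` yields a dict with the two keys `mode` and `req`, which a log processor could then filter on.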
On Tue, Feb 12, 2013 at 9:56 AM, Asher Feldman afeldman@wikimedia.org wrote:
If you can point me to some examples, I'll see if I can find any insights into the behavior.
Thanks Asher for tying this up! I was about to write a similar email :) One final question, just to make sure we are all on the same page: is the X-CS field becoming a generic key/value pair for tracking purposes?
D
On Fri, Feb 15, 2013 at 11:16 AM, Asher Feldman afeldman@wikimedia.org wrote:
Just to tie this thread up - the issue of how to count ajax driven pageviews loaded from the api and of how to differentiate those requests from secondary api page requests has been resolved without the need for code or logging changes.
Tagging of the mobile beta site will be accomplished via a new generic mediawiki http response header dedicated to logging containing key value pairs.
-Asher
On Mon, Feb 4, 2013 at 4:24 PM, Arthur Richards arichards@wikimedia.org wrote:
Asher, I understand your hesitation about using HTTP header fields, but there are a couple problems I'm seeing with using query string parameters. Perhaps you or others have some ideas how to get around these:
- We should keep user-facing URLs canonical as much as possible (primarily for link sharing)
** If we keep user-facing URLs canonical, we could potentially add query string params via JavaScript, but that would only work on devices that support JavaScript/have JavaScript enabled (this might not be a huge deal as we are planning changes such that users that do not support jQuery will get a simplified version of the stable site)
I was thinking of this as a solution for the X-MF-Req header, based on your explanation of it earlier in the thread: "Almost correct - I realize I didn't actually explain it correctly. This would be a request HTTP header set by the client in API requests made by Javascript provided by MobileFrontend."
I only meant to apply the query string idea to API requests, which can also be marked to indicate non-standard versions of the site. I completely missed the case of non-api requests about which beta/alpha usage data needs to be collected. What about doing so via the eventlog service? Only for users actually opted into one of these programs, no need to log anything special for the majority of users getting the standard site.
* How could this work for the first pageview request (eg a user clicking a link from Google or even just browsing to http://en.wikipedia.org)?
I think this is covered by the above, in that the data intended to go into x-mf-req doesn't apply to this sort of page view, and first views from users opted into a trial can eventlog the trial usage.
Is the reason to use two fields instead of one that it makes this much easier to implement, or more performant?
On Sun, Feb 3, 2013 at 1:08 AM, David Schoonover dsc@wikimedia.org wrote:
Huh! News to me as well. I definitely agree with that decision. Thanks, Ori!
I've already written the Varnish code for setting X-MF-Mode so it can be captured by varnishncsa. Is there agreement to switch to Mobile-Mode, or at least, MF-Mode?
Looking especially to hear from Arthur and Matt.
-- David Schoonover dsc@wikimedia.org
On Sat, Feb 2, 2013 at 2:16 PM, Diederik van Liere dvanliere@wikimedia.org wrote:
Thanks Ori, I was not aware of this D
Sent from my iPhone
On 2013-02-02, at 16:55, Ori Livneh ori@wikimedia.org wrote:
On Saturday, February 2, 2013 at 1:36 PM, Platonides wrote:
I don't like its cryptic nature.
Someone looking at the headers sent to his browser would be very confused about what's the point of «X-MF-Mode: b».
Instead something like this would be much more descriptive: X-Mobile-Mode: stable X-Mobile-Request: secondary
But that also means sending more bytes through the wire :S
Well, you can (and should) drop the 'X-' :-)
See http://tools.ietf.org/html/rfc6648: Deprecating the "X-" Prefix and Similar Constructs in Application Protocols
-- Ori Livneh