+ mobile-l

Here's a rough summary of the discussion based on my understanding:

Problem and background:
While most parameters we pass to the PHP API action=mobileview endpoint are constant, there are a couple of parameters which depend either on device dimensions or on user preferences. 

The questions revolve around a trade-off once we move to RESTBase services for page content: keeping requests cacheable by limiting how much they vary vs. doing more processing on the clients. We want to take advantage of caching both at the edge (Varnish) and on the server side (RESTBase stores the results of each page revision).

In the first phase of using RESTBase it won't pre-generate the results when a new page revision gets created. Instead, it would generate and save the results on-demand. In a later phase we aim to get pre-generation enabled.

1) leadImageWidth: The Android app provides the desired lead image width and passes that to the mobileview action API as "thumbsize".[1] The Android app provides only one of three possible values: 640, 800, 1024.[3] It only uses the URL for the lead image, not the dimensions, since it gets them once the actual image has finished downloading. The iOS app currently uses "thumbwidth", which is somewhat similar to "thumbsize" but has its own pros and cons.[4]

2) noimages: In the Android app settings, the user can choose to not show any images. (The iOS app doesn't have this setting.) When this is the case we add a noimages=true query parameter to the PHP mobileview request.[1] Then the payload replaces the <img> tags with <span> tags. BTW, if the client specified noimages=true then the value of leadImageWidth does not matter; in fact, then we could omit the whole lead image info from the result as well.
It's unclear to me which percentage of users actually use this setting.

Possible solution alternatives:
1) leadImageWidth: 
1A) If the client uses a constant value, let's say 800px, for the thumbsize action=mobileview parameter, then the client could replace the /800px- portion in the resulting URL with the desired width, as long as the URL structure stays predictable.[2] If the string replacement fails we could still use the 800px URL.
1B) The new RESTBase API could provide an array of leadImage URL values to the client (instead of the thumb JSON object).

2) noimages
2A) The clients could replace the <img> tags with <span> tags, to emulate what the noimages flag of mobileview does. This would help caching by reducing variability. OTOH this puts more burden on clients, since DOM transformations are something clients want to avoid, especially here, where the setting is usually turned on because of bandwidth or CPU constraints on the client side.
2B) We could provide a noimages=true query parameter also with RESTBase. We could keep this uncached or implement this as a transform on the cached base version (ideally in the service).
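
To make 2A a bit more concrete, here is a minimal sketch of a client-side string replacement. The function name strip_images is hypothetical, and the shape of the replacement <span> (just the alt text) is an assumption for illustration; it is not the exact markup the mobileview noimages flag emits. A regex like this will also misbehave if an attribute value contains a literal ">".

```python
import re

# Matches a whole <img ...> tag (self-closing or not).
_IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Matches a double-quoted alt attribute inside an <img> tag.
_ALT_ATTR = re.compile(r'\salt="([^"]*)"', re.IGNORECASE)


def strip_images(html: str) -> str:
    """Replace every <img> tag with a <span> holding its alt text.

    Assumption: the span format is illustrative only. The alt value is
    copied as-is because it is already HTML-escaped in the attribute.
    """
    def to_span(match: re.Match) -> str:
        alt = _ALT_ATTR.search(match.group(0))
        text = alt.group(1) if alt else ""
        return f"<span>{text}</span>"

    return _IMG_TAG.sub(to_span, html)
```

Content without any <img> tags passes through unchanged, so the same cached payload could serve both kinds of clients.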

Thoughts, comments?

Cheers,
Bernd

[1] https://en.m.wikipedia.org/w/api.php?action=mobileview&format=json&page=CERN&prop=text%7Csections%7Clanguagecount%7Cthumb%7Cimage%7Cid%7Crevision%7Cdescription%7Clastmodified%7Cnormalizedtitle%7Cdisplaytitle%7Cprotection%7Ceditable&onlyrequestedsections=1&sections=0&sectionprop=toclevel%7Cline%7Canchor&noheadings=true&noimages=true&thumbsize=800

[2] Example:
"//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/800px-Cernfounders.png" would become "//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/1024px-Cernfounders.png".

[3] We don't want to add arbitrary values; instead we follow certain bucket sizes to improve the chances of cache hits and reduce the burden on servers.
Width buckets: https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FMultimediaViewer.git/f9e7bae91a8032fa13fc68114a0d57d190ea77f9/resources%2Fmmv%2Fmmv.ThumbnailWidthCalculator.js#L69

[4] The Android app wanted to move to thumbwidth as well but iOS encountered issues with svg files: https://phabricator.wikimedia.org/T91144 + https://phabricator.wikimedia.org/T98528
 

On Sun, Jul 26, 2015 at 11:01 PM, Bernd Sitzmann <bernd@wikimedia.org> wrote:

Correct me if I'm wrong, but the actual JPEG / PNG of the (lead) image will not be sent together with the first response, right? If so, simply adding the JSON with the three sizes adds an overhead of 100 or so bytes, while allowing us to cache/store the response correctly.

Yes, you are correct. The actual image is downloaded in a separate request. This is just to get the URL of the lead image. Earlier I thought we would also use the dimensions provided in the JSON output, but looking at the Android code I don't see this used.
I'm now thinking that we could just provide one standard value (e.g. 800px) for the mobileview request, and then the client could just adjust the lead image URL.
Example:
"//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/800px-Cernfounders.png" would become "//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/1024px-Cernfounders.png".
While this seems a bit hacky, since it doesn't follow hypermedia principles, it would also avoid the thumbwidth issues.[1][2]
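
The URL adjustment in the example above could be sketched roughly as follows. The function name rewrite_thumb_width is hypothetical, and the sketch assumes the thumbnail URL keeps the "/<width>px-<file name>" structure in its last path segment; if it doesn't, the original URL is returned unchanged, which matches the "fall back to the 800px URL" idea.

```python
import re

# Matches the "/<width>px-" prefix of the last path segment only.
_WIDTH_SEGMENT = re.compile(r"/\d+px-(?=[^/]+$)")


def rewrite_thumb_width(url: str, width: int) -> str:
    """Return url with its thumbnail width replaced by `width`.

    Assumption: the URL ends in ".../<N>px-<file name>". If that pattern
    is not found, the input URL is returned untouched as a fallback.
    """
    new_url, count = _WIDTH_SEGMENT.subn(f"/{width}px-", url, count=1)
    return new_url if count == 1 else url
```

Clients would still want to restrict `width` to the agreed bucket sizes (640, 800, 1024) so that the image requests themselves stay cacheable.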

Bernd



On Sat, Jul 25, 2015 at 4:37 PM, Gabriel Wicke <gwicke@wikimedia.org> wrote:


On Sat, Jul 25, 2015 at 5:50 AM, Marko Obrovac <mobrovac@wikimedia.org> wrote:
Correct me if I'm wrong, but the actual JPEG / PNG of the (lead) image will not be sent together with the first response, right? If so, simply adding the JSON with the three sizes adds an overhead of 100 or so bytes, while allowing us to cache/store the response correctly.

As for the options, I'd go with (1) as well. Mostly because external requests will not be POSTs, but GETs, so we would still need some magic translation in RESTBase hashing the query parameters and deducing the exact storage request. I might be wrong here as well, though.

Perhaps we should consider option (1a): RESTBase sends the request together with the HTML to mangle right away. Hm, that looks closer to option (2) though, and still needs a specialised RESTBase module.

2) should work without a special module once the post_request_storage stanza is implemented. We can point that to the main content storage bucket, and get the implicit data fetching that way.
 

Cheers,
Marko

On 24 July 2015 at 23:53, Gabriel Wicke <gwicke@wikimedia.org> wrote:


On Fri, Jul 24, 2015 at 2:39 PM, Bernd Sitzmann <bernd@wikimedia.org> wrote:
Option 1 sounds interesting to me.
Not sure I fully understand option 2. (Sounds like pre-generation to me.)

Yes, it would normally use the pre-generated content, but generate & save it on demand if needed. That's the case in both variants, though. Only difference is recursive GET back to RESTBase vs. RB POSTing the needed content directly.
 

Thanks,
Bernd

On Fri, Jul 24, 2015 at 3:22 PM, Gabriel Wicke <gwicke@wikimedia.org> wrote:


On Fri, Jul 24, 2015 at 2:05 PM, Bernd Sitzmann <bernd@wikimedia.org> wrote:
Transforms on the cached base version sounds interesting for both cases. How does that work? 

I see three main options:

1) the app service provides a GET end point and, when called with the custom parameters, fetches the base version from RESTBase & returns a patched version corresponding to the custom settings. RESTBase just proxies the custom entry point.

2) is basically the same, except that RESTBase POSTs the base version to the service. We are just starting work on T105975 which might give us a way to do this without writing a custom module.

3) is to do the post-processing in a custom RESTBase module. I'm not in favor of this unless absolutely needed, which I don't think is the case here.

 

Bernd

On Fri, Jul 24, 2015 at 2:48 PM, Gabriel Wicke <gwicke@wikimedia.org> wrote:


On Fri, Jul 24, 2015 at 1:28 PM, Bernd Sitzmann <bernd@wikimedia.org> wrote:
I tend to agree and I think we should try to take advantage of the storage & caching capabilities as much as possible. Not just on our servers but also on the edge-caches.

I'd venture a guess that the noimages flag is rarely used (<5%). Dmitry, do we have any data about the use of "Show images" preference being turned off? If not then that would be another good one for EL. I'm going out on a limb here saying that if my guess is correct then we could potentially replace the <img> tags with the respective <span> tags to emulate the noimages flag on the clients. It's not ideal since the <img> tags have a bigger payload and post-processing the payload on the clients is something we would like to avoid. It's really a tradeoff between caching and pure payload size.


We could keep this uncached or implement this as a transform on the cached base version (ideally in the service).
 

The leadImageWidth has currently three possible values: 
* 640px for phones, 
* 800px for 7" tablets/phablets, 
* 1024px for 10" tablets. 
So, it's not completely variable. We try to take the image size buckets[1] into account to help the servers with caching. Here the distribution is not so clear-cut. I'm not sure if there is a reasonable default value. But the difference in the payload would be very minor. This only affects the thumb JSON object at the top level of the JSON payload.

Examples:
640[2]:
"thumb": {
800:
1024:

So, I'm thinking that before we enable pre-generation we could drop the parameters and do something else instead, like:
Make "thumb" an (associative?) array so we always have all three values included. I'm not a big fan of it, since this means the parsing code for action=mobileview and RESTBase would diverge further, and we would again have more data in the payload than the client is actually using.

To summarize, I think we have some alternatives we could consider but they come with a price.

You could also include both the old & new dimensions in the PHP response for a transition period. That way you could eventually phase out the top-level width & height. Since the URLs are all the same apart from the size, you could perhaps also use something more compact like

thumb: {
640: {
w: 635,
h: 640,
  },
800: {
w: 794,
h: 800,
  }, 
1024: {
w: 1017,
h: 1024,
  }
}

or, if you really wanted to go super compact at the cost of readability:

[
[640,635,640,635],
[800,794,800,794],
[1024,1017,1024,1017]
]
 

Thanks,
Bernd



On Fri, Jul 24, 2015 at 11:13 AM, Gabriel Wicke <gwicke@wikimedia.org> wrote:
This does complicate the storage & caching story. We likely won't want to pre-generate all permutations for each revision, which means that request performance will be worse than for stored content.

In the short term we can deploy this without storage and caching, but for the longer term we should really figure out a way to make this efficient. Could some of this processing be done on the client, perhaps by running a string replacement on HTML?

On Fri, Jul 24, 2015 at 7:27 AM, Marko Obrovac <mobrovac@wikimedia.org> wrote:
Hi Bernd,

On 24 July 2015 at 08:07, Bernd Sitzmann <bernd@wikimedia.org> wrote:
Hi Marko,

There are a couple of parameters we pass to the mobileview action which depend either on device dimensions or on user preferences.
* leadImageWidth: We calculate the desired lead image width to download on the client and pass that to the mobileview action API as "thumbsize".[1]
* noimages: The user can choose to not download any images. When this is the case we add a "noimages": true flag to the PHP request.[1] Then the payload returns no <img> tags.

In the future there might be a few more. I could also see something similar to leadImageWidth, where we calculate the best size of images or videos to display.

What do you recommend to accomplish the equivalent for RESTBase endpoints?

What you are describing seems like complementary information, so I would recommend providing them as query parameters, with the MobileApps service having some (sane) defaults in case these are missing. The public API call would then be something like: https://(en|m).wikipedia.org/api/rest_v1/page/mobile-html-full/Foobar?thumbsize=200&noimages=true .

Note that RESTBase needs the explicit list of query params and headers that can be forwarded to back-end services, so if/when you do implement this in the apps service, please notify us (phab, mail, irc, etc) or try to include them in the RESTBase config concerning MobileApps[1] yourselves.
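
As a rough illustration of the query-parameter approach, a client could build the request URL along these lines. The helper name mobile_html_url and the default host are assumptions for this sketch; the path and the thumbsize/noimages parameter names are taken from the example URL above. Parameters are omitted when unset, so the service's defaults apply and the default request keeps a single cacheable URL.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def mobile_html_url(title: str,
                    thumbsize: Optional[int] = None,
                    noimages: bool = False,
                    host: str = "en.wikipedia.org") -> str:
    """Build a request URL, adding only the parameters the client sets.

    Assumption: the endpoint path mirrors the example in this thread and
    may differ from whatever is eventually deployed.
    """
    params = {}
    if thumbsize is not None:
        params["thumbsize"] = thumbsize
    if noimages:
        params["noimages"] = "true"
    # Percent-encode the title so that "/" or spaces cannot alter the path.
    url = f"https://{host}/api/rest_v1/page/mobile-html-full/{quote(title, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url
```

Emitting the parameters in a fixed order also helps, since caches typically treat differently ordered query strings as different URLs.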

Cheers,
Marko


P.S. We are making really good progress on the deployment! Hope to see it live soon :)




--
Marko Obrovac, PhD
Senior Services Engineer
Wikimedia Foundation



--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation



