+ mobile-l
Here's a rough summary of the discussion based on my understanding:
*Problem and background:* While most parameters we pass to the PHP API action=mobileview endpoint are constant, there are a couple of parameters which depend either on device dimensions or on user preferences.
The questions revolve around a trade-off as we move to RESTBase services for page content: keeping requests cacheable by limiting request variance vs. doing more processing on clients. We want to take advantage of caching both at the edge (Varnish) and on the server side (RESTBase stores the results of each page revision).
In the first phase of using RESTBase it won't pre-generate the results when a new page revision gets created. Instead, it would generate and save the results on-demand. In a later phase we aim to get pre-generation enabled.
*1) leadImageWidth*: The Android app provides the desired lead image width and passes that to the mobileview action API as "thumbsize".[1] The Android app provides only one of three possible values: 640, 800, 1024.[3] It only uses the URL for the lead image, not the dimensions, since it gets them once the actual image has finished downloading. The iOS app currently uses "thumbwidth" which is somewhat similar to "thumbsize" but has its own pros and cons.[4]
*2) noimages*: In the Android app settings, the user can choose to not show any images. (The iOS app doesn't have this setting.) When this is the case we add a noimages=true query parameter to the PHP mobileview request.[1] Then the payload replaces the <img> tags with <span> tags. BTW, if the client specified noimages=true then the value of leadImageWidth does not matter; in fact, then we could omit the whole lead image info from the result as well. It's unclear to me which percentage of users actually use this setting.
*Possible solution alternatives:*
*1) leadImageWidth:*
*1A)* If the clients use a constant value, let's say 800px, for the thumbsize action=mobileview parameter then the client could replace the /800px- portion in the resulting URL with the desired width, as long as the URL structure stays predictable[2]. If the string replacement fails we could still use the 800px URL.
*1B)* The new RESTBase API could provide an array of leadImage URL values to the client (instead of the thumb JSON object).
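To make 1A concrete, here is a rough sketch of the client-side replacement with the fallback described above. The helper name `with_width` and the regular expression are made up for illustration; the pattern only recognises the simple `/<N>px-<name>` form from footnote [2] and leaves every other URL shape untouched.

```python
import re

# Hypothetical helper for option 1A. Only the plain ".../<N>px-<name>" form
# is recognised; anything else is treated as "replacement failed".
_WIDTH_SEGMENT = re.compile(r"/(\d+)px-([^/]+)$")


def with_width(thumb_url: str, desired_width: int) -> str:
    """Swap the width in a thumbnail URL, or return the URL unchanged."""
    match = _WIDTH_SEGMENT.search(thumb_url)
    if match is None:
        # Fallback from the proposal: keep using the URL the API returned.
        return thumb_url
    prefix = thumb_url[:match.start()]
    return f"{prefix}/{desired_width}px-{match.group(2)}"


url_800 = ("//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/"
           "Cernfounders.png/800px-Cernfounders.png")
url_1024 = with_width(url_800, 1024)
```

This does not check whether the server can actually produce the requested width, so the client would still need to handle a failed image download.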
*2) noimages:*
*2A)* The clients could replace the <img> tags with <span> tags, to emulate what the noimages flag of mobileview does. This would help caching by reducing variability. OTOH this puts more burden on clients, since DOM transformations are something clients want to avoid, particularly here because this setting is usually turned on when there are bandwidth or CPU issues on the client side.
*2B)* We could provide a noimages=true query parameter also with RESTBase. We could keep this uncached or implement this as a transform on the cached base version (ideally in the service).
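For 2A, a string replacement on the HTML could look roughly like the sketch below. The function name and the `noimage` class are placeholders; the exact <span> markup that mobileview emits is not spelled out in this thread, so the replacement string is an assumption.

```python
import re

# Naive pattern: matches "<img ...>" up to the first ">".
# It would misfire on attribute values that contain ">".
_IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)

# Placeholder markup (assumption); not the real mobileview output.
_PLACEHOLDER = '<span class="noimage"></span>'


def strip_images(html: str) -> str:
    """Replace every <img> tag in the payload with a placeholder <span>."""
    return _IMG_TAG.sub(_PLACEHOLDER, html)


sample = '<p>Before<img src="//upload.wikimedia.org/a.png" alt="a">after</p>'
stripped = strip_images(sample)
```

The same transform could run in the service on the cached base version (2B) instead of on the client; the code would be the same, only the place where it runs changes.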
Thoughts, comments?
Cheers, Bernd
[1] https://en.m.wikipedia.org/w/api.php?action=mobileview&format=json&p...
[2] Example: "//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/800px-Cernfounders.png" would become "//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/1024px-Cernfounders.png".
[3] We don't want to add arbitrary values; we follow certain bucket sizes to improve the chances of cache hits and reduce the burden on servers. Width buckets: https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FMultimediaViewer.git/f9e7bae91a8032fa13fc68114a0d57d190ea77f9/resources%2Fmmv%2Fmmv.ThumbnailWidthCalculator.js#L69
[4] The Android app wanted to move to thumbwidth as well but iOS encountered issues with svg files: https://phabricator.wikimedia.org/T91144 + https://phabricator.wikimedia.org/T98528
On Sun, Jul 26, 2015 at 11:01 PM, Bernd Sitzmann bernd@wikimedia.org
wrote:
Correct me if I'm wrong, but the actual JPEG / PNG of the (lead) image
will not be sent together with the first response, right? If so, simply adding the JSON with the three sizes adds an overhead of 100 or so bytes, while allowing us to cache/store the response correctly.
Yes, you are correct. The actual image is downloaded in a separate request. This is just to get the URL of the lead image. Earlier I thought we would also use the dimensions provided in the JSON output, but looking at the Android code I don't see this used. I'm now thinking that we could just provide one standard value (e.g. 800px) for the mobileview request, and then the client could just adjust the lead image URL. Example: "//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/*800px*-Cernfounders.png" would become "//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/*1024px*-Cernfounders.png". While this seems a bit hacky by not following hypermedia principles, it would also avoid the thumbwidth issues.[1][2]
Bernd
[1] https://phabricator.wikimedia.org/T91144 [2] https://phabricator.wikimedia.org/T98528
On Sat, Jul 25, 2015 at 4:37 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
On Sat, Jul 25, 2015 at 5:50 AM, Marko Obrovac mobrovac@wikimedia.org wrote:
Correct me if I'm wrong, but the actual JPEG / PNG of the (lead) image will not be sent together with the first response, right? If so, simply adding the JSON with the three sizes adds an overhead of 100 or so bytes, while allowing us to cache/store the response correctly.
As for the options, I'd go with (1) as well. Mostly because external requests will not be POSTs, but GETs, so we would still need some magic translation in RESTBase hashing the query parameters and deducing the exact storage request. I might be wrong here as well, though.
Perhaps we should consider option (1a): RESTBase sends the request together with the HTML to mangle right away. Hm, that looks closer to option (2), though, and still needs a specialised RESTBase module.
- should work without a special module once the post_request_storage
stanza is implemented. We can point that to the main content storage bucket, and get the implicit data fetching that way.
Cheers, Marko
On 24 July 2015 at 23:53, Gabriel Wicke gwicke@wikimedia.org wrote:
On Fri, Jul 24, 2015 at 2:39 PM, Bernd Sitzmann bernd@wikimedia.org wrote:
Option 1 sounds interesting to me. Not sure I fully understand option 2. (Sounds like pre-generation to me.)
Yes, it would normally use the pre-generated content, but generate & save it on demand if needed. That's the case in both variants, though. Only difference is recursive GET back to RESTBase vs. RB POSTing the needed content directly.
Thanks, Bernd
On Fri, Jul 24, 2015 at 3:22 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
> On Fri, Jul 24, 2015 at 2:05 PM, Bernd Sitzmann <bernd@wikimedia.org> wrote:
>
>> Transforms on the cached base version sounds interesting for both
>> cases. How does that work?
>
> I see three main options:
>
> 1) the app service provides a GET end point and, when called with
> the custom parameters, fetches the base version from RESTBase & returns a
> patched version corresponding to the custom settings. RESTBase just proxies
> the custom entry point.
>
> 2) is basically the same, except that RESTBase POSTs the base
> version to the service. We are just starting work on T105975 which might
> give us a way to do this without writing a custom module.
>
> 3) is to do the post-processing in a custom RESTBase module. I'm not
> in favor of this unless absolutely needed, which I don't think is the case
> here.
>
>> Bernd
>>
>> On Fri, Jul 24, 2015 at 2:48 PM, Gabriel Wicke <gwicke@wikimedia.org> wrote:
>>
>>> On Fri, Jul 24, 2015 at 1:28 PM, Bernd Sitzmann <bernd@wikimedia.org> wrote:
>>>
>>>> I tend to agree and I think we should try to take advantage of
>>>> the storage & caching capabilities as much as possible. Not just
>>>> on our servers but also on the edge-caches.
>>>>
>>>> I'd venture a guess that the *noimages* flag is rarely used
>>>> (<5%). Dmitry, do we have any data about the use of "Show images"
>>>> preference being turned off? If not then that would be another good one for
>>>> EL. I'm going out on a limb here saying that if my guess is correct then we
>>>> could potentially replace the <img> tags with the respective <span> tags to
>>>> emulate the noimages flag on the clients. It's not ideal since the <img>
>>>> tags have a bigger payload and post-processing the payload on the clients
>>>> is something we would like to avoid. It's really a tradeoff between caching
>>>> and pure payload size.
>>>
>>> We could keep this uncached or implement this as a transform on
>>> the cached base version (ideally in the service).
>>>
>>>> The *leadImageWidth* has currently three possible values:
>>>> * 640px for phones,
>>>> * 800px for 7" tablets/phablets,
>>>> * 1024px for 10" tablets.
>>>> So, it's not completely variable. We try to take the image size
>>>> buckets[1] into account to help the servers with caching. Here the
>>>> distribution is not so clear-cut. I'm not sure if there is a reasonable
>>>> default value. But the difference in the payload would be very minor. This
>>>> only affects the thumb JSON object at the top level of the JSON payload.
>>>>
>>>> Examples:
>>>> 640[2]:
>>>> "thumb": {"url": "//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/635px-Cernfounders.png","width": 635,"height": 640},
>>>> 800:
>>>> "thumb": {"url": "//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/794px-Cernfounders.png","width": 794,"height": 800},
>>>> 1024:
>>>> "thumb": {"url": "//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/1017px-Cernfounders.png","width": 1017,"height": 1024},
>>>>
>>>> So, I'm thinking before we enable to pre-generation we could drop
>>>> the parameters and do something else instead, like:
>>>> Make "thumb" an (associative?) array so we have all three values
>>>> always included. I'm not a big fan of it since this mean we need to deviate
>>>> the parsing code between action=mobileview and RESTBase further and we have
>>>> again more data in the payload than the client is actually using.
>>>>
>>>> To summarize, I think we have some alternatives we could consider
>>>> but they come with a price.
>>>
>>> You could also both the old & new dimensions in the PHP response
>>> for a transition period. That way you could eventually phase out the
>>> top-level width & height. Since the urls are all the same apart from the
>>> size, you could perhaps also use something more compact like
>>>
>>> thumb: {
>>>   baseURL: "//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/",
>>>   640: {
>>>     w: 635,
>>>     h: 640,
>>>     url: "635px-Cernfounders.png"
>>>   },
>>>   800: {
>>>     w: 794,
>>>     h: 800,
>>>     url: "794px-Cernfounders.png"
>>>   },
>>>   1024: {
>>>     w: 1017,
>>>     h: 1024,
>>>     url: "1017px-Cernfounders.png"
>>>   }
>>> }
>>>
>>> or, if you really wanted to go super compact at the cost of
>>> readability:
>>>
>>> ["//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/{size}px-Cernfounders.png",
>>>   [640,635,640,635],
>>>   [800,794,800,794],
>>>   [1024,1017,1024,1017]
>>> ]
>>>
>>>> Thanks,
>>>> Bernd
>>>>
>>>> [1] https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FMultimediaViewer.git...
>>>> [2] https://en.m.wikipedia.org/w/api.php?action=mobileview&format=json&p... *size*=640
>>>>
>>>> On Fri, Jul 24, 2015 at 11:13 AM, Gabriel Wicke <gwicke@wikimedia.org> wrote:
>>>>
>>>>> This does complicate the storage & caching story. We likely
>>>>> won't want to pre-generate all permutations for each revision, which means
>>>>> that request performance will be worse than stored content.
>>>>>
>>>>> In the short term we can deploy this without storage and
>>>>> caching, but for the longer term we should really figure out a way to make
>>>>> this efficient. Could some of this processing be done on the client,
>>>>> perhaps by running a string replacement on HTML?
>>>>>
>>>>> On Fri, Jul 24, 2015 at 7:27 AM, Marko Obrovac <mobrovac@wikimedia.org> wrote:
>>>>>
>>>>>> Hi Bernd,
>>>>>>
>>>>>> On 24 July 2015 at 08:07, Bernd Sitzmann bernd@wikimedia.org wrote:
>>>>>>
>>>>>>> Hi Marko,
>>>>>>>
>>>>>>> There are a couple of parameters we pass to the mobileview
>>>>>>> action which depend either on device dimensions or on user preferences.
>>>>>>> * leadImageWidth: We calculate the desired lead image width to
>>>>>>> download on the client and pass that to the mobileview action API as
>>>>>>> "thumbsize".[1]
>>>>>>> * noimages: The user can chose to not download any images.
>>>>>>> When this is the case we add a "noimages": true flag to the PHP.[1] Then
>>>>>>> the payload returns no <img> tags.
>>>>>>>
>>>>>>> In the future there might be a few more. I could also see
>>>>>>> something similar to leadImageWidth, where we calculate the best size of
>>>>>>> images or videos to display.
>>>>>>>
>>>>>>> What do you recommend to accomplish the equivalent for
>>>>>>> RESTBase endpoints?
>>>>>>
>>>>>> What you are describing seems like complimentary information,
>>>>>> so I would recommend providing them as query parameters, with the
>>>>>> MobileApps service having some (sane) defaults in case these are missing.
>>>>>> The public API call would then be something like:
>>>>>> https://(en|m).wikipedia.org/api/rest_v1/page/mobile-html-full/Foobar?thumbsize=200&noimages=true .
>>>>>>
>>>>>> Note that RESTBase needs the explicit list of query params and
>>>>>> headers that can be forwarded to back-end services, so if/when you do
>>>>>> implement this in the apps service, please notify us (phab, mail, irc, etc)
>>>>>> or try to include them in the RESTBase config concerning MobileApps~[1]
>>>>>> yourselves.
>>>>>>
>>>>>> Cheers,
>>>>>> Marko
>>>>>>
>>>>>> [1] https://github.com/wikimedia/restbase/blob/master/specs/mediawiki/v1/mobilea...
>>>>>>
>>>>>> P.S. We are making really good progress on the deployment! Hope
>>>>>> to see it live soon :)
>>>>>>
>>>>>>> Thanks,
>>>>>>> Bernd
>>>>>>>
>>>>>>> [1] https://en.m.wikipedia.org/w/api.php?action=mobileview&format=json&p... *noimages=true&thumbsize=640*
>>>>>>
>>>>>> --
>>>>>> Marko Obrovac, PhD
>>>>>> Senior Services Engineer
>>>>>> Wikimedia Foundation
>>>>>
>>>>> --
>>>>> Gabriel Wicke
>>>>> Principal Engineer, Wikimedia Foundation
>>>
>>> --
>>> Gabriel Wicke
>>> Principal Engineer, Wikimedia Foundation
>
> --
> Gabriel Wicke
> Principal Engineer, Wikimedia Foundation
-- Gabriel Wicke Principal Engineer, Wikimedia Foundation
-- Marko Obrovac, PhD Senior Services Engineer Wikimedia Foundation
-- Gabriel Wicke Principal Engineer, Wikimedia Foundation
Some comments inline: TL;DR; proper dynamic image URLs would solve this problem nicely.
I'm still curious to see if constructing image URLs on the client side would get the job done until our servers are able to fall back gracefully. The logic would be:
- API requests return image filenames instead of thumbnail URLs
- Clients create the URL to the thumbnail they want at desired resolution (bounded by original image size):
  - File-name.extension/Npx-File-name.extension
- Image/thumbnail resolution is no longer used as a request parameter
- Requests are cached based on non-image-related inputs:
  - page title
  - search query/continue
  - etc.
- Responses are given etag headers so clients don't redundantly request them until the revision changes (or user forces a refresh manually)
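The URL construction step in the list above could be sketched as follows. All names are hypothetical. It assumes the API hands the client the thumbnail directory (something like the baseURL idea from earlier in the thread), the file name, and the original width, and it snaps to the three widths the Android app uses today. Whether the thumbnail server accepts a width equal to the original is not settled in this thread, so treat the clamping rule as an assumption too.

```python
# The three widths mentioned in this thread for the Android app.
WIDTH_BUCKETS = (640, 800, 1024)


def thumb_url(base_url: str, file_name: str,
              desired_width: int, original_width: int) -> str:
    """Build <base>/<N>px-<file_name>, with N bucketed and clamped."""
    # Smallest bucket that covers the desired width, else the largest bucket.
    bucket = next((w for w in WIDTH_BUCKETS if w >= desired_width),
                  WIDTH_BUCKETS[-1])
    # Never ask for more pixels than the original image has.
    width = min(bucket, original_width)
    return f"{base_url.rstrip('/')}/{width}px-{file_name}"


base = "//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/"
tablet_url = thumb_url(base, "Cernfounders.png", 700, 2000)
small_original_url = thumb_url(base, "Cernfounders.png", 700, 500)
```

Because the width never appears in the API request, the API response itself can be cached on the non-image inputs alone.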
We might not even need to do a separate request to get original image resolutions, as last I checked, they're included as Parsoid DOM attributes: data-file-{width,height}
On Tue, Jul 28, 2015 at 6:19 PM, Bernd Sitzmann bernd@wikimedia.org wrote:
- mobile-l
Here's a rough summary of the discussion based on my understanding:
*Problem and background:* While most parameters we pass to the PHP API action=mobileview endpoint are constant, there are a couple of parameters which depend either on device dimensions or on user preferences.
The questions revolve around a trade-off as we move to RESTBase services for page content: keeping requests cacheable by limiting request variance vs. doing more processing on clients. We want to take advantage of caching both at the edge (Varnish) and on the server side (RESTBase stores the results of each page revision).
In the first phase of using RESTBase it won't pre-generate the results when a new page revision gets created. Instead, it would generate and save the results on-demand.
In a later phase we aim to get pre-generation enabled.
^ Wouldn't you then end up pre-generating revisions of pages that might not ever be accessed? It's only a cost you pay once.
*1) leadImageWidth*: The Android app provides the desired lead image width and passes that to the mobileview action API as "thumbsize".[1] The Android app provides only one of three possible values: 640, 800, 1024.[3] It only uses the URL for the lead image, not the dimensions, since it gets them once the actual image has finished downloading. The iOS app currently uses "thumbwidth" which is somewhat similar to "thumbsize" but has its own pros and cons.[4]
What exactly are we trying to optimize for here, response time of our mobileview replacement, or edge cache efficiency? In either case, the main problem here is fragmentation on image resolutions, which to me, indicates the need for dynamic image URLs.
*2) noimages*: In the Android app settings, the user can choose to not show any images. (The iOS app doesn't have this setting.) When this is the case we add a noimages=true query parameter to the PHP mobileview request.[1] Then the payload replaces the <img> tags with <span> tags. BTW, if the client specified noimages=true then the value of leadImageWidth does not matter; in fact, then we could omit the whole lead image info from the result as well. It's unclear to me which percentage of users actually use this setting.
*Possible solution alternatives:*
*1) leadImageWidth:*
*1A)* If the clients use a constant value, let's say 800px, for the thumbsize action=mobileview parameter then the client could replace the /800px- portion in the resulting URL with the desired width, as long as the URL structure stays predictable[2]. If the string replacement fails we could still use the 800px
This is unreliable, as you could end up getting 400 (bad request) responses from the thumbnail server if the original is smaller than 800 (or whatever arbitrary size we pick). Better to return the file page title (My-Image.jpg) and construct the thumbnail URL from that (My-Image.jpg/440px-My-Image.jpg).
URL. *1B)* The new RESTBase API could provide an array of leadImage URL values to the client (instead of the thumb JSON object).
Not bad, but still potentially unnecessary and hard to code for. We'll have to write (redundant) code in each client to grab the "best" image from the given thumbnails for a given situation. I would lobby for my suggestion above to let clients construct URLs, and agree across clients to use similar resolutions so we don't upset ops.
*2) noimages:*
*2A)* The clients could replace the <img> tags with <span> tags, to emulate what the noimages flag of mobileview does. This would help caching by reducing variability. OTOH this puts more burden on clients, since DOM transformations are something clients want to avoid, particularly here because this setting is usually turned on when there are bandwidth or CPU issues on the client side.
*2B)* We could provide a noimages=true query parameter also with RESTBase. We could keep this uncached or implement this as a transform on the cached base version (ideally in the service).
Thoughts, comments?
Cheers, Bernd
[1] https://en.m.wikipedia.org/w/api.php?action=mobileview&format=json&p...
[2] Example: "// upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/800px-Cernfounders.png" would become "// upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/1024px-Cernfounders.png ".
[3] We don't want to add arbitrary values and follow certain bucket sizes to enhance chances of cache hits and reduce burden on servers. Width buckets: *https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FMultimediaViewer.git... https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FMultimediaViewer.git/f9e7bae91a8032fa13fc68114a0d57d190ea77f9/resources%2Fmmv%2Fmmv.ThumbnailWidthCalculator.js#L69*
[4] The Android app wanted to move to thumbwidth as well but iOS encountered issues with svg files: https://phabricator.wikimedia.org/T91144 + https://phabricator.wikimedia.org/T98528
On Sun, Jul 26, 2015 at 11:01 PM, Bernd Sitzmann bernd@wikimedia.org
wrote:
Correct me if I'm wrong, but the actual JPEG / PNG of the (lead) image
will not be sent together with the first response, right? If so, simply adding the JSON with the three sizes adds an overhead of 100 or so bytes, while allowing us to cache/store the response correctly.
Yes, you are correct. The actual image is downloaded in a separate request. This is just to get the URL of the lead image. Earlier I thought we would also use the dimensions provided in the JSON output, but looking at the Android code I don't see this used. I'm now thinking that we could just provide one standard value (e.g. 800px) for the mobileview request, and then the client could just adjust the lead image URL Example: "//upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/ *800px*-Cernfounders.png" would become "// upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/ *1024px*-Cernfounders.png". While this seems a bit hacky by not following hypermedia principles it would also avoid the issue thumbwidth issues.[1][2]
Bernd
[1] https://phabricator.wikimedia.org/T91144 [2] https://phabricator.wikimedia.org/T98528
On Sat, Jul 25, 2015 at 4:37 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
On Sat, Jul 25, 2015 at 5:50 AM, Marko Obrovac mobrovac@wikimedia.org wrote:
Correct me if I'm wrong, but the actual JPEG / PNG of the (lead) image will not be sent together with the first response, right? If so, simply adding the JSON with the three sizes adds an overhead of 100 or so bytes, while allowing us to cache/store the response correctly.
As for the options, I'd go with (1) as well. Mostly because external requests will not be POSTs, but GETs, so we would still need some magic translation in RESTBase hashing the query parameters and deducing the exact storage request. I might be wrong here as well, though.
Perhaps we should consider option (1a): RESTBase sends the request together with the HTML to mangle right away. Hm, that looks more closely to option (2) though and still needs a specialised RESTBase module.
- should work without a special module once the post_request_storage
stanza is implemented. We can point that to the main content storage bucket, and get the implicit data fetching that way.
Cheers, Marko
On 24 July 2015 at 23:53, Gabriel Wicke gwicke@wikimedia.org wrote:
On Fri, Jul 24, 2015 at 2:39 PM, Bernd Sitzmann bernd@wikimedia.org wrote:
> Option 1 sounds interesting to me. > Not sure I fully understand option 2. (Sounds like pre-generation to > me.) >
Yes, it would normally use the pre-generated content, but generate & save it on demand if needed. That's the case in both variants, though. Only difference is recursive GET back to RESTBase vs. RB POSTing the needed content directly.
> > Thanks, > Bernd > > On Fri, Jul 24, 2015 at 3:22 PM, Gabriel Wicke <gwicke@wikimedia.org > > wrote: > >> >> >> On Fri, Jul 24, 2015 at 2:05 PM, Bernd Sitzmann < >> bernd@wikimedia.org> wrote: >> >>> Transforms on the cached base version sounds interesting for both >>> cases. How does that work? >>> >> >> I see three main options: >> >> 1) the app service provides a GET end point and, when called with >> the custom parameters, fetches the base version from RESTBase & returns a >> patched version corresponding to the custom settings. RESTBase just proxies >> the custom entry point. >> >> 2) is basically the same, except that RESTBase POSTs the base >> version to the service. We are just starting work on T105975 which might >> give us a way to do this without writing a custom module. >> >> 3) is to do the post-processing in a custom RESTBase module. I'm >> not in favor of this unless absolutely needed, which I don't think is the >> case here. >> >> >> >>> >>> Bernd >>> >>> On Fri, Jul 24, 2015 at 2:48 PM, Gabriel Wicke < >>> gwicke@wikimedia.org> wrote: >>> >>>> >>>> >>>> On Fri, Jul 24, 2015 at 1:28 PM, Bernd Sitzmann < >>>> bernd@wikimedia.org> wrote: >>>> >>>>> I tend to agree and I think we should try to take advantage of >>>>> the storage & caching capabilities as much as possible. Not >>>>> just on our servers but also on the edge-caches. >>>>> >>>>> I'd venture a guess that the *noimages* flag is rarely used >>>>> (<5%). Dmitry, do we have any data about the use of "Show images" >>>>> preference being turned off? If not then that would be another good one for >>>>> EL. I'm going out on a limb here saying that if my guess is correct then we >>>>> could potentially replace the <img> tags with the respective <span> tags to >>>>> emulate the noimages flag on the clients. It's not ideal since the <img> >>>>> tags have a bigger payload and post-processing the payload on the clients >>>>> is something we would like to avoid. 
It's really a tradeoff between caching >>>>> and pure payload size. >>>>> >>>> >>>> >>>> We could keep this uncached or implement this as a transform on >>>> the cached base version (ideally in the service). >>>> >>>> >>>>> >>>>> The *leadImageWidth* has currently three possible values: >>>>> * 640px for phones, >>>>> * 800px for 7" tablets/phablets, >>>>> * 1024px for 10" tablets. >>>>> So, it's not completely variable. We try to take the image size >>>>> buckets[1] into account to help the servers with caching. Here the >>>>> distribution is not so clear-cut. I'm not sure if there is a reasonable >>>>> default value. But the difference in the payload would be very minor. This >>>>> only affects the thumb JSON object at the top level of the JSON payload. >>>>> >>>>> Examples: >>>>> 640[2]: >>>>> "thumb": { >>>>> "url": "// >>>>> upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/635px-Cernfounders.png >>>>> ","width": 635,"height": 640}, >>>>> 800: >>>>> "thumb": {"url": "// >>>>> upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/794px-Cernfounders.png >>>>> ","width": 794,"height": 800}, >>>>> 1024: >>>>> "thumb": {"url": "// >>>>> upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/1017px-Cernfounders.png >>>>> ","width": 1017,"height": 1024}, >>>>> >>>>> So, I'm thinking before we enable to pre-generation we could >>>>> drop the parameters and do something else instead, like: >>>>> Make "thumb" an (associative?) array so we have all three values >>>>> always included. I'm not a big fan of it since this mean we need to deviate >>>>> the parsing code between action=mobileview and RESTBase further and we have >>>>> again more data in the payload than the client is actually using. >>>>> >>>>> To summarize, I think we have some alternatives we could >>>>> consider but they come with a price. >>>>> >>>> >>>> You could also both the old & new dimensions in the PHP response >>>> for a transition period. 
That way you could eventually phase out the >>>> top-level width & height. Since the urls are all the same apart from the >>>> size, you could perhaps also use something more compact like >>>> >>>> thumb: { >>>> baseURL: "// >>>> upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/ >>>> http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/635px-Cernfounders.png >>>> ", >>>> 640: { >>>> w: 635, >>>> h: 640, >>>> url: "635px-Cernfounders.png >>>> http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/635px-Cernfounders.png >>>> " >>>> }, >>>> 800: { >>>> w: 794, >>>> h: 800, >>>> url: "794px-Cernfounders.png >>>> http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/635px-Cernfounders.png >>>> " >>>> }, >>>> 1024: { >>>> w: 1017, >>>> h: 1024, >>>> url: "1017px-Cernfounders.png >>>> http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/635px-Cernfounders.png >>>> " >>>> } >>>> } >>>> >>>> or, if you really wanted to go super compact at the cost of >>>> readability: >>>> >>>> ["// >>>> upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/ >>>> http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/635px-Cernfounders.png >>>> {size}px-Cernfounders.png >>>> http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.png/635px-Cernfounders.png >>>> ", >>>> [640,635,640,635], >>>> [800,794,800,794], >>>> [1024,1017,1024,1017] >>>> ] >>>> >>>> >>>>> >>>>> Thanks, >>>>> Bernd >>>>> >>>>> [1] >>>>> https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FMultimediaViewer.git... >>>>> [2] >>>>> https://en.m.wikipedia.org/w/api.php?action=mobileview&format=json&p... >>>>> *size*=640 >>>>> >>>>> >>>>> On Fri, Jul 24, 2015 at 11:13 AM, Gabriel Wicke < >>>>> gwicke@wikimedia.org> wrote: >>>>> >>>>>> This does complicate the storage & caching story. 
We likely >>>>>> won't want to pre-generate all permutations for each revision, which means >>>>>> that request performance will be worse than stored content. >>>>>> >>>>>> In the short term we can deploy this without storage and >>>>>> caching, but for the longer term we should really figure out a way to make >>>>>> this efficient. Could some of this processing be done on the client, >>>>>> perhaps by running a string replacement on HTML? >>>>>> >>>>>> On Fri, Jul 24, 2015 at 7:27 AM, Marko Obrovac < >>>>>> mobrovac@wikimedia.org> wrote: >>>>>> >>>>>>> Hi Bernd, >>>>>>> >>>>>>> On 24 July 2015 at 08:07, Bernd Sitzmann bernd@wikimedia.org >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Marko, >>>>>>>> >>>>>>>> There are a couple of parameters we pass to the mobileview >>>>>>>> action which depend either on device dimensions or on user preferences. >>>>>>>> * leadImageWidth: We calculate the desired lead image width >>>>>>>> to download on the client and pass that to the mobileview action API as >>>>>>>> "thumbsize".[1] >>>>>>>> * noimages: The user can chose to not download any images. >>>>>>>> When this is the case we add a "noimages": true flag to the PHP.[1] Then >>>>>>>> the payload returns no <img> tags. >>>>>>>> >>>>>>>> In the future there might be a few more. I could also see >>>>>>>> something similar to leadImageWidth, where we calculate the best size of >>>>>>>> images or videos to display. >>>>>>>> >>>>>>>> What do you recommend to accomplish the equivalent for >>>>>>>> RESTBase endpoints? >>>>>>>> >>>>>>> >>>>>>> What you are describing seems like complimentary information, >>>>>>> so I would recommend providing them as query parameters, with the >>>>>>> MobileApps service having some (sane) defaults in case these are missing. >>>>>>> The public API call would then be something like: https:// >>>>>>> (en|m). >>>>>>> wikipedia.org/api/rest_v1/page/mobile-html-full/Foobar?thumbsize=200&noimages=true >>>>>>> . 
>>>>>>>
>>>>>>> Note that RESTBase needs the explicit list of query params and
>>>>>>> headers that can be forwarded to back-end services, so if/when you do
>>>>>>> implement this in the apps service, please notify us (phab, mail, irc, etc)
>>>>>>> or try to include them in the RESTBase config concerning MobileApps [1]
>>>>>>> yourselves.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Marko
>>>>>>>
>>>>>>> [1] https://github.com/wikimedia/restbase/blob/master/specs/mediawiki/v1/mobilea...
>>>>>>>
>>>>>>> P.S. We are making really good progress on the deployment!
>>>>>>> Hope to see it live soon :)
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Bernd
>>>>>>>>
>>>>>>>> [1] https://en.m.wikipedia.org/w/api.php?action=mobileview&format=json&p...
>>>>>>>> *noimages=true&thumbsize=640*
>>>>>>>
>>>>>>> --
>>>>>>> Marko Obrovac, PhD
>>>>>>> Senior Services Engineer
>>>>>>> Wikimedia Foundation
>>>>>>
>>>>>> --
>>>>>> Gabriel Wicke
>>>>>> Principal Engineer, Wikimedia Foundation
>>>>
>>>> --
>>>> Gabriel Wicke
>>>> Principal Engineer, Wikimedia Foundation
>>
>> --
>> Gabriel Wicke
>> Principal Engineer, Wikimedia Foundation
-- Gabriel Wicke Principal Engineer, Wikimedia Foundation
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
On Tue, Jul 28, 2015 at 3:19 PM, Bernd Sitzmann bernd@wikimedia.org wrote:
> *1) leadImageWidth: * *1A)* If the client uses a constant value, let's say 800px for thumbsize action=mobileview parameter then the client could replace the /800px- portion in the resulting URL with the desired width, as long as the URL structure stays predictable[2]. If the string replacement fails we could still use the 800px URL.
MediaWiki thumbnail URLs are fragile and not meant to be used as an API, mainly because of T74328 (https://phabricator.wikimedia.org/T74328) but also because finding the size part is not always trivial (e.g. for a JPEG the URL could look like upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.jpg/qlow-1024px-Cernfounders.jpg).
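To make the fragility concrete, here is a minimal Python sketch of option 1A. The function name and the regular expression are hypothetical, and the pattern only encodes the URL shapes quoted in this thread (an optional prefix such as "qlow-" followed by "<width>px-"); it is not a stable contract. It falls back to the unmodified URL when the width part is not recognized, as proposed above.

```python
import re

# Assumption: the width is "<digits>px-" at the start of the last path
# segment, optionally preceded by lowercase prefixes such as "qlow-".
_WIDTH_RE = re.compile(r"/((?:[a-z0-9]+-)*?)(\d+)px-([^/]+)$")


def resize_thumb_url(url: str, width: int) -> str:
    """Swap the width in a thumbnail URL; return url unchanged if none is found."""
    match = _WIDTH_RE.search(url)
    if match is None:
        return url  # fall back to the URL the server gave us
    prefix, _old_width, name = match.groups()
    return f"{url[:match.start()]}/{prefix}{width}px-{name}"
```

A plain url.replace("/800px-", "/640px-") leaves the qlow- URL untouched, which is the kind of silent miss described above. Neither variant knows whether the requested width exceeds the original image.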
> (e.g. for a JPEG the URL could look like upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Cernfounders.jpg/qlow-1024px-Cernfounders.jpg ).
Darn, really? Well, there goes my idea of hacking client-side URL generation... I guess the "thumb/6/6e" part isn't standard either. If we could just get the thumbnail server to return the original image when we go too large, we'd be set. We could always request it anyway and fall back manually in case of a 400 response (probably won't happen _too_ often).
On Tue, Jul 28, 2015 at 6:55 PM, Gergo Tisza gtisza@wikimedia.org wrote:
[...]
On Tue, Jul 28, 2015 at 3:57 PM, Brian Gerstle bgerstle@wikimedia.org wrote:
> I guess the "thumb/6/6e" part isn't standard either.
It's always there as far as I know, but it's not standard in the sense that there is no promise of stability. Someone might at some point decide we should use a faster hash function, or that we have too many images and need three levels of depth, or something else altogether... File names might go away soon as well, since there are some experiments with content-hash-based URLs to make cache invalidation simpler.
So if you want to freeze elements of the current URL structure, that should be a wider discussion involving ops and other folks working on media features. You can of course just use it at your own risk; for a JS component that would be no big deal, as URL schemes don't change often. Not sure how quickly app updates can be pushed out, though.
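For readers wondering where "6/6e" comes from: to my knowledge the current layout derives it from the MD5 hash of the file name (underscores instead of spaces), using the first one and first two hex digits. The Python sketch below illustrates that understanding only. The function name and the levels parameter are hypothetical, and the scheme itself carries no stability promise, as noted above.

```python
import hashlib


def thumb_hash_path(file_name: str, levels: int = 2) -> str:
    """Sketch of the hashed directory prefix, e.g. a "6/6e"-shaped string."""
    # File names are stored with underscores in place of spaces.
    key = file_name.replace(" ", "_")
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # Level i uses the first i hex digits of the digest.
    return "/".join(digest[:i] for i in range(1, levels + 1))
```

Changing the hash function or moving to three levels, both mentioned above as possibilities, would change every path this produces, which is why clients hard-coding it take on real risk.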
> If we could just get the thumbnail server to return the original image when we go too large we'd be set.
Fragmenting the cache would be problematic (if it comes to that, it's still better to fragment the HTML cache by appending an image size parameter, since HTML content is smaller). I think something like T75935 (https://phabricator.wikimedia.org/T75935) could work, but AIUI the whole media cache infrastructure is held together with duct tape and no one is particularly enthusiastic to poke at it.
> We could always request it anyway and fall back manually in case of 400 response (probably won't happen _too_ often).
MediaViewer does that (it also embeds the original size in the article HTML and uses that to guess the right URL); it mostly works but still comes with minor issues such as T70320 (https://phabricator.wikimedia.org/T70320) or strange errors on the beta cluster, which has minor differences in thumbnail behavior.
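The request-then-fall-back idea can be sketched in Python as below. This is not MediaViewer's code: first_available and status_of are hypothetical names, and status_of stands in for whatever HTTP request the client actually makes, so the sketch stays independent of any particular HTTP library.

```python
from typing import Callable, Iterable, Optional


def first_available(urls: Iterable[str],
                    status_of: Callable[[str], int]) -> Optional[str]:
    """Return the first candidate URL that answers with HTTP 200, else None."""
    for url in urls:
        # Any non-200 status (for example the 400 mentioned above for an
        # oversized thumbnail) moves on to the next candidate.
        if status_of(url) == 200:
            return url
    return None
```

A client would list the guessed thumbnail URL first and a URL it already knows to be valid last, at the cost of one extra round trip whenever the guess is wrong.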
> the whole media cache infrastructure is held together with duct tape and no one is particularly enthusiastic to poke at it.
That's what I found out when I asked around on IRC about this very thing. Could we get sufficient inter-vertical commitment to do this properly if there were, say, a cross-platform epic about it next quarter? Just a thought...
On Tue, Jul 28, 2015 at 7:25 PM, Gergo Tisza gtisza@wikimedia.org wrote:
[...]
Aaron did some work on hash-based thumb naming that was recently merged: https://phabricator.wikimedia.org/T1210
A switch to hash-based thumb URLs could be an opportunity to define a predictable size / quality selection API, at least for large installations like ours.
Gabriel
On Tue, Jul 28, 2015 at 4:33 PM, Brian Gerstle bgerstle@wikimedia.org wrote:
[...]
-- EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle IRC: bgerstle
Here's an update on what happened so far with this:
*1) leadImageWidth: * The service provides an object with three common sizes for lead image URLs, so the clients don't have to change URLs. If the image URL scheme changes we can make the appropriate updates on the services side without having to change the clients.
Example from https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/Cat [1]:

"image": {
  "file": "Cat poster 1.jpg",
  "urls": {
    "640": "//upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Cat_poster_1.jpg/640px-Cat_poster_1.jpg",
    "800": "//upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Cat_poster_1.jpg/800px-Cat_poster_1.jpg",
    "1024": "//upload.wikimedia.org/wikipedia/commons/thumb/0/0b/Cat_poster_1.jpg/1024px-Cat_poster_1.jpg"
  }
},
*2) noimages: * For this one I plan to use a query parameter since this is rarely used.
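A rough client-side sketch of both points, in Python for brevity (the apps themselves are not written in Python). The function names are hypothetical, the endpoint is the one from the example above, and the query parameter name "noimages" is an assumption carried over from the mobileview API; the final name may differ.

```python
from typing import Dict
from urllib.parse import quote, urlencode

BASE = "https://en.wikipedia.org/api/rest_v1/page/mobile-html-sections-lead/"


def lead_section_url(title: str, no_images: bool = False) -> str:
    """Build the request URL; a query string is added only for the rare noimages case."""
    url = BASE + quote(title.replace(" ", "_"), safe="")
    if no_images:
        # Assumed parameter name, see the note above.
        url += "?" + urlencode({"noimages": "true"})
    return url


def pick_lead_image_url(urls: Dict[str, str], target_width: int) -> str:
    """Return the smallest offered size that is at least target_width, else the largest."""
    if not urls:
        raise ValueError("no lead image URLs offered")
    widths = sorted(int(w) for w in urls)
    for width in widths:
        if width >= target_width:
            return urls[str(width)]
    return urls[str(widths[-1])]
```

Leaving the query string off by default means the common request is identical for every client, which keeps it cacheable. Only the presumably small share of noimages users produces a second variant.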
[1] Yes, the service is deployed to a production environment. :) While slightly off-topic, I did a service implementation walk-through with the other Android app developers: https://www.youtube.com/watch?v=Ebd2exZ4T9o
Cheers, --Bernd
On Tue, Jul 28, 2015 at 5:52 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
[...]