Including the multimedia list, since the discussion is now broader. Gergo, Mark, I encourage you to read the backlog: https://lists.wikimedia.org/mailman/private/ops/2014-April/thread.html#31981
On Thu, Apr 17, 2014 at 4:31 PM, Brad Jorsch (Anomie) <bjorsch@wikimedia.org> wrote:
On Thu, Apr 17, 2014 at 4:13 AM, Gilles Dubuc gilles@wikimedia.org wrote:
When the user opens media viewer, there are 4 API calls per image
When I tried it just now, I saw 6 queries: one to prop=imageinfo to fetch a number of different props, one to meta=filerepoinfo, one to list=imageusage, one to prop=globalusage, and two more to prop=imageinfo to fetch the URLs for two different sizes of the image.
The first four could all be combined into one query (this is an advantage of the batch design of the web API over the much-touted REST model):
https://www.mediawiki.org/w/api.php?action=query&format=json&prop=im...
Being able to merge in the last two as well would be bug 54035.
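A rough sketch of what the combined call Brad describes could look like: one action=query request that runs the imageinfo, globalusage, imageusage and filerepoinfo modules together. The file title and the exact prop selections here are illustrative, not Media Viewer's actual ones.

```typescript
// Sketch: one batched action=query request combining prop, list and meta modules.
// Title and field selections are illustrative, not Media Viewer's real parameters.
async function fetchCombined(title: string) {
  const params = new URLSearchParams({
    action: 'query',
    format: 'json',
    titles: title,                    // e.g. 'File:Example.jpg' (hypothetical)
    prop: 'imageinfo|globalusage',    // two prop modules in the same request
    iiprop: 'url|size|extmetadata',   // imageinfo fields (illustrative subset)
    guprop: 'url|namespace',
    list: 'imageusage',               // a list module in the same request
    iutitle: title,
    meta: 'filerepoinfo',             // a meta module in the same request
  });
  const res = await fetch('https://www.mediawiki.org/w/api.php?' + params.toString());
  return res.json();
}
```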
Also, getting really off-topic here, the "guprop[]=url&guprop[]=namespace" and "&iunamespace[]=0&iunamespace[]=100" that I see in your original queries don't actually work; they give the same results as if guprop and iunamespace were omitted entirely. The API should give a warning about that (filed as bug 64057).
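For reference, multi-value parameters in the web API are pipe-separated strings; a short illustration of the working form of those same parameters:

```typescript
// Multi-value parameters in the MediaWiki web API are pipe-separated strings.
// The bracketed "guprop[]=..." form is not recognised, so it behaves as if the
// parameter had been omitted entirely.
const ignored = 'guprop[]=url&guprop[]=namespace&iunamespace[]=0&iunamespace[]=100';

const recognised = new URLSearchParams({
  guprop: 'url|namespace',
  iunamespace: '0|100',
}).toString(); // "guprop=url%7Cnamespace&iunamespace=0%7C100"
```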
-- Brad Jorsch (Anomie) Software Engineer Wikimedia Foundation
On Thu, Apr 17, 2014 at 8:53 AM, Gilles Dubuc gilles@wikimedia.org wrote:
Including the multimedia list, since the discussion is now broader. Gergo, Mark, I encourage you to read the backlog: https://lists.wikimedia.org/mailman/private/ops/2014-April/thread.html#31981
ops archives are private.
On Thu, Apr 17, 2014 at 4:31 PM, Brad Jorsch (Anomie) <bjorsch@wikimedia.org> wrote:
When I tried it just now, I saw 6 queries: one to prop=imageinfo to fetch a number of different props, one to meta=filerepoinfo, one to list=imageusage, one to prop=globalusage, and two more to prop=imageinfo to fetch the URLs for two different sizes of the image.
There is a userinfo API call as well (you probably missed it because it is made via JSONP in some cases). We just got rid of the extra imageinfo calls (in most cases), so it is now 4 queries per image plus one filerepoinfo query per page.
Four of those could be combined, but that would complicate the code a lot even in its current state (and much more if we do some sort of caching, and need to deal with invalidation, which is different for every API query). I am not sure there is much benefit to it; when cached, those queries should be fast anyway, and when not cached, the single query might actually be slower since everything happens sequentially in PHP, while the independent JS requests would be parallel to some extent. (We should probably measure this.)
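A minimal sketch of the parallel variant described above, assuming Commons as the target wiki and illustrative parameter sets; the point is that the browser fires the independent requests concurrently, so the total latency is roughly that of the slowest single request rather than the sum.

```typescript
// Sketch of the "independent parallel requests" approach: each module gets its
// own api.php call, issued concurrently by the browser. Parameter sets and the
// endpoint are illustrative.
const API = 'https://commons.wikimedia.org/w/api.php';

function apiGet(params: Record<string, string>): Promise<unknown> {
  const qs = new URLSearchParams({ action: 'query', format: 'json', ...params });
  return fetch(`${API}?${qs}`).then(r => r.json());
}

async function loadImageData(title: string) {
  // Fired concurrently; cacheable responses can be answered by Varnish while
  // the uncacheable ones hit the application servers in parallel.
  const [imageinfo, globalusage, imageusage, repoinfo] = await Promise.all([
    apiGet({ prop: 'imageinfo', iiprop: 'url|extmetadata', titles: title }),
    apiGet({ prop: 'globalusage', titles: title }),
    apiGet({ list: 'imageusage', iutitle: title }),
    apiGet({ meta: 'filerepoinfo' }),
  ]);
  return { imageinfo, globalusage, imageusage, repoinfo };
}
```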
Also, getting really off-topic here, the "guprop[]=url&guprop[]=namespace" and "&iunamespace[]=0&iunamespace[]=100" that I see in your original queries don't actually work; they give the same results as if guprop and iunamespace were omitted entirely. The API should give a warning about that (filed as bug 64057).
Probably also a bug in the mediawiki.api JS library, which produces such a URL if the argument is an array. Or is there a legitimate use case for that?
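If the library is indeed passing arrays straight through to a jQuery-style serializer (which expands arrays into bracketed "key[]=" pairs by default), the client-side fix would be to pipe-join array values first. A hypothetical sketch, not the actual mediawiki.api code:

```typescript
// Hypothetical helper: pipe-join array values before they reach the query-string
// serializer, since the API expects "guprop=url|namespace" rather than the
// "guprop[]=url&guprop[]=namespace" expansion.
function joinMultiValues(params: Record<string, string | number | Array<string | number>>) {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(params)) {
    out[key] = Array.isArray(value) ? value.join('|') : String(value);
  }
  return out;
}

new URLSearchParams(joinMultiValues({ guprop: ['url', 'namespace'], iunamespace: [0, 100] })).toString();
// => "guprop=url%7Cnamespace&iunamespace=0%7C100"
```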
On Thu, Apr 17, 2014 at 11:21 AM, Gergo Tisza gtisza@wikimedia.org wrote:
Four of those could be combined, but that would complicate the code a lot even in its current state (and much more if we do some sort of caching, and need to deal with invalidation, which is different for every API query). I am not sure there is much benefit to it; when cached, those queries should be fast anyway, and when not cached, the single query might actually be slower since everything happens sequentially in PHP, while the independent JS requests would be parallel to some extent. (We should probably measure this.)
Wrong. Every request has an overhead in MediaWiki, Apache and Varnish. See the nice spike in [1] for example when mobile was making 2 requests instead of 1. You're proposing to make 4.
----- [1] http://ganglia.wikimedia.org/latest/graph.php?r=year&z=xlarge&c=API+...
On 04/17/2014 11:38 AM, Max Semenik wrote:
On Thu, Apr 17, 2014 at 11:21 AM, Gergo Tisza <gtisza@wikimedia.org> wrote:
Four of those could be combined, but that would complicate the code a lot even in its current state (and much more if we do some sort of caching, and need to deal with invalidation, which is different for every API query). I am not sure there is much benefit to it; when cached, those queries should be fast anyway, and when not cached, the single query might actually be slower since everything happens sequentially in PHP, while the independent JS requests would be parallel to some extent. (We should probably measure this.)
Wrong. Every request has an overhead in MediaWiki, Apache and Varnish. See the nice spike in [1] for example when mobile was making 2 requests instead of 1. You're proposing to make 4.
The current PHP per-request overheads are indeed less than ideal and justify some application-level batching for small requests. With HHVM, SPDY, node.js etc things are moving towards lower per-request overheads though. A cached response over SPDY will typically be faster than anything you can do in PHP, and will at the same time use less server-side resources.
Also, we need to carefully distinguish client-side latency (perceived 'performance') from efficiency. Performing several requests in parallel will typically result in a lower latency for a client, but might cause higher loads on the servers if those requests are not cached and per-request overheads are high.
Gabriel
On Thu, Apr 17, 2014 at 11:38 AM, Max Semenik msemenik@wikimedia.org wrote:
On Thu, Apr 17, 2014 at 11:21 AM, Gergo Tisza gtisza@wikimedia.org wrote:
Four of those could be combined, but that would complicate the code a lot even in its current state (and much more if we do some sort of caching, and need to deal with invalidation, which is different for every API query). I am not sure there is much benefit to it; when cached, those queries should be fast anyway, and when not cached, the single query might actually be slower since everything happens sequentially in PHP, while the independent JS requests would be parallel to some extent. (We should probably measure this.)
Wrong. Every request has an overhead in MediaWiki, Apache and Varnish.
On the server side, sure. On the client side, the overhead is tiny and the requests will be spread out over multiple machines and processed in parallel, so the actual performance might be better. I would expect the servers to be able to deal with a couple API requests per user action (I think this approach of firing small separate queries is pretty much standard these days - as a quick comparison, Flickr fires 12 AJAX requests per image). Is that not the case?
Also, batching everything together makes efficient caching very hard. filerepoinfo is per-wiki and could be cached pretty much forever; imageusage/globalusage can be cached for days because we do not care terribly whether it is up to date; imageinfo cannot be cached for long because the description, license etc. are based on it. If you batch everything together, you get the lowest common denominator in caching. I would expect the overhead of splitting a query into multiple ones to be smaller than the overhead of Apache/MediaWiki handling a query (or query part) that could be handled entirely by Varnish.
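A rough illustration of that caching argument, using the API's maxage/smaxage parameters (which ask the API to set Cache-Control on the response, effective mainly for anonymous requests); the lifetimes below are made-up values matching the reasoning above, not what Media Viewer actually sends:

```typescript
// Sketch: with separate requests, each module can carry its own cache lifetime,
// so Varnish can keep the long-lived answers without being limited by the
// shortest-lived one. Lifetimes are illustrative.
const cacheHints = {
  filerepoinfo: { smaxage: '2592000', maxage: '2592000' }, // ~30 days; per-wiki, rarely changes
  globalusage:  { smaxage: '86400',   maxage: '86400'   }, // ~1 day; staleness is acceptable
  imageinfo:    { smaxage: '300',     maxage: '300'     }, // minutes; description/license must stay fresh
};

function globalUsageQuery(title: string): string {
  return new URLSearchParams({
    action: 'query', format: 'json',
    prop: 'globalusage', titles: title,
    ...cacheHints.globalusage,
  }).toString();
}
```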