All true. The images should not be rethumb'd unless resolution changes, a new version is uploaded, or the cache is otherwise purged. However, on initial rendering, the thumb generation can be a large part (especially if rendering multiple images) of overall page execution time. Being able to offload this elsewhere should decrease that load greatly.
-Chad
On Apr 24, 2009 1:23 PM, "Roan Kattouw" roan.kattouw@gmail.com wrote:
2009/4/24 Aryeh Gregor <Simetrical+wikilist@gmail.com>:
> How long does it take to thumbnail a typical image, though? Even a
> parser cache hit (but Squid ...
The problem here seems to be that thumbnail generation times vary a lot, based on format and size of the original image. It could be 10 ms for one image and 10 s for another, who knows.
> Moreover, in MediaWiki's case specifically, *very* few requests should
> actually require the thu...
That's true, we're already doing that.
> So it's not a good case to optimize for.
AFAICT this isn't about optimization, it's about not bogging down the Apache that has the misfortune of getting the first request to thumb a huge image (but having a dedicated server for that instead), and about not letting the associated user wait for ages. Even worse, requests that thumb very large images could hit the 30s execution limit and fail, which means those thumbs will never be generated but every user requesting it will have a request last for 30s and time out.
Roan Kattouw (Catrope)
_______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia....
2009/4/24 Chad innocentkiller@gmail.com:
All true. The images should not be rethumb'd unless resolution changes, a new version is uploaded, or the cache is otherwise purged.
Repeat: this is what we do already (not sure if that's what you're trying to say, but "should" implies differently).
Roan Kattouw (Catrope)
I'm agreeing with you. By "should" I meant "this should be happening already and issues with this are bugs."
-Chad
On Apr 24, 2009 1:32 PM, "Roan Kattouw" roan.kattouw@gmail.com wrote:
2009/4/24 Chad innocentkiller@gmail.com:
> All true. The images should not be rethumb'd unless resolution changes,
> a new version is uploade...
Repeat: this is what we do already (not sure if that's what you're trying to say, but "should" implies differently).
Roan Kattouw (Catrope)
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 4/24/09 10:32 AM, Roan Kattouw wrote:
2009/4/24 Chad innocentkiller@gmail.com:
All true. The images should not be rethumb'd unless resolution changes, a new version is uploaded, or the cache is otherwise purged.
Repeat: this is what we do already (not sure if that's what you're trying to say, but "should" implies differently).
Just to summarize the current state, here's the default MediaWiki configuration workflow:
* During page rendering, MediaWiki checks if a thumb of the proper size exists.
* if not, we resize it synchronously on the same server (via GD or shell out to ImageMagick etc)
* an <img> pointing to the file is added to output
* The web browser loads up the already-rendered image file in the page.
Here's the behavior variant we have on Wikimedia sites:
* During page rendering, we make an <img> pointing to where the thumbnail should be
* The web browser requests the thumbnail image file
* If it doesn't exist, the upload web server proxies the request [1] back to MediaWiki, running on a subcluster which handles only thumbnail generation
* MediaWiki resizes it synchronously via shell out to ImageMagick
* The web server serves the now-completed file back to the client, and it's now on disk for the next request
[1] http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/upload-scripts/
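A minimal sketch of the "render on first request" flow described above, written in Python rather than MediaWiki's PHP. The function name `serve_thumb` and the `render` callback are hypothetical stand-ins (the real work is done by MediaWiki shelling out to ImageMagick), and the write-to-temp-then-rename step is an assumption added here so that a reader never sees a half-written file; it is not claimed to be what the upload scripts do.

```python
import os
import tempfile
from typing import Callable

# Hypothetical renderer signature: (source_path, width, dest_path) -> None.
Renderer = Callable[[str, int, str], None]


def serve_thumb(thumb_path: str, source_path: str, width: int,
                render: Renderer) -> bytes:
    """Return the thumbnail bytes, rendering synchronously on a miss."""
    if not os.path.exists(thumb_path):
        thumb_dir = os.path.dirname(thumb_path) or "."
        os.makedirs(thumb_dir, exist_ok=True)
        # Assumption: render into a temp file in the same directory, then
        # rename it into place, so partial output is never served.
        fd, tmp_path = tempfile.mkstemp(dir=thumb_dir)
        os.close(fd)
        try:
            render(source_path, width, tmp_path)
            os.replace(tmp_path, thumb_path)
        except BaseException:
            os.unlink(tmp_path)
            raise
    # From here on the file is on disk for this and every later request.
    with open(thumb_path, "rb") as f:
        return f.read()
```

The caller still blocks for as long as `render` takes, which is exactly the weakness discussed next: the cost has moved off the main web servers, but a slow image is still slow for whoever asks first.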
This prevents slow or broken thumbnailing operations from bogging down the *main* web servers, but if it's not reasonably fast we still have difficulties:
* No placeholder image -- browser just shows a nice blank box
* Multiple requests will cause multiple attempts to resize at once, potentially eating up all CPU time/memory/tmp disk space on the thumbnailing cluster
So if we've got, say, a 50 megapixel PNG or TIFF high-res scan, or a giant animated GIF which is very expensive to resize, we don't have a good way of producing a thumbnail on a good schedule. It'll either time out a lot every time it changes, or just never actually complete.
If we have a way to defer things we know will take longer, and show a placeholder until it's completed, then we can use those things more reliably.
One suggestion that's been brought up for large images is to create a smaller version *once at upload time* which can then be used to quickly create inline thumbnails of various sizes on demand. But we still need some way to manage that asynchronous initial rendering, and have some kind of friendly behavior for what to show while it's working.
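To make the "smaller version once at upload time" idea concrete, here is a rough Python sketch of the size arithmetic only. Both thresholds and both function names are made up for illustration; nothing in this thread fixes actual values, and no image library is involved.

```python
from typing import Optional, Tuple

# Both limits are invented values, for illustration only.
LARGE_IMAGE_PIXELS = 12_500_000  # originals above this get an intermediate
INTERMEDIATE_WIDTH = 2000        # width of the intermediate, in pixels


def intermediate_size(width: int, height: int) -> Optional[Tuple[int, int]]:
    """Size of the intermediate to create at upload time, or None if unneeded."""
    if width * height <= LARGE_IMAGE_PIXELS or width <= INTERMEDIATE_WIDTH:
        return None
    # Preserve the aspect ratio; never let the height round down to zero.
    return INTERMEDIATE_WIDTH, max(1, round(height * INTERMEDIATE_WIDTH / width))


def pick_source(orig_size: Tuple[int, int], thumb_width: int) -> str:
    """Return 'intermediate' when one exists and is wide enough, else 'original'."""
    inter = intermediate_size(*orig_size)
    if inter is not None and thumb_width <= inter[0]:
        return "intermediate"
    return "original"
```

With these example numbers, a 10000x5000 scan (50 megapixels) would get a 2000x1000 intermediate, and a 220px inline thumbnail would be cut from that instead of from the original. Thumbnails wider than the intermediate would still have to go back to the original.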
-- brion
Brion Vibber brion@wikimedia.org writes:
Just to summarize the current state, here's the default MediaWiki configuration workflow:
- During page rendering, MediaWiki checks if a thumb of the proper size
exists.
- if not, we resize it synchronously on the same server (via GD or
shell out to ImageMagick etc)
- an <img> pointing to the file is added to output
- The web browser loads up the already-rendered image file in the page.
Here's the behavior variant we have on Wikimedia sites:
- During page rendering, we make an <img> pointing to where the
thumbnail should be
- The web browser requests the thumbnail image file
- If it doesn't exist, the upload web server proxies the request [1]
back to MediaWiki, running on a subcluster which handles only thumbnail generation
- MediaWiki resizes it synchronously via shell out to ImageMagick
- The web server serves the now-completed file back to the client,
and it's now on disk for the next request
The simpler approach suggested by Nikola seems able to address all the needs here without changing the way MediaWiki currently works.
The daemon replies only once to each request, after it has copied the placeholder to the specified destination. It later overwrites that file, under the same file name, with the real thumbnail once generation has finished silently in the background; no further notification is sent. This way we get rid of all the complexity an asynchronous reply would cause.
There will be two places, AFAIU, where MediaWiki should send a request to this daemon:
1. when the requested thumbnail doesn't exist
2. when a user uploads a large image (to generate an intermediate source image for future resizing); in this case the request object can contain a flag instructing the daemon to skip the placeholder-copying step.
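A rough Python sketch of how one request to such a daemon might be handled, under stated assumptions: `handle_request`, its parameters, and the `render` callback are hypothetical names, returning from the function stands in for the daemon's single reply, and a background thread stands in for whatever job mechanism a real daemon would use. The temp-file-and-rename step is also an assumption, added so that the destination holds either the placeholder or the finished thumbnail, never a partial file.

```python
import os
import shutil
import tempfile
import threading
from typing import Callable

# Hypothetical renderer signature: (source_path, width, dest_path) -> None.
Renderer = Callable[[str, int, str], None]


def handle_request(source: str, width: int, dest: str, placeholder: str,
                   render: Renderer,
                   skip_placeholder: bool = False) -> threading.Thread:
    """Put the placeholder in place, then render in the background.

    The thread is returned only so that callers and tests can join it;
    no notification is sent when rendering completes.
    """
    if not skip_placeholder:
        shutil.copyfile(placeholder, dest)

    def work() -> None:
        # Render next to dest and rename over it when done.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dest) or ".")
        os.close(fd)
        try:
            render(source, width, tmp)
            os.replace(tmp, dest)
        except Exception:
            os.unlink(tmp)  # on failure the placeholder simply stays

    worker = threading.Thread(target=work, daemon=True)
    worker.start()
    return worker
```

Case 1 above would call this with the defaults; case 2 would pass `skip_placeholder=True`, so the intermediate image only appears once it is complete.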