Yesterday, the selection of GSoC projects was officially announced. For MediaWiki, the following projects have been accepted:
* Niklas Laxström (Nikerabbit), mentored by Siebrand, will be working on improving localization and internationalization in MediaWiki, as well as improving the Translate extension used on translatewiki.net
* Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a thumbnailing daemon, so image manipulation won't have to happen on the Apache servers any more
* Jeroen de Dauw, mentored by Yaron Koren, will be improving the Semantic Layers extension and merging it into the Semantic Google Maps extension
* Gerardo Antonio Cabero, mentored by Michael Dale (mdale), will be improving the Cortado applet for video playback (I'm a bit fuzzy on the details for this one)
The official list with links to (parts of) the proposals can be found at the Google website [1]; lists for other organizations can be reached through the list of participating organizations [2].
The next event on the GSoC timeline [3] is the community bonding period [4], during which the students are supposed to get to know their mentors and the community. This period lasts until May 23rd, when the students actually begin coding.
Starting now and continuing at least until the end of GSoC in August, you will probably see and hear from the students on IRC and the mailing lists and hear about the projects they're working on. To repeat the crux of an earlier thread on this list [5]: be nice to these special newcomers, make them feel welcome and comfortable, and try not to bite them :)
To the mentors and students: have fun!
Roan Kattouw (Catrope)
[1] http://socghop.appspot.com/org/home/google/gsoc2009/wikimedia
[2] http://socghop.appspot.com/program/accepted_orgs/google/gsoc2009
[3] http://socghop.appspot.com/document/show/program/google/gsoc2009/timeline
[4] http://googlesummerofcode.blogspot.com/2007/04/so-what-is-this-community-bon...
[5] http://lists.wikimedia.org/pipermail/wikitech-l/2009-March/041964.html
On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
- Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
thumbnailing daemon, so image manipulation won't have to happen on the Apache servers any more
Wow, I'm looking forward to this. It might be worth a try to give the uploader the ability to choose non-standard resizing filters or so... or full-fledged image manipulation, something like a wiki-style Photoshop.
Marco
I was looking at http://editor.pixastic.com/ ... "wiki-style photoshop" would be cool ... but not in the scope of that SoC project ;)
--michael
2009/4/22 Michael Dale mdale@wikimedia.org:
Marco Schuster wrote:
On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
- Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
thumbnailing daemon, so image manipulation won't have to happen on the Apache servers any more
Wow, I'm lookin' forward to this. Mighta be worth a try to give the upper the ability to choose non-standard resizing filters or so... or full-fledged image manipulation, something like a wiki-style photoshop.
I was looking at http://editor.pixastic.com/ ... "wiki-style photoshop" would be cool ... but not in the scope of that soc project ;)
You can do pretty much anything with ImageMagick. The trouble is that it's not the fastest at *anything*. It depends how much that affects performance in practice: something that *just* thumbnails could be considerably more efficient, but you'd need a new program for each function, and most Unix users of MediaWiki already thumbnail with ImageMagick, so it'll be there.
- d.
On Tue, Apr 21, 2009 at 8:16 PM, David Gerard dgerard@gmail.com wrote:
2009/4/22 Michael Dale mdale@wikimedia.org:
Marco Schuster wrote:
On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
- Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
thumbnailing daemon, so image manipulation won't have to happen on the Apache servers any more
Wow, I'm lookin' forward to this. Mighta be worth a try to give the upper the ability to choose non-standard resizing filters or so... or full-fledged image manipulation, something like a wiki-style photoshop.
I was looking at http://editor.pixastic.com/ ... "wiki-style photoshop" would be cool ... but not in the scope of that soc project ;)
You can do pretty much anything with ImageMagick. Trouble is that it's not the fastest at *anything*. Depends how much that affects performance in practice - something that *just* thumbnails could be all sorts of more efficient, but you'd need a new program for each function, and most Unix users of MediaWiki thumbnail with ImageMagick already so it'll be there.
- d.
The main issue with the daemon idea (which was discussed at length in #mediawiki a few weeks ago) is that it requires a major change in how we handle images.
Right now, the process involves rendering on-demand, rather than at-leisure. This has the benefit of always producing an ideal thumbnailed image at the end of every parse. However, the major drawbacks are an increase in parsing time (while we wait for ImageMagick to do its thing) and an increased load on the app servers. The only time we can sidestep this is if someone uses a thumb dimension for which we already have a thumb rendered.
In order for this to work, we'd need to shift to a style of "render when you get a chance, but give me the best fit for now." Basically, we'd begin parsing and find that we need a thumbnailed copy of some image, but we don't have the ideal size just yet. Instead, we could return the best-fitting thumbnail so far and use that until the daemon has given us the right image.
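As a rough illustration of the "best fit for now" lookup (none of these function names exist in MediaWiki; this is just a PHP sketch of the idea):

// Illustration only: pick the closest already-rendered width while the
// daemon renders the exact size in the background. Not a MediaWiki API.
function pickBestFitThumb( $requestedWidth, array $renderedWidths ) {
	if ( !$renderedWidths ) {
		return false; // nothing rendered yet; caller falls back to a placeholder
	}
	// Prefer the smallest rendered thumb at least as wide as requested,
	// so the browser only ever scales down.
	sort( $renderedWidths );
	foreach ( $renderedWidths as $w ) {
		if ( $w >= $requestedWidth ) {
			return $w;
		}
	}
	// Otherwise take the largest one we have.
	return end( $renderedWidths );
}

// e.g. serve the 800px thumb (scaled in HTML) while 640px is still queued:
$bestFit = pickBestFitThumb( 640, array( 120, 800 ) ); // returns 800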
Not an easy task, but I certainly hope some progress can be made on it over the summer :)
-Chad
On Tue, Apr 21, 2009 at 8:34 PM, Chad innocentkiller@gmail.com wrote:
The main issue with the daemon idea (which was discussed at length in #mediawiki a few weeks ago) is that it requires a major change in how we handle images.
Right now, the process involves rendering on-demand, rather than at-leisure. This has the benefit of always producing an ideal thumb'd image at the end of every parse. However the major drawbacks are an increase in parsing time (while we wait for ImageMagik to do its thing) and an increased load on the app servers. The only time we can sidestep this is if someone uses a thumb dimension for which we already have a thumb rendered.
In order for this to work, we'd need to shift to a style of "render when you get a chance, but give me the best fit for now." Basically, we'd begin parsing and find that we need a thumbnailed copy of some image, but we don't have the ideal size just yet. Instead, we could return the best-fitting thumbnail so far and use that until the daemon has given us the right image.
I'm not clear on why we don't just make the daemon synchronously return a result the way ImageMagick effectively does. Given the level of reuse of thumbnails, it seems unlikely that the latency is a significant concern -- virtually no requests will ever actually wait on it.
Aryeh Gregor wrote:
I'm not clear on why we don't just make the daemon synchronously return a result the way ImageMagick effectively does. Given the level of reuse of thumbnails, it seems unlikely that the latency is a significant concern -- virtually no requests will ever actually wait on it.
(I basically outlined these issues on the SoC page, but here they are again with a bit more clarity.)
I recommended that the image daemon run semi-synchronously, since the changes needed to maintain multiple states, return non-cached place-holder images, and manage updates and page purges for when the updated images become available within the Wikimedia server architecture probably won't be completed in the Summer of Code time-line. If the student is up for it, the concept would be useful for other components like video transformation / transcoding, sequence flattening, etc., but it's not what I would recommend for the Summer of Code time-line.
== Per issues outlined in bug 4854 ==
I don't think it's a good idea to invest a lot of energy into a separate Python-based image daemon. It won't avoid all the problems listed in bug 4854:
Shell-character-exploit issues should be guarded against anyway (since not everyone is going to install the daemon).
Other people using MediaWiki won't want to add a Python- or Java-based image resizer and resolve the associated Python or Java dependencies and libraries. It won't be easier to install than ImageMagick or php-gd, which are repository-hosted applications already present in shared hosting environments.
Once you start integrating other libs like Batik (Java), it becomes difficult to resolve dependencies (Java, Python, etc.), and to install you have to push out a "new program" that is not integrated into the application repository managers of the various distributions.
The potential to isolate CPU and memory usage should be considered in the core MediaWiki image resize support anyway, i.e. we don't want to crash other people's servers who are using MediaWiki by not checking the upper bounds of image transforms. Instead we should make the core image transform smarter: maybe have a configuration var that /attempts/ to bound the upper memory for spawned processes, and take that into account before issuing the shell command for a given large image transformation with a given shell application.
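A minimal sketch of that "bound the upper memory" idea, wrapping the resize command in a ulimit before shelling out (the configuration variable and paths below are made up for illustration; they are not existing MediaWiki settings):

// Sketch only: cap the virtual memory of the spawned resize process so one
// pathological image can't take the whole Apache node down with it.
// $wgMaxShellImageMemory is a made-up setting for this illustration.
$wgMaxShellImageMemory = 204800; // KB of virtual memory allowed for the resize

function buildBoundedResizeCommand( $cmd ) {
	global $wgMaxShellImageMemory;
	// ulimit -v applies to the spawned shell, so the convert/vips call fails
	// cleanly instead of swapping the server to death.
	return 'ulimit -v ' . intval( $wgMaxShellImageMemory ) . '; ' . $cmd;
}

$srcPath = '/tmp/example-original.png'; // hypothetical paths
$dstPath = '/tmp/example-thumb.png';
$cmd = buildBoundedResizeCommand(
	'convert ' . escapeshellarg( $srcPath ) . ' -resize 800x800 ' . escapeshellarg( $dstPath )
);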
== What the image resize efforts should probably focus on ==
(1) making the existing system "more robust" and (2) better taking advantage of multi-threaded servers.
(1) Right now the system chokes on large images. We should deploy support for an in-place image resize, maybe something like vips (http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use). The system should intelligently call vips to transform the image to a reasonable size at upload time, then use those derivatives for just-in-time thumbs in articles. (If vips is unavailable, we don't transform and we don't crash the Apache node.)
(2) Maybe spin out the image transform process early on in the parsing of the page, with a place-holder and a callback, so that by the time all the templates and links have been looked up the image is ready for output. (Maybe another function, wfShellBackgroundExec($cmd, $callback_function), perhaps using pcntl_fork, then a normal wfShellExec, then pcntl_waitpid, then the callback function, which sets some var in the parent process so that pageOutput knows it's good to go.)
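A sketch of what such a wfShellBackgroundExec() could look like; wfShellExec() is MediaWiki's real shell wrapper, but the background/callback wrapper itself is hypothetical and glosses over the usual caveats of forking inside Apache/mod_php:

// Sketch of the forked background shell-exec described above. wfShellExec()
// is MediaWiki's existing shell wrapper; the rest is illustrative only and
// ignores the usual caveats about pcntl_fork() under Apache/mod_php.
function wfShellBackgroundExec( $cmd ) {
	$pid = pcntl_fork();
	if ( $pid == -1 ) {
		// Fork failed: fall back to the normal synchronous path.
		wfShellExec( $cmd );
		return false;
	}
	if ( $pid == 0 ) {
		// Child: run the transform and exit.
		wfShellExec( $cmd );
		exit( 0 );
	}
	// Parent: keep parsing the page; remember the pid so we can reap it later.
	return $pid;
}

// Just before page output: wait for the transform and fire the callback,
// which can set the "thumb is ready" flag that pageOutput checks.
function wfShellBackgroundWait( $pid, $callback ) {
	pcntl_waitpid( $pid, $status );
	call_user_func( $callback );
}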
If operationally the "daemon" should be on a separate server, we should still more or less run synchronously, as mentioned above. If possible the daemon should be PHP-based so we don't explode the dependencies for deploying robust image handling with MediaWiki.
peace, --michael
Michael Dale mdale@wikimedia.org writes:
I recommended that the image daemon run semi-synchronously since the changes needed to maintain multiple states and return non-cached place-holder images while managing updates and page purges for when the updated images are available within the wikimedia server architecture probably won't be completed in the summer of code time-line. But if the student is up for it the concept would be useful for other components like video transformation / transcoding, sequence flattening etc. But its not what I would recommend for the summer of code time-line.
I may have a problem understanding the concept of "semi-synchronously": does it mean that when MW parses a page containing thumbnail images, the parser sends requests to the daemon, which replies twice to each request, once immediately with a best fit or a place holder (synchronously), and once later when the thumbnail is ready (asynchronously)?
== what would probably be better for the image resize efforts should focus on ===
(1) making the existing system "more robust" and (2) better taking advantage of multi-threaded servers.
(1) right now the system chokes on large images we should deploy support for an in-place image resize maybe something like vips (?) (http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use) The system should intelligently call vips to transform the image to a reasonable size at time of upload then use those derivative for just in time thumbs for articles. ( If vips is unavailable we don't transform and we don't crash the apache node.)
Wow, vips sounds great; still reading its documentation. How is its performance on relatively small images (not huge, a few hundred pixels in width/height) compared with traditional single-threaded resizing programs?
(2) maybe spinning out the image transform process early on in the parsing of the page with a place-holder and callback so by the time all the templates and links have been looked up the image is ready for output. (maybe another function wfShellBackgroundExec($cmd, $callback_function) (maybe using pcntl_fork then normal wfShellExec then pcntl_waitpid then callback function ... which sets some var in the parent process so that pageOutput knows its good to go)
An asynchronous daemon doesn't make much sense if the page purge happens on the server side, but what if we push the page purge off to the browser? It would work like this:
1. The MW parser sends a request to the daemon.
2. The daemon finds the work non-trivial and replies *immediately* with a best fit or just a place holder.
3. The browser renders the page, finds it's not final, and sends a request to the daemon directly using AJAX.
4. The daemon replies to the browser when the thumbnail is ready.
5. The browser replaces the temporary best fit / place holder with the new thumb using JavaScript.
The daemon now has to deal with two kinds of clients: MW servers and browsers.
Letting the browser wait instead of the MW server has the benefit of reduced latency for users: they still have an acceptable page to read before the image replacement takes place, and a perfect page after that. For most users, the replacement will likely occur as soon as page loading ends, since transferring the page takes some time and the daemon will often have finished thumbnailing in the meantime.
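For steps 3 and 4, the browser-facing side could be as small as a status endpoint that the AJAX call polls until the thumb exists; a minimal PHP sketch (file layout and parameter names are invented for illustration):

<?php
// thumb-status.php: hypothetical polling endpoint for steps 3/4 above.
// The browser asks whether e.g. thumb/Foo.jpg/640px-Foo.jpg exists yet and
// swaps its placeholder once it does. Illustration only, not MediaWiki code.
$thumbRoot = '/var/www/images/thumb';           // assumed thumbnail directory
$rel = isset( $_GET['thumb'] ) ? $_GET['thumb'] : '';

// Minimal sanity check for the sketch: no empty or traversal paths.
if ( $rel === '' || strpos( $rel, '..' ) !== false ) {
	header( 'HTTP/1.0 400 Bad Request' );
	exit;
}

header( 'Content-Type: application/json' );
if ( file_exists( "$thumbRoot/$rel" ) ) {
	echo json_encode( array( 'ready' => true, 'url' => "/images/thumb/$rel" ) );
} else {
	echo json_encode( array( 'ready' => false ) );
}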
Wu Zhe wrote:
Asynchronous daemon doesn't make much sense if page purge occurs on server side, but what if we put off page purge to the browser? It works like this:
- mw parser send request to daemon
- daemon finds the work non-trivial, reply *immediately* with a best fit or just a place holder
- browser renders the page, finds it's not final, so sends a request to daemon directly using AJAX
- daemon reply to the browser when thumbnail is ready
- browser replace temporary best fit / place holder with new thumb using Javascript
The daemon now have to deal with two kinds of clients: mw servers and browsers.
To me this looks way too overcomplicated. I suggest a simpler approach:
1. MW copies a placeholder image to the appropriate filename; the placeholder could be the original image, a best-match thumb, or a PNG with the text "wait until the thumbnail renders";
2. MW sends a request to the daemon;
3. the daemon copies the resized image over the placeholder.
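A sketch of those three steps on the MediaWiki side, with the daemon hand-off reduced to a hypothetical queue helper (nothing here is existing MediaWiki code):

// Sketch of the three steps above. Neither requestThumb() nor
// queueThumbJob() exists in MediaWiki; they only illustrate the flow.
function queueThumbJob( $origPath, $thumbPath, $width ) {
	// Stand-in for "send request to daemon": append a job to a spool file
	// (a real setup would use a proper queue or socket).
	file_put_contents( '/tmp/thumb-queue', "$origPath\t$thumbPath\t$width\n", FILE_APPEND );
}

function requestThumb( $origPath, $thumbPath, $width ) {
	// 1. Put something valid at the final thumb filename right away.
	if ( !file_exists( $thumbPath ) ) {
		copy( $origPath, $thumbPath ); // or a best-match thumb, or a "rendering..." PNG
	}
	// 2. Ask the daemon for the real thumbnail.
	queueThumbJob( $origPath, $thumbPath, $width );
	// 3. The daemon later overwrites the placeholder in place, so the URL
	//    never changes and no page purge is needed.
}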
Nikola Smolenski smolensk@eunet.yu writes:
Wu Zhe wrote:
Asynchronous daemon doesn't make much sense if page purge occurs on server side, but what if we put off page purge to the browser? It works like this:
- mw parser send request to daemon
- daemon finds the work non-trivial, reply *immediately* with a best fit or just a place holder
- browser renders the page, finds it's not final, so sends a request to daemon directly using AJAX
- daemon reply to the browser when thumbnail is ready
- browser replace temporary best fit / place holder with new thumb using Javascript
The daemon now have to deal with two kinds of clients: mw servers and browsers.
To me this looks way too overcomplicated. I suggest a simpler approach:
1. mw copies a placeholder image to the appropriate filename: the placeholder could be the original image, best match thumb or a PNG with text "wait until the thumbnail renders";
2. mw send request to daemon;
3. daemon copies resized image over the placeholder.
This simpler approach differs in that it gets rid of the AJAX part: users would have to manually refresh the page. Whether the AJAX is worth the effort is debatable.
Sorry about the duplicates, I posted via gmane, but haven't seen my post there for some time and thought there must be something wrong with gmane. This won't happen again.
On Fri, Apr 24, 2009 at 12:31 AM, Wu Zhe wu@madk.org wrote:
Asynchronous daemon doesn't make much sense if page purge occurs on server side, but what if we put off page purge to the browser? It works like this:
1. mw parser send request to daemon
2. daemon finds the work non-trivial, reply *immediately* with a best fit or just a place holder
3. browser renders the page, finds it's not final, so sends a request to daemon directly using AJAX
4. daemon reply to the browser when thumbnail is ready
5. browser replace temporary best fit / place holder with new thumb using Javascript
Daemon now have to deal with two kinds of clients: mw servers and browsers.
Letting browser wait instead of mw server has the benefit of reduced latency for users while still have an acceptable page to read before image replacing takes place and a perfect page after that. For most of users, it's likely that the replacing occurs as soon as page loading ends, since transfering page takes some time, and daemon would have already finished thumbnailing in the process.
How long does it take to thumbnail a typical image, though? Even a parser cache hit (but Squid miss) will take hundreds of milliseconds to serve, plus hundreds more milliseconds for network latency. If we're talking about each image adding 10 ms to the latency, then it's not worth it to add all this fancy asynchronous stuff.
Moreover, in MediaWiki's case specifically, *very* few requests should actually require the thumbnailing. Only the first request for a given size of a given image should ever require thumbnailing: that can then be cached more or less forever. So it's not a good case to optimize for. If the architecture can be simplified significantly at the cost of slight extra latency in 0.01% of requests, I think it's clear that the simpler architecture is superior.
2009/4/24 Aryeh Gregor Simetrical+wikilist@gmail.com:
How long does it take to thumbnail a typical image, though? Even a parser cache hit (but Squid miss) will take hundreds of milliseconds to serve and hundreds of more milliseconds for network latency. If we're talking about each image adding 10 ms to the latency, then it's not worth it to add all this fancy asynchronous stuff.
The problem here seems to be that thumbnail generation times vary a lot, based on format and size of the original image. It could be 10 ms for one image and 10 s for another, who knows.
Moreover, in MediaWiki's case specifically, *very* few requests should actually require the thumbnailing. Only the first request for a given size of a given image should ever require thumbnailing: that can then be cached more or less forever.
That's true, we're already doing that.
So it's not a good case to optimize for.
AFAICT this isn't about optimization, it's about not bogging down the Apache that has the misfortune of getting the first request to thumb a huge image (but having a dedicated server for that instead), and about not letting the associated user wait for ages. Even worse, requests that thumb very large images could hit the 30s execution limit and fail, which means those thumbs will never be generated, but every user requesting them will have a request that lasts 30s and times out.
Roan Kattouw (Catrope)
Hoi,
At the moment we have an upper limit of 100 MB. The people who do restorations have one file that is 680 MB; the corresponding JPG is also quite big!
Thanks,
GerardM
2009/4/24 Roan Kattouw roan.kattouw@gmail.com
2009/4/24 Aryeh Gregor Simetrical+wikilist@gmail.com:
How long does it take to thumbnail a typical image, though? Even a parser cache hit (but Squid miss) will take hundreds of milliseconds to serve and hundreds of more milliseconds for network latency. If we're talking about each image adding 10 ms to the latency, then it's not worth it to add all this fancy asynchronous stuff.
The problem here seems to be that thumbnail generation times vary a lot, based on format and size of the original image. It could be 10 ms for one image and 10 s for another, who knows.
Moreover, in MediaWiki's case specifically, *very* few requests should actually require the thumbnailing. Only the first request for a given size of a given image should ever require thumbnailing: that can then be cached more or less forever.
That's true, we're already doing that.
So it's not a good case to optimize for.
AFAICT this isn't about optimization, it's about not bogging down the Apache that has the misfortune of getting the first request to thumb a huge image (but having a dedicated server for that instead), and about not letting the associated user wait for ages. Even worse, requests that thumb very large images could hit the 30s execution limit and fail, which means those thumbs will never be generated but every user requesting it will have a request last for 30s and time out.
Roan Kattouw (Catrope)
On Fri, Apr 24, 2009 at 1:22 PM, Roan Kattouw roan.kattouw@gmail.com wrote:
The problem here seems to be that thumbnail generation times vary a lot, based on format and size of the original image. It could be 10 ms for one image and 10 s for another, who knows.
Is it really necessary for any image to take 10s to thumbnail? I guess this would only happen for very large images -- perhaps we could make sure to cache an intermediate-sized thumbnail as soon as the image is uploaded, and then scale that down synchronously on request, which should be fast. Similarly, if specific image features (progressive JPEG or whatever) make images much slower to thumbnail, an intermediate version can be automatically generated on upload without those features. Of course you'd see a little loss in quality from the double operation, but it seems like a more robust solution than trying to use JavaScript.
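A sketch of that two-step approach with plain ImageMagick convert calls (the 1600px bound, helper names and paths are arbitrary; this is not how MediaWiki's image handling is actually structured):

// Sketch: one bounded "intermediate" rendered at upload time, then all the
// article-sized thumbs are cut from that intermediate.
function makeIntermediate( $origPath, $intermediatePath, $maxDim = 1600 ) {
	// Done once, right after upload, so the slow resize of a huge original
	// never happens during an ordinary page view.
	$geometry = escapeshellarg( $maxDim . 'x' . $maxDim . '>' ); // ">" = only shrink
	$cmd = 'convert ' . escapeshellarg( $origPath ) .
		' -resize ' . $geometry . ' ' . escapeshellarg( $intermediatePath );
	shell_exec( $cmd );
}

function makeArticleThumb( $intermediatePath, $thumbPath, $width ) {
	// Scaling a ~1600px intermediate down to 120-800px is cheap enough to do
	// synchronously on request.
	$cmd = 'convert ' . escapeshellarg( $intermediatePath ) .
		' -resize ' . intval( $width ) . ' ' . escapeshellarg( $thumbPath );
	shell_exec( $cmd );
}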
I'm not an expert on image formats, however, so maybe I'm misunderstanding our options.
AFAICT this isn't about optimization, it's about not bogging down the Apache that has the misfortune of getting the first request to thumb a huge image (but having a dedicated server for that instead), and about not letting the associated user wait for ages.
"Not letting the associated user wait for ages" is called "making it faster", which I'd say qualifies as optimization. :)
Even worse, requests that thumb very large images could hit the 30s execution limit and fail, which means those thumbs will never be generated but every user requesting it will have a request last for 30s and time out.
max_execution_time applies only to the time that PHP actually spends executing. If it's sleeping on a network request, it will never be killed for reaching the max execution time. Try running this code:
ini_set( 'max_execution_time', 5 );
error_reporting( E_ALL | E_STRICT );
ini_set( 'display_errors', 1 );

file_get_contents( 'http://toolserver.org/~simetrical/tmp/delay.php?len=10' );

echo "Fetched long URL!";

while ( true );
It will fetch the URL (which takes ten seconds), then only die after the while ( true ) runs for about five seconds. The same goes for long database queries, etc. I imagine it uses the OS's reports on user/system time used instead of real time.
Plus, the idea is apparently for this to not be done by the server at all, but by the client, so there will be no latency for the overall page request anyway. The page will load immediately, only the images will wait if there's any waiting to be done.
On Fri, Apr 24, 2009 at 1:46 PM, Brion Vibber brion@wikimedia.org wrote:
One suggestion that's been brought up for large images is to create a smaller version *once at upload time* which can then be used to quickly create inline thumbnails of various sizes on demand. But we still need some way to manage that asynchronous initial rendering, and have some kind of friendly behavior for what to show while it's working.
That's what occurred to me. In that case, the only possible thing to do seems to be to just have the image request wait until the image is thumbnailed. I guess you could show a placeholder image, but that's probably *less* friendly to the user, as long as we've specified the height and width in the HTML. The browser should provide some kind of placeholder already while the image is loading, after all, and if we let the browser provide the placeholder, then at least the image will appear automatically when it's done thumbnailing.
2009/4/24 Aryeh Gregor Simetrical+wikilist@gmail.com:
That's what occurred to me. In that case, the only possible thing to do seems to be to just have the image request wait until the image is thumbnailed. I guess you could show a placeholder image, but that's probably *less* friendly to the user, as long as we've specified the height and width in the HTML. The browser should provide some kind of placeholder already while the image is loading, after all, and if we let the browser provide the placeholder, then at least the image will appear automatically when it's done thumbnailing.
There was a spec in earlier versions of HTML to put a low-res thumbnail up while the full image dribbled through your dialup - <img lowsrc="image-placeholder.gif" src="image.gif"> - but it was so little used (I know of no cases) that I don't know if it's even supported in browsers any more.
http://www.htmlcodetutorial.com/images/_IMG_LOWSRC.html
- d.
Hoi,
The Library of Alexandria uses it for the display of their awesome Napoleonic lithographs. It would be awesome if we had that code; it is actually open source.
Thanks,
Gerard
2009/4/24 David Gerard dgerard@gmail.com
2009/4/24 Aryeh Gregor Simetrical+wikilist@gmail.com:
That's what occurred to me. In that case, the only possible thing to do seems to be to just have the image request wait until the image is thumbnailed. I guess you could show a placeholder image, but that's probably *less* friendly to the user, as long as we've specified the height and width in the HTML. The browser should provide some kind of placeholder already while the image is loading, after all, and if we let the browser provide the placeholder, then at least the image will appear automatically when it's done thumbnailing.
There was a spec in earlier versions of HTML to put a low-res thumbnail up while the full image dribbled through your dialup - <img lowsrc="image-placeholder.gif" src="image.gif"> - but it was so little used (I know of no cases) that I don't know if it's even supported in browsers any more.
http://www.htmlcodetutorial.com/images/_IMG_LOWSRC.html
- d.
On Fri, Apr 24, 2009 at 07:08:05PM +0100, David Gerard wrote:
There was a spec in earlier versions of HTML to put a low-res thumbnail up while the full image dribbled through your dialup - <img lowsrc="image-placeholder.gif" src="image.gif"> - but it was so little used (I know of no cases) that I don't know if it's even supported in browsers any more.
I tried it with Firefox 3.0.9 and IE 7.0.6001.18000; neither paid any attention to it. IE 6.0.2800.1106 under Wine also ignored it. Too bad; that would have been nice if it worked.
I don't know that we need fancy AJAX if we know at page rendering time whether the image is available, though. We might be able to get away with a simple script like this:

var ImageCache = {};
function loadImage(id, url) {
    var i = document.getElementById(id);
    if (i) {
        var img = new Image();
        ImageCache[id] = img;
        img.onload = function() {
            i.src = url;
            ImageCache[id] = null;
        };
        img.src = url;
    }
}

And then generate the <img> tag with the placeholder and some id, and call that function onload for it. Of course, if there are a lot of these images on one page then we might run into the browser's concurrent connection limit, which an AJAX solution might be able to overcome.
Roan Kattouw wrote:
The problem here seems to be that thumbnail generation times vary a lot, based on format and size of the original image. It could be 10 ms for one image and 10 s for another, who knows.
Yeah, again: if we only issue the big resize operation on initial upload, with a memory-friendly in-place library like vips, I think we will be okay. Since the user just waited 10-15 minutes to upload their huge image, waiting an additional 10-30s at that point for the thumbnail and the "instant gratification" of seeing the image on the upload page is not such a big deal. Then in-page derivatives could predictably resize the 1024x768 (or so) image in real time: again, instant gratification on page preview or page save.
Operationally this could go out to a thumbnail server or be done on the Apaches. If they are small operations, it may be easier to keep the existing infrastructure than to intelligently handle the edge cases outlined (many resize requests at once, placeholders, image proxy / daemon setup).
AFAICT this isn't about optimization, it's about not bogging down the Apache that has the misfortune of getting the first request to thumb a huge image (but having a dedicated server for that instead), and about not letting the associated user wait for ages. Even worse, requests that thumb very large images could hit the 30s execution limit and fail, which means those thumbs will never be generated but every user requesting it will have a request last for 30s and time out.
Again, this may be related to the unpredictable memory usage of ImageMagick when resizing large images, instead of a fast, memory-confined resize engine, no?
On 4/24/09 11:05 AM, Michael Dale wrote:
Roan Kattouw wrote:
The problem here seems to be that thumbnail generation times vary a lot, based on format and size of the original image. It could be 10 ms for one image and 10 s for another, who knows.
yea again if we only issue the big resize operation on initial upload with a memory friendly in-place library like vips I think we will be oky. Since the user just waited like 10-15 minutes to upload their huge image waiting an additional 10-30s at that point for thumbnail and "instant gratification" of seeing your image on the upload page ... is not such a big deal.
Well, what about the 5 million other users browsing Special:Newimages? We don't want 50 simultaneous attempts to build that first über-thumbnail. :)
-- brion
with a memory friendly in-place library like vips I think we will be oky. Since the user just waited like 10-15 minutes to upload their huge image waiting an additional 10-30s at that point for thumbnail and "instant gratification" of seeing your image on the upload page ... is not such a big deal.
Well, what about the 5 million other users browsing Special:Newimages? We don't want 50 simultaneous attempts to build that first über-thumbnail. :)
Thumbnail generation could be cascaded, i.e. 120px thumbs could be generated from the 800px previews instead of the original images.
On Fri, Apr 24, 2009 at 2:44 PM, Brion Vibber brion@wikimedia.org wrote:
Well, what about the 5 million other users browsing Special:Newimages? We don't want 50 simultaneous attempts to build that first über-thumbnail. :)
You'd presumably use some kind of locking to stop multiple workers from trying to render thumbnails of the same size in general (über-thumbnails or not).
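One way to make such locking explicit is an add()-based lock in the object cache MediaWiki already has ($wgMemc->add() only succeeds for the first caller); the key format and helper below are just a sketch:

// Sketch: only one worker renders a given (file, width) pair at a time.
// $wgMemc is MediaWiki's existing object cache; the key format and helper
// are made up for this illustration.
function tryRenderThumbWithLock( $fileKey, $width, $renderCallback ) {
	global $wgMemc;
	$lockKey = "thumb-render-lock:$fileKey:$width";
	// add() is atomic and fails if the key already exists, so it doubles as
	// a cheap distributed lock with a 60 second safety timeout.
	if ( !$wgMemc->add( $lockKey, 1, 60 ) ) {
		return false; // someone else is already rendering this size
	}
	try {
		call_user_func( $renderCallback );
	} catch ( Exception $e ) {
		$wgMemc->delete( $lockKey );
		throw $e;
	}
	$wgMemc->delete( $lockKey );
	return true;
}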
On 4/24/09 12:07 PM, Aryeh Gregor wrote:
On Fri, Apr 24, 2009 at 2:44 PM, Brion Vibber brion@wikimedia.org wrote:
Well, what about the 5 million other users browsing Special:Newimages? We don't want 50 simultaneous attempts to build that first über-thumbnail. :)
You'd presumably use some kind of locking to stop multiple workers from trying to render thumbnails of the same size in general (über-thumbnails or not).
Best to make it explicit rather than presume -- currently we have no such locking for slow resizing requests. :)
-- brion
On Fri, Apr 24, 2009 at 3:58 PM, Brion Vibber brion@wikimedia.org wrote:
Best to make it explicit rather than presume -- currently we have no such locking for slow resizing requests. :)
Yes, definitely.
Brion Vibber wrote:
yea again if we only issue the big resize operation on initial upload with a memory friendly in-place library like vips I think we will be oky. Since the user just waited like 10-15 minutes to upload their huge image waiting an additional 10-30s at that point for thumbnail and "instant gratification" of seeing your image on the upload page ... is not such a big deal.
Well, what about the 5 million other users browsing Special:Newimages? We don't want 50 simultaneous attempts to build that first über-thumbnail. :)
Right... I am just saying the simplest path is to integrate it into the upload flow, i.e. the image won't be a known asset until that first über-thumbnail is generated. Once that happens, it's available for inclusion and listed on Newimages. The user won't notice that extra 10-15 second delay because it will be part of the uploading flow.
I.e. the user is already waiting for their file to be uploaded; a few extra seconds of server-side processing integrated into that waiting won't be noticed much, and will be easier to integrate with the existing system (instead of a new concept of "resource is being processed, please wait").
We do eventually need a "this resource is being processed" concept, but I don't know if it's a good project for the Summer of Code student to target.
--michael
Michael Dale wrote:
yea again if we only issue the big resize operation on initial upload with a memory friendly in-place library like vips I think we will be oky. Since the user just waited like 10-15 minutes to upload their huge image waiting an additional 10-30s at that point for thumbnail and "instant gratification" of seeing your image on the upload page ... is not such a big deal.
It can be parallelized, starting to render the thumb while the file hasn't been completely uploaded yet (most formats will allow that). That would need special software; the easiest would be to point the Special:Upload action at a different domain on the resizing cluster. These changes are always an annoyance, but it would ease many bugs: 10976, 16751, 18202, the upload bar, a non-NFS backend...
Also relevant: 17255 and 18201. And as this would be a new upload system, 18563 (the new-upload branch) is also worth mentioning.
On Tue, Apr 21, 2009 at 7:54 PM, Marco Schuster marco@harddisk.is-a-geek.org wrote:
Wow, I'm lookin' forward to this. Mighta be worth a try to give the upper the ability to choose non-standard resizing filters or so... or full-fledged image manipulation, something like a wiki-style photoshop.
That seems to be orthogonal to the proposed project.
On Wed, Apr 22, 2009 at 12:54 AM, Marco Schuster marco@harddisk.is-a-geek.org wrote:
On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
- Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
thumbnailing daemon, so image manipulation won't have to happen on the Apache servers any more
Wow, I'm lookin' forward to this. Mighta be worth a try to give the upper the ability to choose non-standard resizing filters or so... or full-fledged image manipulation, something like a wiki-style photoshop.
On a semi-related note: What's the status of the management routines that handle "throwaway" things like math PNGs? Is this a generic system, so it can be used e.g. for Jmol PNGs in the future? Is it integrated with the image thumbnail handling? Should it be?
Magnus
On 4/22/09 11:13 AM, Magnus Manske wrote:
On Wed, Apr 22, 2009 at 12:54 AM, Marco Schuster marco@harddisk.is-a-geek.org wrote:
On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
- Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
thumbnailing daemon, so image manipulation won't have to happen on the Apache servers any more
Wow, I'm lookin' forward to this. Mighta be worth a try to give the upper the ability to choose non-standard resizing filters or so... or full-fledged image manipulation, something like a wiki-style photoshop.
On a semi-related note: What's the status of the management routines that handle "thrwoaway" things like math PNGs?
There is no management for this yet, it's done ad-hoc in each such system. :(
Is this a generic system, so it can be used e.g. for jmol PNGs in the future? Is it integrated with the image thumbnail handling? Should it be?
We do need a central management system for this, which can handle:
1) Storage backends other than raw filesystem
We want to migrate off of NFS to something whose failover and other characteristics we can better control. Not having to implement the interface a second, third, fourth, etc. time for math, timeline, etc. would be nice. (A rough interface sketch for this follows after point 4 below.)
2) Garbage collection / expiration of no-longer-used items
Right now math and timeline renderings just get stored forever and ever...
3) Sensible purging/expiration/override of old renderings when renderer behavior changes
When we fix a bug in, upgrade, or expand capabilities of texvc etc we need to be able to re-render the new, corrected images. Preferably in a way that's friendly to caching, and that doesn't kill our servers with a giant immediate crush of requests.
4) Rendering server isolation
Being able to offload rendering to a subcluster with restricted resource limits can help avoid bringing down the entire site when there's a runaway process (like all those image resizing problems we've seen with giant PNGs and animated GIFs).
It may also help to do some privilege separation for services we might not trust quite as much (shelling out to an external program with user-supplied data? What could go wrong? :)
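As a very rough sketch of what the storage backend in point 1 might look like as a PHP interface (purely hypothetical; not existing MediaWiki code):

// Hypothetical interface for storing derived files (thumbs, math PNGs,
// timeline images, ...) behind pluggable backends. Illustration only.
interface DerivedFileStore {
	/** Store a rendered file under a key like "math/abc123.png". */
	public function store( $key, $localTmpPath );

	/** Return a public URL for the key, or false if it isn't stored. */
	public function getUrl( $key );

	/** Delete one rendering, e.g. when the renderer is fixed or upgraded. */
	public function purge( $key );

	/** Garbage-collect renderings not used since $timestamp. */
	public function purgeOlderThan( $timestamp );
}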
-- brion
I've created an initial proposal for a unified storage-handling database:
http://www.mediawiki.org/wiki/User:Magnus_Manske/File_handling
Feel free to edit and comment :-)
Cheers, Magnus
Thanks for taking care of the announce mail, Roan! I spent all day yesterday at the dentist's... whee :P
I've taken the liberty of reposting it on the tech blog: http://techblog.wikimedia.org/2009/04/google-summer-of-code-student-projects...
I'd love for us to get the students set up on the blog to keep track of their project progress and raise visibility... :D
-- brion
On Thu, Apr 23, 2009 at 2:30 AM, Brion Vibber brion@wikimedia.org wrote:
Thanks for taking care of the announce mail, Roan! I spent all day yesterday at the dentists... whee :P
I've taken the liberty of reposting it on the tech blog: http://techblog.wikimedia.org/2009/04/google-summer-of-code-student-projects...
I'd love for us to get the students set up on the blog to keep track of their project progress and raise visibility... :D
-- brion
Maybe a nice little install of WordPress MU would be in order, so they each have a little blog which they can update.