On Mon, Aug 3, 2009 at 10:56 PM, Michael Dale <mdale@wikimedia.org> wrote:
> Also will hack in adding derivatives to the job queue where oggHandler is embedded in a wiki-article at a substantially lower resolution than the source version. Will have it send the high-res version until the derivative is created, then "purge" the pages to point to the new location. Will try to have the "download" link still point to the high-res version. (We will only create one or two derivatives... also we should decide if we want an ultra-low-bitrate (200 kbit/sec or so) version for people accessing Wikimedia on slow / developing-country connections.)
[snip]
So I think there should generally be three versions: a 'very low rate' copy suitable for streaming for people without excellent broadband, a 'high rate' copy suitable for streaming on good broadband, and a 'download' copy at full resolution and very high rate. (The download copy would be the file uploaded by the user, if they uploaded an Ogg.)
As a matter of principle we should try to achieve both "very high quality" and "works for as many people as possible". I don't think we need to achieve both with one file, so the high and low rate files could specialize in those areas.
The versions suitable for streaming should have a limited instantaneous bitrate (a non-infinite buf-delay). This sucks for quality, but it's needed if we want streams that don't stall, because video can easily have >50:1 peak-to-average rates over fairly short time-spans. (It's also part of the secret sauce that differentiates smoothly working video from stuff that only works on uber-broadband.)
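To make that concrete, here's a minimal leaky-bucket stall check in Python; the frame sizes, target rate, and buffer duration are made-up numbers for illustration, not proposed settings:

    # Will a client that prebuffers `buf_delay_s` seconds of a
    # `target_bps` stream underrun while playing these frames?
    def stalls(frame_sizes_bits, fps, target_bps, buf_delay_s):
        budget = target_bps * buf_delay_s    # bits on hand at playback start
        for size in frame_sizes_bits:
            budget += target_bps / fps       # bits arriving during one frame
            budget -= size                   # bits the decoder consumes
            if budget < 0:
                return True                  # buffer underrun: playback stalls
        return False

    # An early 50:1 burst (500k-bit frames at 30fps = 15 Mbit/sec peak)
    # underruns a 2-second buffer even though the stream as a whole
    # averages only ~225 kbit/sec:
    frames = [500_000] * 3 + [5_000] * 600   # bits per frame
    print(stalls(frames, fps=30, target_bps=300_000, buf_delay_s=2))  # True

The encoder-side buf-delay is exactly the knob that keeps the encoder from emitting bursts a check like this would reject.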
Based on 'what other people do' I'd say the low should be in the 200kbit-300kbit/sec range. Perhaps taking the high up to a megabit?
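Spelled out as a config sketch (the numbers are just the ones floated above, nothing settled):

    # Sketch of the derivative set under discussion; rates and
    # buf-delays are placeholders, not decisions.
    PROFILES = {
        "low":  {"video_kbps": 250,  "buf_delay_s": 2},  # 200-300 kbit range
        "high": {"video_kbps": 1000, "buf_delay_s": 4},  # "up to a megabit"
    }
    # "download" is the original upload served as-is: no transcode needed.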
There are also a lot of very short videos on Wikipedia where the whole thing could reasonably be buffered prior to playback (a 30-second clip at 300 kbit/sec is only about 1.1 MB).
Something I don't have an answer for is what resolutions to use. The low should fit on mobile device screens. Normally I'd suggest setting the size based on the content: low-motion, detail-oriented video should get higher resolutions than high-motion scenes without important details. Doubling the number of derivatives in order to have a large and small setting on a per-article basis is probably not acceptable. :(
For example: for this low-motion video (http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv) 150kbit/sec results in perfectly acceptable quality at a fairly high resolution, while this high-motion clip (http://people.xiph.org/~greg/video/crew_cif_150.ogv) looks like complete crap at 150kbit/sec even though it has 25% fewer pixels. For that target rate the second clip is much more useful when downsampled (http://people.xiph.org/~greg/video/crew_128_150.ogv), yet if the first video were downsampled like that it would be totally useless, as you couldn't read any of the slides. I have no clue how to solve this. I don't think the correct behavior could be automatically detected, and if we tried we'd just piss off the users.
As an aside: downsampled video needs some makeup sharpening, just as downsampled stills do. I'll work on getting something into ffmpeg2theora to do this.
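The idea is just a mild unsharp mask applied after scaling; a rough sketch using numpy/scipy (sigma and amount are guesses, not tuned values):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def makeup_sharpen(frame, sigma=1.0, amount=0.5):
        # Mild unsharp mask for a downsampled frame (H x W x C floats in [0,1]).
        # Blur spatially only (not across color channels), then push the
        # frame away from its blurred version to restore apparent detail.
        blurred = gaussian_filter(frame, sigma=(sigma, sigma, 0))
        return np.clip(frame + amount * (frame - blurred), 0.0, 1.0)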
There is also the option of decimating the frame rate. Going from 30fps to 15fps can make a decent improvement in the bitrate-vs-quality trade-off, but it can make some kinds of video look jerky. (Dropping the frame rate would also be helpful for any CPU-starved devices.)
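Integer decimation is just dropping frames; as a trivial sketch over decoded frames (names hypothetical):

    def decimate(frames, factor=2):
        # Keep every `factor`-th frame: 30fps -> 15fps when factor=2.
        for i, frame in enumerate(frames):
            if i % factor == 0:
                yield frame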
Something to think about when designing this is that it would be really good to keep track of the encoder version and settings used to produce each derivative, so that files can be regenerated when the preferred settings change or the encoder is improved. It would also make it possible to do quick one-pass transcodes for the rate-controlled streams and have the transcoders go back during idle time and produce better two-pass encodes.
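A sketch of the bookkeeping that implies (all field names and the version string are hypothetical):

    # Hypothetical per-derivative record; enough to decide later
    # whether the file should be redone.
    derivative = {
        "source_sha1":     "...",             # which upload it came from
        "profile":         "low",
        "encoder":         "ffmpeg2theora",
        "encoder_version": "0.24",            # illustrative version string
        "settings_hash":   "...",             # hash of the full option set
        "two_pass":        False,             # candidate for idle-time upgrade
    }

    def needs_regen(d, preferred):
        # Redo when the encoder or settings have moved on, or to
        # upgrade a quick one-pass encode to a better two-pass one.
        return (d["encoder_version"] != preferred["encoder_version"]
                or d["settings_hash"] != preferred["settings_hash"]
                or (preferred["two_pass"] and not d["two_pass"]))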
This brings me to an interesting point about instant gratification: Ogg was intended from day one to be a streaming format. This has pluses and minuses, but one thing we should take advantage of is that it's completely valid, and well supported by most software, to start playing a file *as soon as* the encoder has started writing it. (If software can't handle this it also can't handle Icecast streams.) This means that so long as the transcode process runs at least at realtime speed, the transcodes could be available immediately. It would, however, require that the derivative(s) be written to an accessible location (and you will likely have to arrange for a Content-Length: header not to be sent for the incomplete file).
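On the serving side that might look like tailing the growing file into a chunked (no Content-Length) HTTP response; a sketch, with the done-signal left as a hypothetical callable:

    import time

    def tail_growing_file(path, encoder_done, chunk_bytes=64 * 1024):
        # Yield bytes of a file that is still being written; feed this
        # into a chunked HTTP response. `encoder_done` is whatever
        # signals that the transcode has finished (hypothetical).
        with open(path, "rb") as f:
            while True:
                data = f.read(chunk_bytes)
                if data:
                    yield data
                elif encoder_done():
                    return            # writer finished and we've sent it all
                else:
                    time.sleep(0.25)  # encoder hasn't caught up; wait a bit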