On Sat, Aug 1, 2009 at 2:54 AM, Brian Brian.Mingus@colorado.edu wrote:
On Sat, Aug 1, 2009 at 12:47 AM, Gregory Maxwell gmaxwell@gmail.com wrote:
On Sat, Aug 1, 2009 at 12:13 AM, Michael Dale mdale@wikimedia.org wrote:
Once you factor in the ratio of video to non-video content for the foreseeable future this comes off looking like a time-wasting boondoggle.
I think you vastly underestimate the amount of video that will be uploaded. Michael is right in thinking big and thinking distributed. CPU cycles are not *that* cheap.
Really rough back of the napkin numbers:
My desktop has an X3360 CPU. You can build systems all day using this processor for $600 (I think I spent $500 on it 6 months ago). There are processors with better price/performance available now, but I can benchmark on this.
Commons is getting roughly 172076 uploads per month now across all media types. Scans of single pages, photographs copied from flickr, audio pronunciations, videos, etc.
If everyone switched to uploading 15 minute long SD videos instead of other things there would be 154,868,400 seconds of video uploaded to commons per-month. Truly a staggering amount. Assuming a 40 hour work week it would take over 250 people working full time just to *view* all of it.
That number is an average rate of 58.9 seconds of video uploaded per second every second of the month.
Using all four cores my desktop video encodes at >16x real-time (for moderate motion standard def input using the latest theora 1.1 svn).
So you'd need fewer than four of those systems (58.9 / 16 is about 3.7) to keep up with the entire commons upload rate switched to 15 minute videos. Okay, it would be slow at peak hours and you might wish to produce a couple of versions at different resolutions, so multiply that by a couple.
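The arithmetic above can be reproduced with a short sketch. The upload count, clip length, and 16x encode speed come from the figures quoted above; the average month length (365.25 / 12 days) is an assumption chosen because it reproduces the 58.9 figure, and all function names are made up for illustration.

```python
import math

UPLOADS_PER_MONTH = 172_076          # Commons uploads per month, all media types
CLIP_SECONDS = 15 * 60               # hypothetical: every upload is a 15 minute video
SECONDS_PER_MONTH = 365.25 / 12 * 86_400  # assumption: average-length month
ENCODE_SPEEDUP = 16                  # benchmarked ">16x real-time" on one quad-core box


def video_seconds_per_month(uploads=UPLOADS_PER_MONTH, clip=CLIP_SECONDS):
    """Total seconds of video uploaded per month."""
    return uploads * clip


def upload_rate(uploads=UPLOADS_PER_MONTH, clip=CLIP_SECONDS):
    """Average seconds of video arriving per wall-clock second."""
    return video_seconds_per_month(uploads, clip) / SECONDS_PER_MONTH


def encoders_needed(rate, speedup=ENCODE_SPEEDUP):
    """Fractional number of machines needed to keep up with the average rate."""
    return rate / speedup


if __name__ == "__main__":
    rate = upload_rate()
    print(video_seconds_per_month())        # total seconds per month
    print(round(rate, 1))                   # seconds of video per second
    print(math.ceil(encoders_needed(rate))) # whole machines, before any peak-hour margin
```

This only covers the average rate; the "multiply that by a couple" margin for peak hours and extra resolutions is left out on purpose.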
This is what I meant by processing being cheap.
If the uploads were all compressed at a bitrate of 4mbit/sec, if users were kind enough to spread their uploads out through the day, if the distributed system were perfectly efficient (only needing to send one copy of each upload out), and if Wikimedia were only paying $10/mbit/sec/month for transit out of their primary datacenter... we'd find that the bandwidth costs of sending that source material out again would be $2356/month. (58.9 seconds per second * 4mbit/sec * $10/mbit/sec/month)
(Since transit billing is on the 95th percentile 5 minute average of the greater of inbound or outbound, uploads are basically free, but sending data out to the 'cloud' costs like anything else.)
So under these assumptions sending out compressed video for re-encoding is likely to cost roughly as much *each month* as the hardware for local transcoding. ... and processing speed seems to be improving significantly faster than bandwidth prices are declining.
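For anyone unfamiliar with 95th percentile billing, here is a rough sketch of why inbound uploads are close to free while outbound traffic is not. It assumes one common convention (nearest-rank 95th percentile computed per direction, then billing the greater of the two); actual transit contracts vary, and the function names and sample figures are hypothetical.

```python
import math


def percentile_95(samples):
    """Nearest-rank 95th percentile of a list of 5-minute average rates (mbit/sec)."""
    ordered = sorted(samples)
    rank = math.ceil(len(ordered) * 95 / 100)  # the top 5% of samples are discarded
    return ordered[max(rank, 1) - 1]


def billable_mbit(inbound, outbound):
    """Assumed convention: bill the greater of the two per-direction 95th percentiles."""
    return max(percentile_95(inbound), percentile_95(outbound))


if __name__ == "__main__":
    # Hypothetical site that serves far more than it receives.
    outbound = [200.0] * 100
    inbound = [20.0] * 100
    print(billable_mbit(inbound, outbound))

    # Sending the uploads back out for remote transcoding adds to the outbound side.
    extra_out = 58.9 * 4  # mbit/sec, from the figures above
    print(billable_mbit(inbound, [o + extra_out for o in outbound]))
```

As long as inbound stays below outbound, extra uploads do not move the billed figure, but every mbit/sec sent back out does.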
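The comparison can be sketched the same way. The 58.9 rate, 4mbit/sec bitrate, $10/mbit/sec/month transit price, and $600 per machine are the figures quoted above; rounding up to four machines and ignoring power, hosting, and peak-hour margin are simplifying assumptions, and the function names are illustrative only.

```python
def monthly_transit_cost(rate=58.9, bitrate_mbit=4.0, price_per_mbit=10.0):
    """Recurring monthly cost of sending every upload out once for remote transcoding."""
    return rate * bitrate_mbit * price_per_mbit


def local_hardware_cost(machines=4, price_per_machine=600):
    """One-time cost of enough encoders to match the average upload rate."""
    return machines * price_per_machine


def payoff_months(hardware, monthly_transit):
    """Months of avoided transit spend needed to cover the hardware purchase."""
    return hardware / monthly_transit


if __name__ == "__main__":
    transit = monthly_transit_cost()
    hardware = local_hardware_cost()
    print(round(transit))                             # dollars per month
    print(hardware)                                   # dollars, one time
    print(round(payoff_months(hardware, transit), 2)) # months to break even
```

Under these inputs the hardware pays for itself in roughly a month of avoided transit, which is the "very short payoff" point.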
This is also what I meant by processing being cheap.
Because uploads won't be uniformly spaced you'll need some extra resources to keep things from getting bogged down at peak hours. But the poor peak-to-average ratio also works against the bandwidth costs. You can't win: unless you assume that uploads are going to be at very low bitrates, local transcoding will always be cheaper, with very short payoff times.
I don't know how to figure out how much it would 'cost' to have human contributors spot embedded penises snuck into transcodes, then figure out which of several contributing transcoders is doing it and block them, only to have the bad user switch IPs and begin again. ... but it seems impossibly expensive even though it's not an actual dollar cost.
There is a lot of free video out there and as soon as we have a stable system in place Wikimedians are going to have a field day uploading it to Commons.
I'm not saying that there won't be video; I'm saying there won't be video if development time is spent on fanciful features rather than desperately needed short-term functionality. We have tens of thousands of videos, many of which don't stream well for most people because they need thumbnailing.
Firefogg was useful upload lubrication. But user-powered cloud transcoding? I believe the analysis I provided above demonstrates that resources would be better applied elsewhere.