Some notes:
* ~it's mostly an API~. We can run it internally if that is more cost efficient. (I will work on a command-line client shortly.) ... (As mentioned earlier, the present code was hacked together quickly; it's just a prototype. I will generalize things to work better as internal jobs, and I think I will not create File:Myvideo.mp4 wiki pages but rather create a placeholder File:Myvideo.ogg page and only store the derivatives outside the wiki page node system. I also notice some sync issues with oggCat, which are under investigation.)
* Clearly CPUs are cheap; so are power for the computers, human resources for system maintenance, rack space, and internal network management, and we of course will want to "run the numbers" on any solution we go with. I think your source bitrate assumption was a little high; I would expect more like 1-2 Mbit/s (with cell-phone cameras targeting low bitrates for transport and desktops re-encoding before upload). But I think this whole conversation is missing the larger issue: if it is cost prohibitive to distribute a few copies for transcoding, how are we going to distribute the derivatives thousands of times for viewing? Perhaps future work in this area should focus more on the distribution bandwidth cost issue (a rough sketch of those numbers follows these notes).
* Furthermore, I think I might have misrepresented wiki@home. I should have focused more clearly on sequence flattening and only mentioned transcoding as an option. With sequence flattening the source material is at a more standard viewing bitrate and the CPU costs for rendering are much higher. At present there is no fast way to overlay HTML/SVG on video with filters and effects that are, for now, only predictably defined in javascript. For this reason we use the browser to render out the content WYSIWYG. Eventually we may want to write an optimized stand-alone flattener, but for now the wiki@home solution is worlds less costly in terms of developer resources, since we can use the "editor" to output the flat file.
3) And finally, yes ... you can already insert a penis into video uploads today, with something like: ffmpeg2theora someVideo.ogg -s 0 -e 42.2 -o part1.ogv; ffmpeg2theora someVideo.ogg -s 42.2 -o part2.ogv; oggCat spliced.ogv part1.ogv myOneFramePenis.ogg part2.ogv. But yeah, it's one more level to worry about, and if it's cheaper to do it internally (the transcodes, not the penis insertion) we should do it internally. :P (I hope others appreciate the multiple levels of humor here.)
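To put rough numbers on the distribution point in the second note above, a minimal sketch in Python, where every figure is an illustrative assumption rather than a measurement (a 1.5 Mbit/s source, a 0.8 Mbit/s derivative, three transcode copies shipped out, and a thousand eventual views):

source_bitrate_mbit = 1.5      # assumed source bitrate (the 1-2 Mbit/s guess above)
derivative_bitrate_mbit = 0.8  # assumed average derivative bitrate
transcode_copies = 3           # copies of the source shipped out for transcoding
views_per_video = 1000         # assumed views of the finished video

# Traffic per second of source video, for each side of the comparison.
transcode_side = source_bitrate_mbit * transcode_copies
viewing_side = derivative_bitrate_mbit * views_per_video

print(transcode_side)                 # 4.5 Mbit shipped for transcoding
print(viewing_side)                   # 800 Mbit shipped for viewing
print(viewing_side / transcode_side)  # viewing is ~178x the transcode traffic

Even with generous guesses on the transcode side, the viewing traffic dominates by a couple of orders of magnitude, which is the point of the question above.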
peace, michael
Gregory Maxwell wrote:
On Sat, Aug 1, 2009 at 2:54 AM, Brian <Brian.Mingus@colorado.edu> wrote:
On Sat, Aug 1, 2009 at 12:47 AM, Gregory Maxwell <gmaxwell@gmail.com> wrote:
On Sat, Aug 1, 2009 at 12:13 AM, Michael Dale <mdale@wikimedia.org> wrote:
Once you factor in the ratio of video to non-video content for the foreseeable future, this comes off looking like a time-wasting boondoggle.
I think you vastly underestimate the amount of video that will be uploaded. Michael is right in thinking big and thinking distributed. CPU cycles are not *that* cheap.
Really rough back of the napkin numbers:
My desktop has an X3360 CPU. You can build systems all day using this processor for $600 (I think I spent $500 on it 6 months ago). There are processors with better price/performance available now, but I can benchmark on this one.
Commons is getting roughly 172,076 uploads per month now across all media types: scans of single pages, photographs copied from Flickr, audio pronunciations, videos, etc.
If everyone switched to uploading 15-minute-long SD videos instead of other things, there would be 154,868,400 seconds of video uploaded to Commons per month. Truly a staggering amount. Assuming a 40-hour work week, it would take over 250 people working full time just to *view* all of it.
That number is an average rate of 58.9 seconds of video uploaded per second, every second of the month.
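For anyone who wants to check the arithmetic, a couple of lines of Python; the upload count and the 15-minute length are the assumptions stated above:

uploads_per_month = 172_076           # current Commons uploads per month, all media types
seconds_per_upload = 15 * 60          # the hypothetical 15-minute SD videos
seconds_in_month = 30.44 * 24 * 3600  # average month length

video_seconds = uploads_per_month * seconds_per_upload
print(video_seconds)                     # 154,868,400 s of video per month
print(video_seconds / seconds_in_month)  # ~58.9 s of video per second
print(video_seconds / 3600 / (40 * 4))   # ~269 people watching 40 h/week just to view it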
Using all four cores, my desktop encodes video at >16x real-time (for moderate-motion standard-def input using the latest Theora 1.1 SVN).
So you'd need fewer than four of those systems to keep up with the entire Commons upload rate switched to 15-minute videos. Okay, it would be slow at peak hours and you might wish to produce a couple of versions at different resolutions, so multiply that by a couple.
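Putting the two figures together (58.9 s of arriving video per second, >16x real-time per box), with a hypothetical two derivative resolutions as the only extra assumption:

video_seconds_per_second = 58.9   # average upload rate from above
realtime_factor = 16              # >16x real-time encoding on one quad-core box
derivative_versions = 2           # assumed: two output resolutions
box_cost_dollars = 600            # per quad-core system, as above

boxes_needed = video_seconds_per_second / realtime_factor * derivative_versions
print(boxes_needed)                     # ~7.4 boxes
print(boxes_needed * box_cost_dollars)  # ~$4,400 of one-time hardware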
This is what I meant by processing being cheap.
If the uploads were all compressed at a bitrate of 4 Mbit/s, if users were kind enough to spread their uploads out through the day, if the distributed system were perfectly efficient (only one copy of each upload needs to be sent out), and if Wikimedia were only paying $10/Mbit/s/month for transit out of their primary datacenter... we'd find that the bandwidth cost of sending that source material out again would be $2356/month (58.9 seconds per second * 4 Mbit/s * $10/Mbit/s/month).
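That $2356/month falls straight out of the stated assumptions:

video_seconds_per_second = 58.9      # average upload rate from above
source_bitrate_mbit = 4              # assumed source bitrate, Mbit/s
transit_price = 10                   # $ per Mbit/s per month

outbound_mbit = video_seconds_per_second * source_bitrate_mbit
print(outbound_mbit)                  # 235.6 Mbit/s sustained outbound
print(outbound_mbit * transit_price)  # ~$2,356 per month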
(Since transit billing is on the 95th-percentile 5-minute average of the greater of inbound or outbound, uploads are basically free; but sending data out to the 'cloud' costs like anything else.)
So under these assumptions, sending out compressed video for re-encoding is likely to cost roughly as much *each month* as the hardware for local transcoding... and the pace of processing speed-ups seems to be significantly better than the pace of declining bandwidth prices.
This is also what I meant by processing being cheap.
Because uploads won't be uniformly spaced, you'll need some extra resources to keep things from getting bogged down at peak hours. But the poor peak-to-average ratio also works against the bandwidth costs. You can't win: unless you assume that uploads are going to be at very low bitrates, local transcoding will always be cheaper, with very short payoff times.
I don't know how to figure out how much it would 'cost' to have human contributors spot embedded penises snuck into transcodes, figure out which of several contributing transcoders is doing it, and block them, only to have the bad user switch IPs and begin again... but it seems impossibly expensive even though it's not an actual dollar cost.
There is a lot of free video out there, and as soon as we have a stable system in place Wikimedians are going to have a heyday uploading it to Commons.
I'm not saying that there won't be video; I'm saying there won't be video if development time is spent on fanciful features rather than desperately needed short-term functionality. We have tens of thousands of videos, many of which don't stream well for most people because they need thumbnailing.
Firefogg was useful upload lubrication. But user-powered cloud transcoding? I believe the analysis I provided above demonstrates that resources would be better applied elsewhere.