[Commons-l] Audio file sizes

Gregory Maxwell gmaxwell at gmail.com
Tue Mar 6 15:18:56 UTC 2007


First some background, why do we use compressed audio:

1) Uncompressed audio is huge and uses a lot of disk space for us.
2) Uncompressed audio would be an amazing waste of our bandwidth.
3) Uncompressed audio would be a huge burden on our readers, and would
make clips longer than a few seconds inaccessible to users on dialup.
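To put rough numbers on this — a back-of-the-envelope sketch, assuming CD-quality PCM (44.1kHz, 16-bit, stereo) and a three-minute clip:

```shell
# Uncompressed CD-quality PCM: 44100 samples/sec * 16 bits * 2 channels
bits_per_sec=$((44100 * 16 * 2))    # 1411200 bit/s, i.e. ~1411 kbit/sec
secs=$((3 * 60))                    # a three-minute clip
pcm=$((bits_per_sec * secs / 8))    # bytes uncompressed
ogg96=$((96000 * secs / 8))         # bytes at 96kbit/sec Vorbis
echo "uncompressed: $((pcm / 1048576)) MB, 96kbit/sec Vorbis: $((ogg96 / 1024)) KB"
```

That's roughly 30 MB uncompressed vs. about 2 MB at 96kbit/sec — on a ~5 KB/sec dialup line, over an hour and a half of downloading vs. about seven minutes.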

Brion may stab me for saying it, but (1) is fairly unimportant.

The way we are using compressed audio today is not very good from the
perspective of (2) and (3).

I haven't gone and exhaustively checked our files, but most of the
Ogg/Vorbis files I'm seeing people upload are in the 160kbit/sec
range. I believe there are four reasons for this:

1) An incorrect impression created by old data about MP3 performance.
Old MP3 wisdom is that you need 160kbit/sec to sound good. That was
true five years ago.

All cutting-edge perceptual coders produce very good results at
128kbit/sec, Vorbis better than the rest, with output that is nearly
statistically indistinguishable from the original in a test using
trained listeners (http://www.listening-tests.info/mf-128-1/results.htm).

2) My recording is a unique and precious snowflake. I want it to be
the highest quality.

3) You can go down but you can't go up, and I have broadband so it's
fast enough for me.

4) Lossy files are bad for editing, so less loss is better.


Of these, *none* should be a factor in deciding what we send to our
readers.  Yet here we are sending these 160kbit/sec Oggs to
joe-average-reader, sucking up his bandwidth and ours.

In terms of what we need uploaded for our own purposes, (4) matters.
It matters a lot, because being able to edit the content is an
important part of what we enable. However, even the 160kbit/sec lossy
files fail here: take a file decoded from a 160kbit/sec Ogg and pass
it through a 1500Hz high-pass filter ('sox input.wav output.wav highp
1500' for those with real computers (tm)). The result will have
obvious, yucky artifacts. This isn't a bug; it's the behavior of a
perceptual codec by design. The high-pass trick is a pathological
worst case, but it is true that even at 160kbit/sec perceptual codecs
do not survive all forms of manipulation well.
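For anyone who wants to try the experiment end to end, a sketch using
vorbis-tools and sox (the filenames are stand-ins for any 160kbit/sec
upload):

```shell
# Decode a Vorbis file to WAV, then apply the 1500Hz high-pass filter.
oggdec -o clip.wav clip.ogg           # oggdec ships with vorbis-tools
sox clip.wav filtered.wav highp 1500  # listen to filtered.wav for the artifacts
```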


What I'd like to see us do instead is ask uploaders to send us
losslessly compressed Ogg/FLAC files. Lossless compression, because
disk space isn't totally irrelevant, and to avoid people downloading
insanely huge WAVs just because they don't want to use the Java
player or install a codec.  We already permit uploading these
lossless audio files, and they can be easily transcoded to Ogg/Vorbis
while preserving all metadata.
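The transcode step is a one-liner; as I understand it, oggenc from
vorbis-tools can read FLAC input directly and carries the
Vorbis-comment metadata across (filename hypothetical):

```shell
# FLAC in, Vorbis out, tags preserved; -q 5 is roughly 160kbit/sec.
oggenc -q 5 upload.flac -o upload.ogg
```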

Then, either via an automatic transcoding bot or via an upload
enhancement to MediaWiki, we would automatically generate 40kbit/sec
(dialup) and 96kbit/sec (broadband) versions from the lossless
originals. These versions are what we'd link from our other projects.
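A sketch of what the bot or upload hook would run per file, using
oggenc's nominal-bitrate flag (filenames hypothetical):

```shell
oggenc -b 40 upload.flac -o upload-dialup.ogg     # ~40kbit/sec for dialup
oggenc -b 96 upload.flac -o upload-broadband.ogg  # ~96kbit/sec for broadband
```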

Considering the quality of most audio/video sharing sites on the
internet, these would still leave us head and shoulders above in
quality. (Most YouTube videos seem to use 8 or 16kbit/sec MP3 for
their audio; it's miserable, but you don't notice while watching the
video.)

Thoughts on this?  On using a bot vs. some native support?  The
biggest limitation I see is that we don't have any support for
subfiles or bundled files... which means this gunks up Commons with
two extra files for each piece of audio.


