First some background, why do we use compressed audio:
1) Uncompressed audio is huge and uses a lot of disk space for us.
2) Uncompressed audio would be an amazing waste of our bandwidth.
3) Uncompressed audio would be a huge burden on our readers, and would make clips longer than a few seconds inaccessible to users on dialup.
Brion may stab me for saying it, but (1) is fairly unimportant.
The way we are using compressed audio today is not very good from the perspective of (2), and (3).
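To put rough numbers on the disk and bandwidth points above, here is a back-of-the-envelope sketch (the 60-second clip length and CD-quality PCM parameters are illustrative assumptions, not figures from this thread):

```python
# Back-of-the-envelope sizes for a 60-second stereo clip.
# 44.1 kHz / 16-bit / stereo is assumed as a typical recording format.

def clip_size_bytes(seconds, bitrate_kbps):
    """Size of a clip encoded at a constant bitrate."""
    return int(seconds * bitrate_kbps * 1000 / 8)

def pcm_size_bytes(seconds, rate_hz=44100, bits=16, channels=2):
    """Size of raw (uncompressed) PCM audio."""
    return seconds * rate_hz * (bits // 8) * channels

seconds = 60
print("uncompressed PCM:  ", pcm_size_bytes(seconds), "bytes")       # ~10.6 MB
print("160 kbit/sec Vorbis:", clip_size_bytes(seconds, 160), "bytes")  # 1.2 MB
print(" 96 kbit/sec Vorbis:", clip_size_bytes(seconds, 96), "bytes")   # 0.72 MB
```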
I haven't gone and exhaustively checked our files, but most of the Ogg/Vorbis files I'm seeing people upload are in the 160kbit/sec range. I believe there are four reasons for this:
1) An incorrect impression created by old data about MP3 performance. Old MP3 wisdom is that you have to be 160kbit/sec to be good. This was true five years ago.
All cutting-edge perceptual coders produce very good results at 128kbit/sec, Vorbis better than the others, with output that is nearly statistically indistinguishable from the original in a test using trained listeners (http://www.listening-tests.info/mf-128-1/results.htm).
2) My recording is a unique and precious snowflake. I want it to be the highest quality.
3) You can go down but you can't go up, and I have broadband so it's fast enough for me.
4) Lossy files are bad for editing, less loss is better.
Of these, *none* should be a factor in deciding what we send to our readers. Yet here we are sending these 160kbit/sec Oggs to joe-average-reader, sucking up his bandwidth and ours.
In terms of what we need uploaded for our own purposes, (4) matters. It matters a lot, because being able to edit the content is an important part of what we enable. However, even the 160kbit/sec lossy files fail here: take a file decoded from a 160kbit/sec Ogg and pass it through a 1500Hz high-pass filter ('sox input.wav output.wav highp 1500' for those with real computers (tm)). The result will have obvious yucky artifacts. This isn't a bug; it's the behavior of a perceptual codec by design. The high-pass trick is a pathological worst case, but it is true that even at 160kbit/sec perceptual codecs do not survive all forms of manipulation well.
What I'd like to see us do instead is ask uploaders to send us losslessly compressed Ogg/FLAC files. Lossless compression because disk space isn't totally irrelevant, and to avoid people downloading insanely huge WAVs just because they don't want to use the Java player or install a codec. We already permit uploading these lossless audio files, and they can be easily transcoded to Ogg/Vorbis while preserving all metadata.
Then we would, either via an automatic transcoding bot or via an upload enhancement to MediaWiki, automatically generate 40kbit/sec (dialup) and 96kbit/sec (broadband) versions from the lossless original. These versions are what we'd link from our other projects.
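As a sketch of what such a transcoding bot might do (the bot itself, the file names, and the helper functions are hypothetical; oggenc's -b (nominal bitrate) and -o (output file) flags are the real vorbis-tools options):

```python
# Hypothetical transcoding-bot step: derive low- and mid-bitrate
# Ogg/Vorbis files from a lossless Ogg/FLAC upload.
import subprocess

TARGETS = {"dialup": 40, "broadband": 96}  # kbit/sec, as proposed above

def vorbis_cmd(src, dst, kbps):
    """Build the oggenc command line for one derived version."""
    return ["oggenc", "-b", str(kbps), "-o", dst, src]

def derive_all(src_flac):
    """Return the commands a transcoding bot would run for one upload."""
    stem = src_flac.rsplit(".", 1)[0]
    return [vorbis_cmd(src_flac, "%s-%dk.ogg" % (stem, k), k)
            for k in sorted(TARGETS.values())]

# Uncomment to actually transcode (requires vorbis-tools installed):
# for cmd in derive_all("clip.flac"):
#     subprocess.run(cmd, check=True)
```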
Considering the quality of most audio/video sharing sites on the internet, these would still leave us head and shoulders above in quality. (Most YouTube videos seem to be using 8 or 16kbit/sec MP3 for their audio; it's miserable, but you don't notice if you're watching the video.)
Thoughts on this? On using a bot vs some native support? The biggest limit I see on this is that we don't have any support for subfiles, or bundled files... which means that this gunks up commons with two extra files for our audio.
Hoi, I have had contact with the people behind the "Praat" program. They are in favour of having support for a lossless format like Ogg/FLAC. With the added benefit of having metadata with the files, it is even welcomed. They would love to have it included as an option in their program. Thanks, GerardM
Gregory Maxwell schreef:
[snip]
Gregory Maxwell wrote: [snip]
Thoughts on this? On using a bot vs some native support?
Definitely prefer native support, as with images.
The way we handle thumbnailing with arbitrary sizes kind of sucks, though: making only specific versions, and probably generating them on upload, would be most sensible.
-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
On 06/03/07, Brion Vibber brion@pobox.com wrote:
The way we handle thumbnailing with arbitrary sizes kind of sucks, though: making only specific versions, and probably generating them on upload, would be most sensible.
This model works better here because people requiring audio will more frequently fit into one of the groups we "cater" for, in terms of sizes available. Images arguably require a bit more flexibility in selection.
Rob Church
-----Original Message-----
From: wikitech-l-bounces@lists.wikimedia.org On Behalf Of Brion Vibber
Sent: 06 March 2007 17:40
To: Wikimedia developers
Subject: Re: [Wikitech-l] Audio file sizes
[snip]
The Ogg Vorbis file wrapper looks good.
http://devzone.zend.com/manual/view/page/oggvorbis.usage.html
Jared
On 3/6/07, Gregory Maxwell gmaxwell@gmail.com wrote:
Then we would, either via an automatic transcoding bot or via an upload enhancement to MediaWiki, automatically generate 40kbit/sec (dialup) and 96kbit/sec (broadband) versions from the lossless original. These versions are what we'd link from our other projects.
According to your own reference, the threshold where most listeners cannot detect the difference between compressed and uncompressed is 128 kbit/s. This, then, should be the default quality threshold we serve, if we use lossless as a basis for conversion. I am opposed to accepting detectable quality degradation for marginal bandwidth savings. We're not YouTube; we should try to set a standard for video, audio and image quality.
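For what it's worth, the bandwidth at stake between the two proposed defaults is easy to quantify (simple arithmetic only; this says nothing about perceived quality):

```python
# Bytes per minute at the two candidate default bitrates.
def mb_per_minute(kbps):
    """Megabytes of audio transferred per minute at a given bitrate."""
    return kbps * 1000 * 60 / 8 / 1e6

saving = mb_per_minute(128) - mb_per_minute(96)
print("128 kbit/sec: %.2f MB/min" % mb_per_minute(128))  # 0.96 MB/min
print(" 96 kbit/sec: %.2f MB/min" % mb_per_minute(96))   # 0.72 MB/min
print("difference:   %.2f MB/min (%.0f%%)"
      % (saving, 100 * saving / mb_per_minute(128)))     # 0.24 MB/min (25%)
```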
Gregory Maxwell wrote:
What I'd like to see us do instead is ask uploaders to send us losslessly compressed Ogg/FLAC files. Lossless compression because disk space isn't totally irrelevant, and to avoid people downloading insanely huge WAVs just because they don't want to use the Java player or install a codec. We already permit uploading these lossless audio files, and they can be easily transcoded to Ogg/Vorbis while preserving all metadata. [snip]
I think this is a great idea and should definitely be investigated. However:
1. Encoding audio in a lossless codec seems, at least from my experience, to be a very CPU-intensive process. With our equipment, this may or may not be a problem.
2. If we encourage losslessly compressed audio, we will probably want to increase our maximum upload size. Longer audio clips will easily exceed 20 MB, and Spoken Wikipedia usually *must* split up their audio files in order to upload them (whether for better or worse). This also runs up against some of HTTP's limitations when it comes to file uploads. An anonymous FTP server would be pretty neat, but I doubt it's going to happen.
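A rough calculation of how quickly lossless audio hits a 20 MB upload cap (the ~55% FLAC compression ratio used here is an assumed typical figure, not measured from our uploads):

```python
# How many minutes of stereo FLAC fit under a 20 MB upload limit.
# The 0.55 compression ratio relative to PCM is an assumption.
LIMIT_MB = 20
pcm_mb_per_min = 44100 * 16 * 2 * 60 / 8 / 1e6   # CD-quality stereo PCM
flac_mb_per_min = pcm_mb_per_min * 0.55          # ~5.8 MB/min, assumed

print("minutes of stereo FLAC under %d MB: %.1f"
      % (LIMIT_MB, LIMIT_MB / flac_mb_per_min))  # ~3.4 minutes
```

So even a three-and-a-half-minute clip brushes the limit, which is consistent with Spoken Wikipedia having to split its files.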
Hoi, Using lossless audio is particularly relevant for the recordings of pronunciations. For ordinary use, having them compressed in a lossy way is not a problem. The point is that with the Shtooka software that became available, we are in a position to easily create large amounts of sound files. These are of relevance to scientists, and they have a need for lossless files. Given that the only identified reason to have lossless sound files is the pronunciations, the current upload size would suffice. They are in effect single words, at most a sentence.
Thanks, GerardM
Edward Z. Yang schreef:
[snip]
Edward Z. Yang wrote:
- Encoding audio in a lossless codec seems, at least from my
experience, to be a very CPU-intensive process. With our equipment, this may or may not be a problem.
It might depend on the codec, but FLAC for one is much *less* CPU-intensive to encode than most commonly used lossy codecs, such as MP3. Since it doesn't have to do all the frequency-domain psychoacoustic analysis that lossy codecs do, it's a much simpler algorithm, both to implement and to execute.
On the decoding side things are a bit more even. I believe FLAC is still a bit faster, but it requires more memory bandwidth (sometimes an issue on embedded devices), simply because there are more megabytes per minute of audio that it needs to load. And of course it requires more storage on the device. But we could always transcode from FLAC to Vorbis on the server side to serve such clients.
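The data rates behind the "more megabytes per minute" point can be sketched with an assumed FLAC ratio (~55% of PCM is a typical figure, not measured here):

```python
# Rough data rates a decoder has to pull in, per codec.
# The FLAC ratio relative to PCM is an assumption for illustration.
PCM_KBPS = 44100 * 16 * 2 / 1000   # 1411.2 kbit/sec, CD-quality stereo
FLAC_KBPS = PCM_KBPS * 0.55        # assumed typical FLAC ratio
VORBIS_KBPS = 96                   # the proposed broadband target

for name, kbps in [("PCM", PCM_KBPS),
                   ("FLAC (assumed)", FLAC_KBPS),
                   ("Vorbis", VORBIS_KBPS)]:
    print("%-15s %7.1f kbit/sec = %5.2f MB/min"
          % (name, kbps, kbps * 60 / 8 / 1000))
```

FLAC ends up moving roughly eight times the bytes of 96kbit/sec Vorbis, which is the memory-bandwidth and storage cost described above.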
-Mark
Delirium wrote:
It might depend on the codec, but FLAC for one is much *less* CPU-intensive to encode than most commonly used lossy codecs, such as MP3. Since it doesn't have to do all the frequency-domain psychoacoustic analysis that lossy codecs do, it's a much simpler algorithm, both to implement and to execute.
I misspoke. When we're serving the audio files, the processing the Wikimedia Foundation would have to do is re-encode them in a lossy codec such as Vorbis, to save bandwidth on both ends. Our storage format would be FLAC for disk space reasons, as WAVs can get big pretty fast.
I'd hate to see this discussion die out without some sort of judgment from the developers. Brion, do you think this sort of implementation is possible? Perhaps if we had a dedicated cluster for performing the audio encoding, and queued the stuff up so that while the lossy versions wouldn't be available immediately, they would show up eventually?
Also, regarding anonymous FTP, I believe SourceForge has a similar setup for uploading release packages, so I think if we made it write-only (i.e. the list directory command returns no entries), only allowed established users to upload, and did regular housekeeping, it might be viable.
Edward Z. Yang wrote:
I'd hate to see this discussion die out without some sort of judgment from the developers. Brion, do you think this sort of implementation is possible? Perhaps if we had a dedicated cluster for performing the audio encoding, and queued the stuff up so that while the lossless version wouldn't be available immediately, it would show up eventually?
Yep!
I think it's a fantastic idea, both for audio and video.
I've stuck this on the list of suggested projects for the Google Summer of Code this year. (We've applied as a mentor organization, but won't know for sure until tomorrow(?) whether we're accepted.)
http://meta.wikimedia.org/wiki/Summer_of_Code_2007
^^ feel free to add/remove/tweak/suggest
Also, regarding anonymous FTP, I believe SourceForge has a similar setup for uploading release packages, so I think if we made it write-only (i.e. the list directory command returns no entries), only allowed established users to upload, and did regular housekeeping, it might be viable.
FTP gives me the willies, personally. :) But perhaps we could arrange something.
-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)