First, some background on why we use compressed audio:
1) Uncompressed audio is huge and uses a lot of disk space for us.
2) Uncompressed audio would be an amazing waste of our bandwidth.
3) Uncompressed audio would be a huge burden on our readers, and would make clips longer than a few seconds inaccessible to users on dialup.
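As a back-of-envelope sketch of the sizes involved (CD-quality PCM and a one-minute clip are just illustrative assumptions):

```shell
# CD-quality PCM: 44100 samples/s x 16 bits x 2 channels
bits_per_sec=$((44100 * 16 * 2))
echo "uncompressed rate: $((bits_per_sec / 1000)) kbit/s"      # 1411 kbit/s

# A one-minute clip, uncompressed vs. at 160 kbit/s:
echo "60s uncompressed:  $((bits_per_sec * 60 / 8 / 1000)) kB" # 10584 kB
echo "60s at 160 kbit/s: $((160 * 1000 * 60 / 8 / 1000)) kB"   # 1200 kB
```

So even the "wasteful" 160kbit/sec files are roughly a ninth the size of raw PCM, which is what makes (2) and (3) the real constraints.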
Brion may stab me for saying it, but (1) is fairly unimportant.
The way we are using compressed audio today is not very good from the perspective of (2) and (3).
I haven't gone and exhaustively checked our files, but most of the Ogg/Vorbis files I'm seeing people upload are in the 160kbit/sec range. I believe there are four reasons for this:
1) An incorrect impression created by old data about MP3 performance. The old MP3 wisdom is that you have to be at 160kbit/sec to be good. This was true five years ago.
All cutting-edge perceptual coders now produce very good results at 128kbit/sec, with Vorbis ahead of the others: in a test using trained listeners, its output was nearly statistically indistinguishable from the original. (http://www.listening-tests.info/mf-128-1/results.htm)
2) My recording is a unique and precious snowflake. I want it to be the highest quality.
3) You can go down but you can't go up, and I have broadband so it's fast enough for me.
4) Lossy files are bad for editing; less loss is better.
Of these, *none* should be a factor in deciding what we send to our readers. Yet here we are sending these 160kbit/sec Oggs to joe-average-reader, sucking up his bandwidth and ours.
In terms of what we need uploaded for our own purposes, (4) matters. It matters a lot, because being able to edit the content is an important part of what we enable. However, even the 160kbit/sec lossy files fail here: take a file decoded from a 160kbit/sec Ogg and pass it through a 1500Hz high-pass filter ('sox input.wav output.wav highp 1500' for those with real computers (tm)). The result will have obvious, yucky artifacts. This isn't a bug; it's the behavior of a perceptual codec by design. The high-pass trick is a pathological worst case, but it is true that even at 160kbit/sec perceptual codecs do not survive all forms of manipulation well.
What I'd like to see us do instead is ask uploaders to send us losslessly compressed Ogg/Flac files. Lossless compression, because disk space isn't totally irrelevant, and to avoid people downloading insanely huge WAVs just because they don't want to use the Java player or install a codec. We already permit uploading these lossless audio files; they can be easily transcoded to Ogg/Vorbis while preserving all metadata.
Then we would, either via an automatic transcoding bot or via an upload enhancement to MediaWiki, automatically generate 40kbit/sec (dialup) and 96kbit/sec (broadband) versions from the lossless original. These versions are what we'd link from our other projects.
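As a sketch of what the per-upload step might look like (the filename, the exact oggenc flags, and the output naming scheme are all assumptions, not an existing bot), here is a dry run that only prints the commands it would execute:

```shell
# Hypothetical transcode step for one uploaded lossless file.
# Dry run: echoes the oggenc invocations instead of running them,
# since the bitrates and naming here are purely illustrative.
src="Example.flac"
for kbps in 40 96; do
  out="${src%.flac}-${kbps}k.ogg"
  # oggenc can read FLAC input directly; -b sets the nominal bitrate in kbps
  echo oggenc -b "$kbps" "$src" -o "$out"
done
```

A real bot would also copy the Vorbis comments across and upload the two derived files alongside the lossless original.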
Considering the quality of most audio/video sharing sites on the internet, these would still leave us head and shoulders above in quality. (Most YouTube videos seem to use 8 or 16kbit/sec MP3 for their audio; it's miserable, but you don't notice if you're watching the video.)
Thoughts on this? On using a bot vs. some native support? The biggest limitation I see is that we don't have any support for subfiles or bundled files... which means this gunks up Commons with two extra files per audio clip.
On 06/03/07, Gregory Maxwell gmaxwell@gmail.com wrote:
Then we would, either via an automatic transcoding bot or via an upload enhancement to MediaWiki, automatically generate 40kbit/sec (dialup) and 96kbit/sec (broadband) versions from the lossless original. These versions are what we'd link from our other projects.
I understand Ogg Vorbis was designed such that you can create a 96kbit/s frame by taking a 160kbit/s frame and stripping the end off.
- d.
On 3/6/07, David Gerard dgerard@gmail.com wrote:
On 06/03/07, Gregory Maxwell gmaxwell@gmail.com wrote:
Then we would, either via an automatic transcoding bot or via an upload enhancement to MediaWiki, automatically generate 40kbit/sec (dialup) and 96kbit/sec (broadband) versions from the lossless original. These versions are what we'd link from our other projects.
I understand Ogg Vorbis was designed such that you can create a 96kbit/s frame by taking a 160kbit/s frame and stripping the end off.
It was designed that way, yes. However, various deep technical issues cause such simplistic conversion to sound bad. As a result, a more sophisticated bitrate peeler is required, and no one has gotten around to coding one.
On my laptop, doing oggenc -q -1 flacfile runs at about 45x realtime... it's not something we would want to do on the fly for every page load, but cached or done at upload time it's not bad.
Hoi, I have had contact with the people behind the "Praat" program. They are in favour of having support for a lossless format like Ogg/Flac. Certainly with the added benefit of having metadata with the files, it is even welcomed. They would love to have it included as an option in their program. Thanks, GerardM
Gregory Maxwell schreef:
First, some background on why we use compressed audio:
1) Uncompressed audio is huge and uses a lot of disk space for us.
2) Uncompressed audio would be an amazing waste of our bandwidth.
3) Uncompressed audio would be a huge burden on our readers, and would make clips longer than a few seconds inaccessible to users on dialup.
Brion may stab me for saying it, but (1) is fairly unimportant.
The way we are using compressed audio today is not very good from the perspective of (2) and (3).
I haven't gone and exhaustively checked our files, but most of the Ogg/Vorbis files I'm seeing people upload are in the 160kbit/sec range. I believe there are four reasons for this:
1) An incorrect impression created by old data about MP3 performance. The old MP3 wisdom is that you have to be at 160kbit/sec to be good. This was true five years ago.
All cutting-edge perceptual coders now produce very good results at 128kbit/sec, with Vorbis ahead of the others: in a test using trained listeners, its output was nearly statistically indistinguishable from the original. (http://www.listening-tests.info/mf-128-1/results.htm)
2) My recording is a unique and precious snowflake. I want it to be the highest quality.
3) You can go down but you can't go up, and I have broadband so it's fast enough for me.
4) Lossy files are bad for editing; less loss is better.
Of these, *none* should be a factor in deciding what we send to our readers. Yet here we are sending these 160kbit/sec Oggs to joe-average-reader, sucking up his bandwidth and ours.
In terms of what we need uploaded for our own purposes, (4) matters. It matters a lot, because being able to edit the content is an important part of what we enable. However, even the 160kbit/sec lossy files fail here: take a file decoded from a 160kbit/sec Ogg and pass it through a 1500Hz high-pass filter ('sox input.wav output.wav highp 1500' for those with real computers (tm)). The result will have obvious, yucky artifacts. This isn't a bug; it's the behavior of a perceptual codec by design. The high-pass trick is a pathological worst case, but it is true that even at 160kbit/sec perceptual codecs do not survive all forms of manipulation well.
What I'd like to see us do instead is ask uploaders to send us losslessly compressed Ogg/Flac files. Lossless compression, because disk space isn't totally irrelevant, and to avoid people downloading insanely huge WAVs just because they don't want to use the Java player or install a codec. We already permit uploading these lossless audio files; they can be easily transcoded to Ogg/Vorbis while preserving all metadata.
Then we would, either via an automatic transcoding bot or via an upload enhancement to MediaWiki, automatically generate 40kbit/sec (dialup) and 96kbit/sec (broadband) versions from the lossless original. These versions are what we'd link from our other projects.
Considering the quality of most audio/video sharing sites on the internet, these would still leave us head and shoulders above in quality. (Most YouTube videos seem to use 8 or 16kbit/sec MP3 for their audio; it's miserable, but you don't notice if you're watching the video.)
Thoughts on this? On using a bot vs. some native support? The biggest limitation I see is that we don't have any support for subfiles or bundled files... which means this gunks up Commons with two extra files per audio clip.
On 3/6/07, Gregory Maxwell gmaxwell@gmail.com wrote:
Then we would, either via an automatic transcoding bot or via an upload enhancement to MediaWiki, automatically generate 40kbit/sec (dialup) and 96kbit/sec (broadband) versions from the lossless original. These versions are what we'd link from our other projects.
According to your own reference, the threshold where most listeners cannot detect the difference between compressed and uncompressed is 128 kbit/s. This, then, should be the default quality threshold we serve, if we use lossless as a basis for conversion. I am opposed to accepting detectable quality degradation for marginal bandwidth savings. We're not YouTube; we should try to set a standard for video, audio and image quality.
Gregory Maxwell wrote:
What I'd like to see us do instead, is to ask uploaders to send us losslessly compressed Ogg/Flac files instead. Lossless compression because disk space isn't totally irrelevant and to avoid people downloading insanely huge wavs just because they don't want to use the java player or install a codec. We already permit uploading these lossless audio files they can be easily transcoded to Ogg/Vorbis while preserving all metadata. [snip]
I think this is a great idea and should definitely be investigated. However:
1. Encoding audio in a lossless codec seems, at least from my experience, to be a very CPU-intensive process. With our equipment, this may or may not be a problem.
2. If we encourage losslessly compressed audio, we will probably want to increase our maximum upload size. Longer audio clips will easily exceed 20 MB, and Spoken Wikipedia usually *must* split up their audio files in order to upload them (whether for better or worse). This also runs up against some of HTTP's limitations when it comes to file uploads. An anonymous FTP server would be pretty neat, but I doubt it's going to happen.
Hoi, Using lossless audio is particularly relevant for recordings of pronunciations. For ordinary use, having them compressed in a lossy way is not a problem. The point is that with the Shtooka software that became available, we are in a position to easily create large amounts of sound files. These are of relevance to scientists, and they have a need for lossless files. Given that the only identified reason to have lossless sound files is the pronunciations, the current upload size would suffice. They are in effect single words, at most a sentence.
Thanks, GerardM
Edward Z. Yang schreef:
Gregory Maxwell wrote:
What I'd like to see us do instead is ask uploaders to send us losslessly compressed Ogg/Flac files. Lossless compression, because disk space isn't totally irrelevant, and to avoid people downloading insanely huge WAVs just because they don't want to use the Java player or install a codec. We already permit uploading these lossless audio files; they can be easily transcoded to Ogg/Vorbis while preserving all metadata. [snip]
I think this is a great idea and should definitely be investigated. However:
1. Encoding audio in a lossless codec seems, at least from my experience, to be a very CPU-intensive process. With our equipment, this may or may not be a problem.
2. If we encourage losslessly compressed audio, we will probably want to increase our maximum upload size. Longer audio clips will easily exceed 20 MB, and Spoken Wikipedia usually *must* split up their audio files in order to upload them (whether for better or worse). This also runs up against some of HTTP's limitations when it comes to file uploads. An anonymous FTP server would be pretty neat, but I doubt it's going to happen.
On 3/6/07, Edward Z. Yang edwardzyang@thewritingpot.com wrote:
I think this is a great idea and should definitely be investigated. However:
1. Encoding audio in a lossless codec seems, at least from my experience, to be a very CPU-intensive process. With our equipment, this may or may not be a problem.
Ogg/Flac is very fast, especially on the decode side: my laptop can convert a FLAC to an Ogg/Vorbis file at 45x realtime. Given your point below, which I hadn't thought of, and my CPU speed, I wouldn't expect any transcode to take more than 10 seconds. ;) But that will have to be fixed.
At some point we'll want to transcode video, and while I can argue that transcoding audio is fairly cheap, transcoding video is far less so...
- If we encourage losslessly compressed audio, we will probably want to
increase our maximum upload size. Longer audio clips will easily exceed 20 MB, and Spoken Wikipedia usually *must* split up their audio files in order to upload them in (whether for better or worse). This also runs up with some of HTTPs limitations when it comes to file uploads. An anonymous FTP server would be pretty neat, but I doubt it's going to happen.
Ugh. This is a point we'll have to address. It's an issue even without lossless uploads, but more of one with them.