In Wiktionary, it's very convenient that some words have sound illustrations, e.g. http://en.wiktionary.org/wiki/go%C3%BBter
These audio clips are simple 2-3 second OGG files, e.g. http://commons.wikimedia.org/wiki/File:Fr-go%C3%BBter.ogg
but they are limited in number. Recording more of them would be easy in principle, but getting started takes time: you have to learn the details, upload to Commons, specify a license, provide a description, and so on. The person willing to do all that is unlikely to also be a good voice in each desired language.
Here's a better plan:
Provide a tool on the toolserver, or any other server, having a simple link syntax that specifies the language code and the text, e.g. http://toolserver.org/mytool.php?lang=fr&text=gouter
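To make the link syntax concrete, here is a minimal Python sketch of how such links could be built and read back. The URL is just the example one from above, and the function names build_link and parse_link are made up for illustration; percent-encoding also lets accented text like "goûter" travel safely in the URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Example address from the proposal; the real tool could live anywhere.
TOOL_URL = "http://toolserver.org/mytool.php"


def build_link(lang: str, text: str) -> str:
    """Return a tool link for one word, percent-encoding the text as UTF-8."""
    return f"{TOOL_URL}?{urlencode({'lang': lang, 'text': text})}"


def parse_link(url: str) -> tuple[str, str]:
    """Recover (lang, text) from a tool link, as the tool itself would."""
    query = parse_qs(urlsplit(url).query)
    return query["lang"][0], query["text"][0]


if __name__ == "__main__":
    link = build_link("fr", "goûter")
    print(link)
    print(parse_link(link))
```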
The tool uses a cookie to remember that the user has agreed to release contributions under CC0. On the first visit, the user is asked to agree via a click-through license.
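The cookie check could look roughly like this on the server side. This is only a sketch using Python's standard http.cookies module; the cookie name cc0_agreed, its value "1" and the one-year lifetime are assumptions, not part of the proposal.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "cc0_agreed"  # hypothetical cookie name


def has_agreed(cookie_header: str) -> bool:
    """True if the request's Cookie header shows an earlier CC0 agreement."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return COOKIE_NAME in cookie and cookie[COOKIE_NAME].value == "1"


def agreement_cookie() -> str:
    """Value for a Set-Cookie header, sent after the user clicks 'I agree'."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = "1"
    cookie[COOKIE_NAME]["max-age"] = 365 * 24 * 3600  # assumed: remember for a year
    cookie[COOKIE_NAME]["path"] = "/"
    return cookie[COOKIE_NAME].OutputString()


if __name__ == "__main__":
    print(has_agreed(""))              # first visit: show the click-through license
    print(has_agreed("cc0_agreed=1"))  # later visits: go straight to recording
    print(agreement_cookie())
```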
The user is now shown the text (from the URL), and recording starts when the user presses a button. The user says the word and presses the button again. The tool saves the OGG sound and uploads it to Commons with the filename fr-gouter-XYZ789.ogg, the CC0 declaration and all metadata, placing it in a category of recorded but unverified words.
Another user can record the same word, and it will be given another random letter-digit code.
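The naming scheme could be sketched like this in Python. The six-character length and the uppercase-plus-digits alphabet are read off the fr-gouter-XYZ789.ogg example; the function name and the `taken` argument (standing in for a lookup of names that already exist on Commons) are assumptions.

```python
import secrets
import string

# Uppercase letters and digits, matching the look of "XYZ789".
ALPHABET = string.ascii_uppercase + string.digits


def make_filename(lang, text, taken=(), code_length=6):
    """Return '<lang>-<text>-<CODE>.ogg' with a random code not already in use.

    `taken` stands in for a check against names that already exist on Commons.
    """
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(code_length))
        name = f"{lang}-{text}-{code}.ogg"
        if name not in taken:
            return name


if __name__ == "__main__":
    first = make_filename("fr", "gouter")
    second = make_filename("fr", "gouter", taken={first})
    print(first)
    print(second)
```

With 36 possible characters in each of six positions, two recordings of the same word getting the same code is very unlikely, and the `taken` check covers the rare case where it happens.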
As a separate part of the tool, other volunteers are asked to verify or rate (1 to 5 stars) the recordings available in a given language. The rating is stored as categories on Commons.
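One possible way to encode ratings as categories, sketched in Python. The category names here are invented for illustration (no such categories are claimed to exist on Commons), and the sketch assumes a file carries at most one rating category at a time.

```python
from typing import Iterable, Optional

# Hypothetical category names, made up for this sketch.
UNVERIFIED_CATEGORY = "Pronunciation recordings, unverified"
RATED_PREFIX = "Pronunciation recordings rated "
RATED_SUFFIX = " of 5"


def category_for(rating: Optional[int]) -> str:
    """Map a star rating (1 to 5), or None for 'not yet rated', to a category name."""
    if rating is None:
        return UNVERIFIED_CATEGORY
    if rating not in range(1, 6):
        raise ValueError("rating must be between 1 and 5")
    return f"{RATED_PREFIX}{rating}{RATED_SUFFIX}"


def rating_from(categories: Iterable[str]) -> Optional[int]:
    """Read the star rating back from a file's categories, or None if unrated."""
    for name in categories:
        if name.startswith(RATED_PREFIX) and name.endswith(RATED_SUFFIX):
            middle = name[len(RATED_PREFIX):-len(RATED_SUFFIX)]
            if middle in {"1", "2", "3", "4", "5"}:
                return int(middle)
    return None


if __name__ == "__main__":
    print(category_for(4))
    print(rating_from(["French pronunciation", category_for(4)]))
    print(rating_from([UNVERIFIED_CATEGORY]))
```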
Now, a separate procedure (manual or a bot job) can pick words that need new or improved recordings, and list them (with links to the tool) on a normal wiki page.
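The bot job could be as small as this Python sketch: pick the words whose best recording is missing or rated too low, and print them as a wikitext bullet list of external links to the tool. The three-star threshold, the function names and the shape of the input data are all assumptions; the tool URL is the example one from above.

```python
from urllib.parse import urlencode

TOOL_URL = "http://toolserver.org/mytool.php"  # example address from the proposal


def words_needing_recordings(wanted, best_rating, min_stars=3):
    """Keep the words with no recording yet, or whose best rating is below min_stars.

    `best_rating` maps a word to the best star rating among its recordings;
    words with no recording are simply absent from it.
    """
    return [word for word in wanted if best_rating.get(word, 0) < min_stars]


def wiki_list(lang, words):
    """Render the words as a wikitext bullet list of external links to the tool."""
    lines = []
    for word in words:
        url = f"{TOOL_URL}?{urlencode({'lang': lang, 'text': word})}"
        lines.append(f"* [{url} {word}]")
    return "\n".join(lines)


if __name__ == "__main__":
    wanted = ["gouter", "chat", "chien"]
    best = {"gouter": 4, "chat": 2}  # "chien" has no recording yet
    print(wiki_list("fr", words_needing_recordings(wanted, best)))
```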
I know HTML supports uploading of a file, but I don't know how to solve the recording of sound directly to a web service. Perhaps this could be a Skype application? I have no idea. Please just be creative. It should be solvable, because this is 2013 and not 2003.