In Wiktionary, it's very convenient that some words have sound illustrations, e.g. http://en.wiktionary.org/wiki/go%C3%BBter
These audio bites are simple 2-3 second OGG files, e.g. http://commons.wikimedia.org/wiki/File:Fr-go%C3%BBter.ogg, but they are limited in number. It would be very easy to record more of them, but before you get started it takes some time to learn the details, and then you need to upload the file to Commons, specify a license, provide a description, and so on. It's not very likely that the person who does all that also has a good voice for each desired language.
Here's a better plan:
Provide a tool on the toolserver, or any other server, having a simple link syntax that specifies the language code and the text, e.g. http://toolserver.org/mytool.php?lang=fr&text=gouter
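As a minimal sketch of that link syntax, assuming a Python back end, the tool could read and validate the two query parameters as below. The function name `parse_tool_link` and the rule that a language code is two or three lowercase letters are illustrative assumptions, not part of any existing tool.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed validation rule: a 2-3 letter lowercase code such as "fr" or "sv".
LANG_CODE = re.compile(r"[a-z]{2,3}")


def parse_tool_link(url: str) -> tuple[str, str]:
    """Return (lang, text) from a link like .../mytool.php?lang=fr&text=gouter."""
    # parse_qs percent-decodes values and maps each key to a list of values.
    query = parse_qs(urlparse(url).query)
    lang = query.get("lang", [""])[0]
    text = query.get("text", [""])[0].strip()
    if not LANG_CODE.fullmatch(lang):
        raise ValueError(f"invalid language code: {lang!r}")
    if not text:
        raise ValueError("missing text parameter")
    return lang, text


print(parse_tool_link("http://toolserver.org/mytool.php?lang=fr&text=gouter"))
# ('fr', 'gouter')
```

Because the values are percent-decoded, a link with `text=go%C3%BBter` yields the accented word `goûter`, so the tool could prompt with the correct spelling even though the example link uses the unaccented form.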
The tool uses a cookie that remembers that this user has agreed to submit contributions under CC0. On the first visit, this question is asked as a click-through license.
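One way to implement the remembered agreement is sketched below with Python's standard `http.cookies` module. The cookie name `cc0_consent` and the one-year lifetime are assumptions for illustration only.

```python
from http.cookies import SimpleCookie

CONSENT_COOKIE = "cc0_consent"  # hypothetical cookie name


def has_consented(cookie_header: str) -> bool:
    """True if the request's Cookie header shows the user already accepted CC0."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(CONSENT_COOKIE)
    return morsel is not None and morsel.value == "yes"


def consent_cookie(max_age_days: int = 365) -> str:
    """Build the Set-Cookie value sent once the click-through license is accepted."""
    cookie = SimpleCookie()
    cookie[CONSENT_COOKIE] = "yes"
    cookie[CONSENT_COOKIE]["max-age"] = max_age_days * 24 * 60 * 60
    cookie[CONSENT_COOKIE]["path"] = "/"
    return cookie[CONSENT_COOKIE].OutputString()


# First visit: no cookie yet, so the click-through license would be shown.
print(has_consented(""))                          # False
print(has_consented("cc0_consent=yes; lang=fr"))  # True
```

A cookie only records consent per browser, so a user who switches machines would see the click-through again; tying consent to a logged-in account would be an alternative design.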
The user is then prompted with the text (from the URL), and recording starts when the user presses a button. The user says the word and presses the button again. The tool saves the OGG sound file and uploads it to Commons with the filename fr-gouter-XYZ789.ogg, the CC0 declaration, and all metadata, placing it in a category of recorded but unverified words.
Another user can record the same word, and that recording will be given a different random letter-digit code.
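A minimal sketch of the naming scheme follows. It assumes the random code is always three uppercase letters followed by three digits, as in the example `fr-gouter-XYZ789.ogg`; the proposal only says "random letter-digit code", so this exact format and the helper name `recording_filename` are assumptions.

```python
import secrets
import string


def recording_filename(lang: str, text: str) -> str:
    """Build a name like 'fr-gouter-XYZ789.ogg' with a random letter-digit suffix.

    Assumed format: three uppercase letters, then three digits.
    """
    letters = "".join(secrets.choice(string.ascii_uppercase) for _ in range(3))
    digits = "".join(secrets.choice(string.digits) for _ in range(3))
    return f"{lang}-{text}-{letters}{digits}.ogg"


print(recording_filename("fr", "gouter"))  # random suffix on every call
```

This format gives 26^3 * 10^3 = 17,576,000 possible suffixes per word, so collisions are unlikely but not impossible; the tool would still need to check that the name is free on Commons before uploading.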
As a separate part of the tool, other volunteers are asked to verify or rate (1 to 5 stars) the recordings available in a given language. The ratings are stored as categories on Commons.
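To make "ratings stored as categories" concrete, here is a sketch that turns the star ratings collected for one recording into a single category name. The category titles are hypothetical, not existing Commons categories, and averaging the votes is just one possible policy.

```python
import math
from statistics import mean


def rating_category(lang: str, ratings: list[int]) -> str:
    """Map the 1-5 star ratings for one recording to a (hypothetical) Commons category."""
    if not ratings:
        # Not yet rated: stays in the "recorded but unverified" category.
        return f"Category:Unverified pronunciation recordings ({lang})"
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5 stars")
    # Average the votes and round halves up (Python's round() would turn 4.5 into 4).
    stars = math.floor(mean(ratings) + 0.5)
    unit = "star" if stars == 1 else "stars"
    return f"Category:Pronunciation recordings rated {stars} {unit} ({lang})"


print(rating_category("fr", [4, 5]))
# Category:Pronunciation recordings rated 5 stars (fr)
```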
Now, a separate procedure (manual or a bot job) can pick words that need new or improved recordings, and list them (with links to the tool) on a normal wiki page.
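A rough sketch of the bot side follows. It assumes the bot already knows the best rating each word's recordings have received (`None` meaning no recording yet) and that anything below three stars needs improvement; both the threshold and the function name `worklist_wikitext` are assumptions. It emits a wikitext bullet list whose external links point at the example tool URL from the proposal.

```python
from typing import Optional
from urllib.parse import urlencode

TOOL_URL = "http://toolserver.org/mytool.php"  # example URL from the proposal


def worklist_wikitext(lang: str, best_rating: dict[str, Optional[int]],
                      threshold: int = 3) -> str:
    """List words with no recording, or only poorly rated ones, as wikitext."""
    lines = []
    for word in sorted(best_rating):
        rating = best_rating[word]
        if rating is not None and rating >= threshold:
            continue  # already has a good enough recording
        # urlencode percent-encodes non-ASCII words such as "goûter".
        link = f"{TOOL_URL}?{urlencode({'lang': lang, 'text': word})}"
        status = "no recording" if rating is None else f"best rating {rating}/5"
        lines.append(f"* [{link} {word}] ({status})")
    return "\n".join(lines)


print(worklist_wikitext("fr", {"gouter": None, "manger": 5, "boire": 2}))
# * [http://toolserver.org/mytool.php?lang=fr&text=boire boire] (best rating 2/5)
# * [http://toolserver.org/mytool.php?lang=fr&text=gouter gouter] (no recording)
```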
I know HTML supports file uploads, but I don't know how to record sound directly into a web service. Perhaps this could be a Skype application? I have no idea. Please just be creative. It should be solvable, because this is 2013 and not 2003.
On 3/12/13, Lars Aronsson lars@aronsson.se wrote:
-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
It was solvable with a Java applet (or Flash, but that's usually considered evil) back in 2003. However, it still requires someone to actually do it.
With modern web browsers, you can do it with HTML5/WebRTC [1].
Someone could probably make an extension that integrates with MediaWiki, so all a user has to do is go to Special:RecordAudio and record and upload from there. Perhaps that would make a good GSoC project (not sure if the scope is big enough, but one could probably add things like a slick UI to make it big enough).
[1] http://www.html5rocks.com/en/tutorials/getusermedia/intro/
On Tue, Mar 12, 2013 at 9:29 PM, Brian Wolff bawolff@gmail.com wrote:
It was solvable with a Java applet (or Flash, but that's usually considered evil) back in 2003. However, it still requires someone to actually do it.
For security purposes, I'm really hoping we don't plan on using a Java applet. :P
-- Tyler Romeo, Stevens Institute of Technology, Class of 2015, Major in Computer Science, www.whizkidztech.com | tylerromeo@gmail.com
On 3/12/13, Tyler Romeo tylerromeo@gmail.com wrote:
For security purposes, I'm really hoping we don't plan on using a Java applet. :P
Why? There's nothing inherently insecure about Java applets. We already use them to play Ogg files on lame browsers that don't support HTML5.
--bawolff
On Mar 12, 2013 10:08 PM, "Brian Wolff" bawolff@gmail.com wrote:
Why? There's nothing inherently insecure about Java applets. We already use them to play Ogg files on lame browsers that don't support HTML5.
Can you say that for sure? With the number of exploits in Java over the past few months, everybody I know has already disabled their browser plugin.
--Tyler Romeo
On 3/12/13, Tyler Romeo tylerromeo@gmail.com wrote:
Can you say that for sure? With the number of exploits in Java over the past few months, everybody I know has already disabled their browser plugin.
Those types of people will probably have an HTML5-capable web browser :P
Let me rephrase my previous statement: using Java as a fallback doesn't introduce any new issues that wouldn't already be there if we didn't use it as a fallback (since we'd only fall back to Java if the user already had it installed). Furthermore, I imagine (or at least hope) that Oracle fixes the security vulnerabilities in its plugin as they are discovered.
-bawolff
On Wed, Mar 13, 2013 at 11:29 AM, Brian Wolff bawolff@gmail.com wrote:
Someone could probably make an extension that integrates with MediaWiki, so all a user has to do is go to Special:RecordAudio and record and upload from there. Perhaps that would make a good GSoC project (not sure if the scope is big enough, but one could probably add things like a slick UI to make it big enough).
That wouldn't be a bad project for GSoC: it isn't too large, so we could actually see some results. And if it turned out to be too small, the student could take on a couple of smaller projects (this being one) and work through them one after the other.
On 13/03/13 04:07, K. Peachey wrote:
That wouldn't be a bad project for GSoC: it isn't too large, so we could actually see some results. And if it turned out to be too small, the student could take on a couple of smaller projects (this being one) and work through them one after the other.
The smaller big project: get its code deployed on the cluster and enabled for all wikis!
On 03/13/2013 12:15 AM, Antoine Musso wrote:
The smaller big project: get its code deployed on the cluster and enabled for all wikis!
Quick reminder:
If you think something would be a good project for a student, put it on https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects . I suggest we scope these proposals at about 6 weeks of coding work to ensure we dedicate enough time (out of the 3-month GSoC period) to bugfixing and code review. Past proposals often allotted either no time or about 2 weeks for merging with trunk, pre-deploy code review, and integration. That's not enough.
Basically, if you think a project might take about 2 weeks for you to code, go ahead and put it on that list. Students run into lots of problems, and your 2-week project is someone else's whole summer.
I'm not sure whether it'd be helpful for this project, but https://github.com/akrennmair/speech-to-server looks interesting. Somebody ported LAME (the MP3 encoder) to JavaScript. The demo I linked to records in the browser and streams the audio to a server over a WebSocket.
-- Tyler Romeo, Stevens Institute of Technology, Class of 2015, Major in Computer Science, www.whizkidztech.com | tylerromeo@gmail.com
On Sun, Mar 17, 2013 at 9:47 AM, Sumana Harihareswara sumanah@wikimedia.org wrote:
-- Sumana Harihareswara Engineering Community Manager Wikimedia Foundation
That wouldn't be a bad project for GSoC: it isn't too large, so we could actually see some results.
Feedback and help with this feature are welcome at
https://bugzilla.wikimedia.org/show_bug.cgi?id=31221
(we might create a bug report specific to it, but this is the location we have now)
There is a potential GSoC student already interested:
https://bugzilla.wikimedia.org/show_bug.cgi?id=31221#c10
(...)
If you think something would be a good project for a student, put it on https://www.mediawiki.org/wiki/Mentorship_programs/Possible_projects
Exactly. Slowly but surely, that page is becoming a good reference for projects wanted by the community: bigger than an annoying bug, but still something a single person with some dedication and help could complete in a reasonable period of time.
I have been linking those project proposals to Bugzilla reports, creating new ones if needed. You are encouraged to do the same; it is a lot more efficient for tracking and documenting the related discussions.
On 03/27/2013 10:41 AM, Quim Gil wrote:
Feedback and help with this feature are welcome at
https://bugzilla.wikimedia.org/show_bug.cgi?id=31221
(we might create a bug report specific to it, but this is the location we have now)
Yes, I've separated it into two:
https://bugzilla.wikimedia.org/show_bug.cgi?id=31221 - Computer text-to-speech as originally requested
https://bugzilla.wikimedia.org/show_bug.cgi?id=46610 - Easy way to record and upload words.
Matt Flaschen
On 03/27/2013 01:12 AM, Tyler Romeo wrote:
I'm not sure whether it'd be helpful for this project, but https://github.com/akrennmair/speech-to-server looks interesting. Somebody ported LAME (the MP3 encoder) to JavaScript. The demo I linked to records in the browser and streams the audio to a server over a WebSocket.
Unfortunately, I don't think that's workable, due to patent issues.
It might be feasible to do the same with a free format, though I do wonder about the performance of a JS codec.
Matt Flaschen
On 13/03/13 02:29, Brian Wolff wrote:
It was solvable with a Java applet (or Flash, but that's usually considered evil) back in 2003. However, it still requires someone to actually do it.
I believe Flash should be OK if it is made to work with Gnash, but I am not sure whether Gnash supports everything needed.
On 03/12/2013 09:01 PM, Lars Aronsson wrote:
Provide a tool on the toolserver, or any other server, having a simple link syntax that specifies the language code and the text, e.g. http://toolserver.org/mytool.php?lang=fr&text=gouter
Good idea, though I agree with Brian a special page would be preferable.
The tool uses a cookie that remembers that this user has agreed to submit contributions under CC0. On the first visit, this question is asked as a click-through license.
Why CC0 (public domain)? Your example (http://commons.wikimedia.org/wiki/File:Fr-go%C3%BBter.ogg) is CC-BY, which is not public domain and requires attribution (which I think all Wikimedia projects require for text). I'd say CC-BY-SA or CC-BY would be a better default.
Matt Flaschen
On 13/03/13 02:48, Matthew Flaschen wrote:
Why CC0 (public domain)? Your example (http://commons.wikimedia.org/wiki/File:Fr-go%C3%BBter.ogg) is CC-BY, which is not public domain and requires attribution (which I think all Wikimedia projects require for text). I'd say CC-BY-SA or CC-BY would be a better default.
I am not sure whether the pronunciation of a single word is copyrightable.
On 03/13/2013 03:17 AM, Nikola Smolenski wrote:
I am not sure whether the pronunciation of a single word is copyrightable.
Neither am I, but if it's licensed under one of those and a court finds it's not copyrightable, so be it. It still seems reasonable to use an attribution license.
Matt Flaschen
It would be a good application for mobile too.
In the browser, this would be reasonably easy with Flash, and it can be done with JavaScript in modern browsers, though not yet in a consistent way. There is a W3C spec, but using a library like https://github.com/jussi-kalliokoski/sink.js/ would be easier than writing per-browser versions to account for current real-world variation.
A mobile app, or a few native apps for the dominant platforms, would presumably expose a cleaner interface to the microphone, which is a core device on that hardware rather than an optional, variable peripheral as on computers.
Luke
On Wed, Mar 13, 2013 at 4:03 AM, Matthew Flaschen mflaschen@wikimedia.org wrote:
Neither am I, but if it's licensed under one of those and a court finds it's not copyrightable, so be it. It still seems reasonable to use an attribution license.
Matt Flaschen
I would strongly advocate not lending support to the belief that everything under the sun is copyrightable. In my opinion, we should take the position that trivial things like these are not copyrightable and put CC0 on them. We should not set an example and establish a practice that suggests single words can be copyrightable. At all. I think that by defaulting to that assumption, we support the idea that these things can be legally protected under copyright law, and by this we do a strong disservice to our actual mission.
Sorry for the rant and for the not-completely-on-topicness.
Cheers, Denny
2013/3/13 Matthew Flaschen mflaschen@wikimedia.org
Neither am I, but if it's licensed under one of those and a court finds it's not copyrightable, so be it. It still seems reasonable to use an attribution license.
Matt Flaschen
On 13/03/13 02:01, Lars Aronsson wrote:
Provide a tool on the toolserver, or any other server, having a simple link syntax that specifies the language code and the text, e.g. http://toolserver.org/mytool.php?lang=fr&text=gouter
I was thinking about this already and yes, this is a great idea! :)
A very nice website that already does this is www.forvo.com, but they claim a BY-NC-SA licence. Still, the way it works could serve as inspiration.
A possible additional feature would be for speakers to indicate their locality, age, accent etc. (so that words differently pronounced in different accents of the same language would be marked as such).
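A sketch of what such speaker metadata might look like, with a helper that keeps regional variants of the same word apart. The field names (`locality`, `accent`, `age_range`) and the helper `by_accent` are illustrative assumptions, not an existing Commons or Wiktionary schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recording:
    """Hypothetical metadata a speaker could attach to one recording."""
    filename: str
    lang: str
    word: str
    locality: Optional[str] = None   # e.g. "Montréal"
    accent: Optional[str] = None     # free text or a value from a fixed list
    age_range: Optional[str] = None  # e.g. "30-39"


def by_accent(recordings: list[Recording], lang: str, word: str) -> dict[str, list[str]]:
    """Group filenames for one word by accent so regional variants stay distinguishable."""
    groups: dict[str, list[str]] = defaultdict(list)
    for rec in recordings:
        if rec.lang == lang and rec.word == word:
            groups[rec.accent or "unspecified"].append(rec.filename)
    return dict(groups)


recs = [
    Recording("fr-gouter-ABC123.ogg", "fr", "gouter", accent="Québec"),
    Recording("fr-gouter-XYZ789.ogg", "fr", "gouter", accent="Paris"),
    Recording("fr-manger-QRS456.ogg", "fr", "manger"),
]
print(by_accent(recs, "fr", "gouter"))
# {'Québec': ['fr-gouter-ABC123.ogg'], 'Paris': ['fr-gouter-XYZ789.ogg']}
```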
Another possible feature would be some sort of verification, since someone might vandalize by cursing or similar (on Forvo this is done by voting).
On 03/13/2013 08:16 AM, Nikola Smolenski wrote:
A very nice website that already does this is www.forvo.com, but they claim a BY-NC-SA licence. Still, the way it works could serve as inspiration.
Forvo looks very nice, and if they can do the job, I'm happy that we don't have to. We should try to collaborate with them.
However, when I try it, it says "Oops! The recorder is having a fix. Please, try again later. Thanks..", both yesterday and today.
On 03/14/2013 05:49 PM, Lars Aronsson wrote:
Forvo looks very nice, and if they can do the job, I'm happy that we don't have to. We should try to collaborate with them.
Unfortunately, their license (non-commercial) is not free as in freedom, and not acceptable for Wikimedia projects.
It's possible they might be able to change that license for particular recordings.
Matt Flaschen
On 03/13/2013 08:16 AM, Nikola Smolenski wrote:
A very nice website that already does this is www.forvo.com, but they claim a BY-NC-SA licence.
Ah, now I see this detail: Yes, the -NC- clause in their license makes them useless for us. That's a pity.
Having been through the great license shift in OpenStreetMap, I think we should use CC0 wherever we can. My suggestion remains that any tool should demand CC0, but of course that will be the tool developer's choice.
Dear Wikimedia ops team,
The most recent enwiki dump now seems to have finished _almost_ successfully, apart from the dumping of the database metadata tables such as the pages table and the various links tables, almost all of which have failed.
I wonder if there is any chance someone could give this a kick, and re-try the dumping of these tables to finish the dump?
Since this seems to have happened several times now, could it be worth considering automating re-trying in this sort of situation, to improve dump reliability for the future without needing manual intervention?
Thanks,
Neil
On Thu, 14-03-2013 at 23:24 +0000, Neil Harris wrote:
Dear Wikimedia ops team,
The most recent enwiki dump now seems to have finished _almost_ successfully, apart from the dumping of the database metadata tables such as the pages table and the various links tables, almost all of which have failed.
I wonder if there is any chance someone could give this a kick, and re-try the dumping of these tables to finish the dump?
It's rerunning the tables now.
Since this seems to have happened several times now, could it be worth considering automating re-trying in this sort of situation, to improve dump reliability for the future without needing manual intervention?
Because the reasons for failure are varied (ranging from hardware failure to dbs being unreachable to broken MediaWiki code being deployed), automating restarts isn't practical; typically someone needs to find out what the underlying cause is and take appropriate action.
Ariel
On 14/03/13 23:39, Ariel T. Glenn wrote:
It's rerunning the tables now.
Thank you!
Because the reasons for failure are varied (ranging from hardware failure to dbs being unreachable to broken MediaWiki code being deployed), automating restarts isn't practical; typically someone needs to find out what the underlying cause is and take appropriate action.
I should have thought of that; sorry for the overly simplistic suggestion.
Thanks again,
Neil
On 03/14/2013 07:24 PM, Neil Harris wrote:
Dear Wikimedia ops team,
Please do not hit "reply" when you want to start an unrelated thread.
Compose a new email to wikitech-l@lists.wikimedia.org . That way it won't be grouped with something unrelated.
Thanks,
Matt Flaschen
On 03/14/2013 07:08 PM, Lars Aronsson wrote:
Having been through the great license shift in OpenStreetMap, I think we should use CC0 wherever we can. My suggestion remains that any tool should demand CC0, but of course that will be the tool developer's choice.
For what it's worth, I disagree. I think by default tools should suggest attribution licenses (e.g. CC-BY). Copyleft (e.g. CC-BY-SA) makes sense for many works too, though it may not be necessary for one word.
Attribution is not too much to ask for. Most big works already have an attribution page somewhere.
Matt Flaschen