Hoi,
This sounds REALLY interesting... Given that they are forty-second snippets of sound, they may well be too short to be copyrightable. They are intended to let people hear what a person sounds like. If I am right on this, it may be that we can collect such snippets from everywhere.
Yes, it would be a clever idea to combine Commons and Wikidata in the way proposed. Thanks, Gerard
On 14 November 2013 14:04, Andrew Gray andrew.gray@dunelm.org.uk wrote:
A really elegant use-case for Wikidata here, thanks to Andy Mabbett and the BBC R&D department:
http://www.bbc.co.uk/rd/blog/2013/11/speakerthon-uploading-voice-samples-fro...
As a piece of research we're looking to investigate whether voice samples on Wikimedia/pedia could be used to generate a voice box "fingerprint" which could then be used to identify speakers across a large archive. Which would close the circle of archive audio to speaker recognition to Wikimedia voice fingerprint to Wikipedia, DBpedia or Wikidata identifier to Linked Open Data for speakers in an archive.
To do that we'd need longer (duration) and higher quality samples than suggested by the Voice Intro Project. So we're looking to upload 30-40 second voice samples losslessly encoded as FLAC. (...) the voice samples will be openly licenced so other researchers and cultural institutions will be able to use the same methods to annotate audio / video with identified speakers. And hopefully contribute to the project by uploading voice samples from their own archives. By releasing small nuggets of their archives they'd be both improving Wikipedia and putting just enough in place to make the further contextualisation of their (and other) archives possible.
--
- Andrew Gray andrew.gray@dunelm.org.uk
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l