Hoi,

This sounds REALLY interesting ... Given that these are forty-second snippets of sound, they may well be too short to be copyrightable. They are intended to let people hear what a person sounds like. If I am right on this, it may be that we can collect such snippets from everywhere.

Yes, combining Commons and Wikidata in the way proposed would be a clever idea.

Thanks,

Gerard

On 14 November 2013 14:04, Andrew Gray <andrew.gray@dunelm.org.uk> wrote:

A really elegant use-case for Wikidata here, thanks to Andy Mabbett
and the BBC R&D department:

http://www.bbc.co.uk/rd/blog/2013/11/speakerthon-uploading-voice-samples-from-the-radio-4-archive-to-wikipedia

----

As a piece of research we're looking to investigate whether voice
samples on Wikimedia/pedia could be used to generate a voice box
"fingerprint" which could then be used to identify speakers across a
large archive. Which would close the circle of archive audio to
speaker recognition to Wikimedia voice fingerprint to Wikipedia,
DBpedia or Wikidata identifier to Linked Open Data for speakers in an
archive.

To do that we'd need longer (duration) and higher quality samples than
suggested by the Voice Intro Project. So we're looking to upload 30-40
second voice samples losslessly encoded as FLAC. (...) the voice
samples will be openly licenced so other researchers and cultural
institutions will be able to use the same methods to annotate audio /
video with identified speakers. And hopefully contribute to the
project by uploading voice samples from their own archives. By
releasing small nuggets of their archives they'd be both improving
Wikipedia and putting just enough in place to make the further
contextualisation of their (and other) archives possible.

----

--
- Andrew Gray
andrew.gray@dunelm.org.uk

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l