[Wikidata-l] Embedding voice samples in Wikidata

14 Nov 2013


A really elegant use-case for Wikidata here, thanks to Andy Mabbett
and the BBC R&D department:
http://www.bbc.co.uk/rd/blog/2013/11/speakerthon-uploading-voice-samples-fro...
----
As a piece of research we're looking to investigate whether voice
samples on Wikimedia/pedia could be used to generate a voice box
"fingerprint" which could then be used to identify speakers across a
large archive. Which would close the circle of archive audio to
speaker recognition to Wikimedia voice fingerprint to Wikipedia,
DBpedia or Wikidata identifier to Linked Open Data for speakers in an
archive.
To do that we'd need longer (duration) and higher quality samples than
suggested by the Voice Intro Project. So we're looking to upload 30-40
second voice samples losslessly encoded as FLAC. (...) the voice
samples will be openly licenced so other researchers and cultural
institutions will be able to use the same methods to annotate audio /
video with identified speakers. And hopefully contribute to the
project by uploading voice samples from their own archives. By
releasing small nuggets of their archives they'd be both improving
Wikipedia and putting just enough in place to make the further
contextualisation of their (and other) archives possible.
----
-- 
- Andrew Gray
  andrew.gray@dunelm.org.uk
