A really elegant use-case for Wikidata here, thanks to Andy Mabbett and the BBC R&D department:
http://www.bbc.co.uk/rd/blog/2013/11/speakerthon-uploading-voice-samples-fro...
----
As a piece of research we're looking to investigate whether voice samples on Wikimedia/pedia could be used to generate a voice box "fingerprint" which could then be used to identify speakers across a large archive. Which would close the circle of archive audio to speaker recognition to Wikimedia voice fingerprint to Wikipedia, DBpedia or Wikidata identifier to Linked Open Data for speakers in an archive.
To do that we'd need longer (duration) and higher quality samples than suggested by the Voice Intro Project. So we're looking to upload 30-40 second voice samples losslessly encoded as FLAC. (...) the voice samples will be openly licenced so other researchers and cultural institutions will be able to use the same methods to annotate audio / video with identified speakers. And hopefully contribute to the project by uploading voice samples from their own archives. By releasing small nuggets of their archives they'd be both improving Wikipedia and putting just enough in place to make the further contextualisation of their (and other) archives possible.
----
Hoi,
This sounds REALLY interesting ... given that they are forty-second snippets of sound, they may well be too short to be copyrightable. They are intended to let people hear what a person sounds like. If I am right on this, it may be that we can collect such snippets from everywhere.
Yes, it would be a good clever idea to combine Commons and Wikidata as proposed in this way. Thanks, Gerard
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
On 14 November 2013 14:45, Gerard Meijssen gerard.meijssen@gmail.com wrote:
This sounds REALLY interesting ... given that they are forty-second snippets of sound, they may well be too short to be copyrightable. They are intended to let people hear what a person sounds like. If I am right on this, it may be that we can collect such snippets from everywhere.
They are copyrightable.
On 14 November 2013 13:04, Andrew Gray andrew.gray@dunelm.org.uk wrote:
A really elegant use-case for Wikidata here, thanks to Andy Mabbett and the BBC R&D department:
Thank you. I've already added links to several voice files, using P990 "Voice recording". Now you know why I proposed that ;-)
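For readers who want to see what such a link looks like in data, here is a minimal sketch in Python. It assumes the usual Wikidata entity JSON layout (claims keyed by property ID, each with a "mainsnak") and that P990 values are Commons file names stored as plain strings. The sample item, its file name, and the helper names voice_recordings and commons_file_url are hypothetical, made up for illustration; nothing here is taken from the BBC project itself.

```python
from urllib.parse import quote

VOICE_RECORDING = "P990"  # the "Voice recording" property mentioned above


def voice_recordings(entity):
    """Return the Commons file names found in an item's P990 statements."""
    names = []
    for statement in entity.get("claims", {}).get(VOICE_RECORDING, []):
        snak = statement.get("mainsnak", {})
        # "novalue" / "somevalue" snaks carry no datavalue, so skip them.
        if snak.get("snaktype") == "value":
            names.append(snak["datavalue"]["value"])
    return names


def commons_file_url(file_name):
    """Build the Commons description-page URL for a file name."""
    return "https://commons.wikimedia.org/wiki/File:" + quote(
        file_name.replace(" ", "_")
    )


# Hypothetical item, trimmed to the fields this sketch reads.
sample_item = {
    "id": "Q0",
    "claims": {
        "P990": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P990",
                    "datavalue": {
                        "value": "Example speaker voice.flac",
                        "type": "string",
                    },
                }
            },
            {"mainsnak": {"snaktype": "novalue", "property": "P990"}},
        ]
    },
}

for name in voice_recordings(sample_item):
    print(commons_file_url(name))
```

An item with no P990 statements simply yields an empty list, so the same helper can be run over arbitrary items without special-casing.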
Andy, Really brilliant idea. Do you know what template and lua modules you are going to use? Is it worth making a special template connect these and track their multilingual usage?
Maximilian Klein Wikipedian in Residence, OCLC +17074787023
From: wikidata-l-bounces@lists.wikimedia.org on behalf of Andy Mabbett andy@pigsonthewing.org.uk
Sent: Thursday, November 14, 2013 7:45 AM
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata-l] Embedding voice samples in Wikidata
On 14 November 2013 13:04, Andrew Gray andrew.gray@dunelm.org.uk wrote:
A really elegant use-case for Wikidata here, thanks to Andy Mabbett and the BBC R&D department:
Thank you. I've already added links to several voice files, using P990 "Voice recording". Now you know why I proposed that ;-)
-- Andy Mabbett @pigsonthewing http://pigsonthewing.org.uk
On 14 November 2013 17:49, Klein,Max kleinm@oclc.org wrote:
Really brilliant idea.
Thank you.
Do you know what template and lua modules you are going to use?
They're being added to articles using {{Listen}} - sometimes, as an embedded module of various infoboxes. No Lua involved.
Is it worth making a special template connect these and track their multilingual usage?
I'm not sure what you have in mind here.
This is certainly an interesting idea, but I'm not sure it has a place in either Wikipedia or Wikidata unless we're talking about the clips being notable quotes.
For Wikipedia, if it's just a voice sample - as opposed to a notable quote - the community is going to view it as cruft and remove it from articles, as the majority of users will find a contextless sound clip to be of little encyclopedic value.
For Wikidata, why would we link to an audio sample if it's of no value to sister projects and no different from other voice samples (except for the license)?
I like the idea, don't get me wrong. I just think that the broader community is not going to see the utility in the samples.
Sven
Hoi, Wikidata is patient. It can hold a lot of data. We hold more references to external sources than we objectively need to.
We can include these sources and they are intended as snippets of text that identify what people sound like. That is enough in my opinion. It does provide something extra. We also include what people look like. Same principle. Thanks, GerardM
On 15 November 2013 07:54, Sven Manguard svenmanguard@gmail.com wrote:
This is certainly an interesting idea, but I'm not sure it has a place in either Wikipedia or Wikidata unless we're talking about the clips being notable quotes.
For Wikipedia, if it's just a voice sample - as opposed to a notable quote - the community is going to view it as cruft and remove it from articles, as the majority of users will find a contextless sound clip to be of little encyclopedic value.
For Wikidata, why would we link to an audio sample if it's of no value to sister projects and no different from other voice samples (except for the license)?
I like the idea, don't get me wrong. I just think that the broader community is not going to see the utility in the samples.
I think that audio clips - as supplementary material - do have definite value; undoubtedly they're of less value than a photograph, but they're probably more useful than a signature, which seems to be fairly well accepted (on enwiki at least). Beats me as to why...
Audio clips of major quotes (or whole speeches, etc) are definitely of more value than more mundane ones, in the way that a picture of historic significance is better than a conventional portrait, but I wouldn't agree that they're automatically contextless just because you don't already know what they're saying. Of the three samples given there, we have:
* Mary Robinson talking about her upbringing
* Mark Carney discussing economic policy
* Justin Welby on ethics & banking
The general approach of the BBC material makes it likely that most of the clips will be people discussing themselves, their work, or their field of expertise, all of which seem contextually appropriate.
Thirdly, whether Wikipedia wants it or not this is definitely useful and appropriate material for Commons, and if Commons has a distinctive class of items attached to subjects then it seems reasonable to note that on Wikidata. Again, signatures are a good example - https://www.wikidata.org/wiki/Property:P109 - but there's also things like https://www.wikidata.org/wiki/Property:P94 (coat of arms image)
The fact that we've got external reusers doing something cool (matching Wikidata entities by voice recognition!) is the icing on the cake ;-)
There seems to have been a misunderstanding on my part, for which I apologize. When I read this the first time I thought that you were stitching together audio clips specifically for voice identification. Audio clips for voice identification, at least in my experience, tend to just be a collection of syllables, as that is basically what is needed to do voice identification. If you are talking about substantive quotes, which your samples seem to indicate you are, then what I was worried about and what you intend to do are very different things.
I will retract the concerns that I laid out in the previous email, as they appear to be unfounded.
Apologies again, Sven
No worries :-)
Have a listen to a couple of the examples - I think they complement the article surprisingly well.
https://en.wikipedia.org/wiki/Mary_Robinson
(strangely enough, I remember driving to work listening to the same program!)
A.
On 11/14/2013 08:04 AM, Andrew Gray wrote:
A really elegant use-case for Wikidata here, thanks to Andy Mabbett and the BBC R&D department:
http://www.bbc.co.uk/rd/blog/2013/11/speakerthon-uploading-voice-samples-fro...
This is quite interesting. It's great to see the BBC participating; it ties together Wikidata and the BBC's collection very nicely.
I agree that even if Wikipedia doesn't want longer clips for these intros, it's still appropriate for Commons and Wikidata.
Matt Flaschen