Very interesting, thanks for posting! The TED dataset is also quite interesting for Wikidata, because we are missing the generic concepts behind many Wikipedia articles. Most people complain that Wikipedia tends to dive into in-depth information without giving adequate coverage in an overview article. Many overview articles have grown beyond normal viewing capacity on a mobile phone and probably should be split into 2nd and 3rd tier wikipages giving explanations about branches of the subject. To see what I mean, try viewing the English Wikipedia article for "Insurance" on your phone.
The TED talks touch on many such missing subject items and it would be nice to crowdsource the creation of them. Your project could possibly be a way to direct contributors to quick explanations and/or uses of such concepts. The fact that many TED talks are transcribed into so many different languages means we may be able to harness these translations for use in Wikidata labels. At least that is what I hope. Without labels, nothing is findable on Wikidata, and that is why we are still so slow at interlinking linkable items.
If your initiative takes off, it may be interesting to apply it to our own set of film media on Commons, but very little of that has been linked to Wikidata yet.
On Sun, Apr 24, 2016 at 1:15 PM, Raphaël Troncy raphael.troncy@eurecom.fr wrote:
Good news blog post:
https://blog.wikimedia.org/2016/04/22/ted-wikimedia-collaboration/
Great news! I didn't know either that Wikidata has unique identifiers for so many TED talks.
FYI, 18 months ago my group worked on a prototype we called HyperTED. You can read about it at http://linkedup-project.eu/2014/10/14/vici-shortlist-hyperted/. There is also a presentation at http://www.slideshare.net/JosLuisRedondoGarca/hyperted-40494120. And you can play directly with the HyperTED prototype at http://linkedtv.eurecom.fr/HyperTED/
In a nutshell, we used the TED talk metadata (subtitles divided into paragraphs) to split TED talks into chapters. We annotated the chapters automatically using named entity recognition and disambiguation tools and topic detection algorithms. Entities are disambiguated to DBpedia (but these could also be Wikidata entities). Finally, we developed an algorithm that detects hot spots in TED talks (read the scientific paper at http://www.eurecom.fr/~troncy/Publications/Redondo_Troncy-iswc14.pdf). Ultimately, as soon as you watch a chapter of a TED talk, we recommend chapters of other TED talks that may be related (because of common entities and topics). Instead of being a traditional recommender system that suggests other whole TED talks, we perform recommendation at the fragment level.
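To make the fragment-level idea concrete, here is a minimal sketch of recommending chapters by shared annotations. It is not the HyperTED algorithm: plain Jaccard overlap stands in for whatever scoring the prototype really uses, and the `Chapter` class, the `recommend` function, the talk identifiers and the entity labels are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chapter:
    talk_id: str            # hypothetical talk identifier
    index: int              # position of the chapter within its talk
    annotations: frozenset  # entity and topic labels attached to the chapter


def jaccard(a, b):
    """Overlap of two annotation sets: 0.0 if disjoint, 1.0 if identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(current, candidates, top_n=3):
    """Rank chapters of *other* talks by annotation overlap with `current`."""
    scored = [
        (jaccard(current.annotations, c.annotations), c)
        for c in candidates
        if c.talk_id != current.talk_id  # skip chapters of the same talk
    ]
    # Drop chapters that share nothing with the one being watched.
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_n]]


# Illustrative data only (assumed labels, not taken from the real dataset).
chapters = [
    Chapter("talk-a", 0, frozenset({"dbpedia:Insurance", "dbpedia:Risk"})),
    Chapter("talk-b", 2, frozenset({"dbpedia:Risk", "dbpedia:Finance"})),
    Chapter("talk-c", 1, frozenset({"dbpedia:Ocean"})),
    Chapter("talk-a", 1, frozenset({"dbpedia:Risk"})),
]
result = recommend(chapters[0], chapters[1:])
```

With this toy data, only the chapter from "talk-b" is suggested: the "talk-c" chapter shares no annotation, and the second "talk-a" chapter belongs to the talk already being watched.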
We are eager to receive any feedback. Please be gentle with the demo; we are aware of some bugs and limitations. Best regards.
Raphaël
--
Raphaël Troncy
EURECOM, Campus SophiaTech
Data Science Department
450 route des Chappes, 06410 Biot, France.
e-mail: raphael.troncy@eurecom.fr & raphael.troncy@gmail.com
Tel: +33 (0)4 - 9300 8242
Fax: +33 (0)4 - 9000 8200
Web: http://www.eurecom.fr/~troncy/