Hey folks :)
Today Wikidata is turning two. It amazes me what we've achieved in just two years. We've built an incredible project that has set out to change the world. Thank you to everyone who has been a part of this so far. We've put together some notes and opinions. And there are presents as well! Check them out and leave your birthday wishes: https://www.wikidata.org/wiki/Wikidata:Second_Birthday
Cheers Lydia
Folks,
as you know, many Googlers are huge fans of Wikipedia. So here’s a little gift for Wikidata’s second birthday.
Some of my smart colleagues at Google ran a few heuristics and algorithms to discover Wikipedia articles in different languages that cover the same topic but are missing language links between them. The results contain more than 35,000 missing links that these algorithms rate with high confidence. Based on our evaluation, we estimate a precision of 92% or higher, i.e. we expect fewer than 8% of the links to be wrong. The dataset covers 60 Wikipedia language editions.
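To put the precision estimate into absolute numbers: combining the two figures above gives only a ballpark, since both are bounds ("more than 35,000" links, "92+%" precision), but it shows the rough scale of errors a reviewer might expect, with N the number of links and p the precision:

```latex
N_{\text{wrong}} \approx N \cdot (1 - p) = 35\,000 \times (1 - 0.92) = 35\,000 \times 0.08 = 2\,800
```

So on the order of a couple of thousand candidate pairs may be wrong, which is why a human check before merging is worthwhile.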
Here are the missing links, available for download from the WMF labs servers:
https://tools.wmflabs.org/yichengtry/merge_candidate.20141028.csv
The data is published under CC-0.
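Since the verification tool works one language pair at a time, a first step with the downloaded file could be to count how many candidates fall into each pair. The sketch below is only illustrative: the column layout (lang_a, title_a, lang_b, title_b, no header row) is a hypothetical assumption, not the documented format of merge_candidate.20141028.csv, so check the real file and adjust the unpacking before relying on it.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_language_pair(stream: TextIO) -> Counter:
    """Count candidate links per unordered language pair, e.g. ('de', 'en').

    HYPOTHETICAL layout: each row is assumed to be
    lang_a, title_a, lang_b, title_b (no header row).
    """
    counts: Counter = Counter()
    for row in csv.reader(stream):
        if len(row) < 4:
            continue  # skip empty or malformed lines
        lang_a, _title_a, lang_b, _title_b = row[:4]
        # Sort the two codes so (en, de) and (de, en) count as the same pair
        pair = tuple(sorted((lang_a.strip(), lang_b.strip())))
        counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Placeholder rows for illustration only, not taken from the real dataset
    sample = io.StringIO("en,Title_A,de,Titel_A\nde,Titel_B,en,Title_B\nfr,Titre_C,it,Titolo_C\n")
    for pair, n in count_by_language_pair(sample).most_common():
        print(pair, n)
```

With the real file you would open it with `open(path, newline="", encoding="utf-8")` and pass the handle in; if the file turns out to have a header row, skip it first with `next(reader)` or switch to `csv.DictReader`.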
What can you do with the data? Since it is CC-0, you can do anything you want, obviously, but here are a few suggestions:
There’s a small tool on WMF labs that you can use to verify the links (it displays the articles side by side from a language pair you select, and then you can confirm or contradict the merge):
https://tools.wmflabs.org/yichengtry
The tool does not make the change on Wikidata itself, though (we thought that would be too invasive). Instead, the results of the human evaluation are saved on WMF labs. If you wish, you are welcome to extend the tool so that it uploads the changes directly to Wikidata, or to upload the results once the data has been verified.
Also, Magnus Manske is already busy uploading the data to the Wikidata game, so very soon you will also be able to play the merge game on this data directly. He is also creating the missing items on Wikidata. Thanks, Magnus, for a very pleasant collaboration!
I want to give a shout-out to my colleagues at Google who created the dataset, Jiang Bian and Si Li, and to Yicheng Huang, the intern who developed the tool on labs.
I hope that this small data release can help a little with further improving the quality of Wikidata and Wikipedia! Thank you all, you are awesome!
Cheers, Denny
On Wed Oct 29 2014 at 10:52:05 AM Lydia Pintscher <lydia.pintscher@wikimedia.de> wrote:
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. (Society for the Promotion of Free Knowledge)
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg (local court) under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
_______________________________________________ Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
On Wed Oct 29 2014 at 10:56:42 Denny Vrandečić vrandecic@google.com wrote:
There’s a small tool on WMF labs that you can use to verify the links (it displays the articles side by side from a language pair you select, and then you can confirm or contradict the merge):
This is really fun, and so useful too. Thank you so much, Denny, Jiang Bian, Si Li, and Yicheng Huang – "Denny and the Googlers" is a new band name if ever there was one.
I can connect all of them by bot but I'm not sure it should be done automatically.
Happy birthday Wikidata :)
On 10/29/14, James Forrester jdforrester@gmail.com wrote:
This is really fun, and so useful too. Thank you so much, Denny, Jiang Bian, Si Li, and Yicheng Huang – "Denny and the Googlers" is a new band name if ever there was one.
Funnier to do manually ;) Anyway, it's a very nice tool! Thanks to everyone who developed it.
*Stryn*
2014-10-29 21:37 GMT+02:00 Amir Ladsgroup ladsgroup@gmail.com:
I can connect all of them by bot but I'm not sure it should be done automatically.
Happy birthday Wikidata :)
-- Amir
This is great, thanks!
A.
On Wed, Oct 29, 2014 at 11:10 AM, James Forrester jdforrester@gmail.com wrote:
This is really fun, and so useful too. Thank you so much, Denny, Jiang Bian, Si Li, and Yicheng Huang – "Denny and the Googlers" is a new band name if ever there was one.
Hey,
Does this mean we can also shoot a TODO list in the direction of Google? :)
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
Sure, you can keep all your todos with Google ;)
https://www.gmail.com/mail/help/tasks/
Cheers, Denny
On Wed Oct 29 2014 at 2:58:03 PM Jeroen De Dauw jeroendedauw@gmail.com wrote:
Hey,
Does this mean we can also shoot a TODO list in the direction of Google? :)
Cheers
Hi Denny,
great tool! I couldn't find the source code, though. Can you point me to the repository it's hosted at?
Regards, Adrian
On Wed, Oct 29, 2014 at 6:56 PM, Denny Vrandečić vrandecic@google.com wrote:
There’s a small tool on WMF labs that you can use to verify the links (it displays the articles side by side from a language pair you select, and then you can confirm or contradict the merge):
https://tools.wmflabs.org/yichengtry
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
On Wed, 29 Oct 2014 18:51:24 +0100, Lydia Pintscher lydia.pintscher@wikimedia.de wrote:
Hey folks :)
Today Wikidata is turning two. It amazes me what we've achieved in just 2 years. We've built an incredible project that is set out to change the world. Thank you everyone who has been a part of this so far. We've put together some notes and opinions. And there are presents as well! Check them out and leave your birthday wishes: https://www.wikidata.org/wiki/Wikidata:Second_Birthday
Cheers Lydia
Happy birthday! I wrote a little note about it on my blog: https://editandowikimedia.wordpress.com/2014/10/29/wikidata-2-anos/
Cheers!
-- Allan J. Aguilar ralgis@vmail.me - ralgis@freenode - allan@jabber.ccc.de PGP: B387 F3B1 0F2C F46B 36AD FAFF 7BC3 594D F7C0 E1A3 OTR: E95CB6E6 22751983 CA8F3F67 3DFACBFF 0FA3A1BC userralgis@Twitter - User:Ralgis@Wikimedia https://editandowikimedia.wordpress.com https://libredebian.wordpress.com https://revistasifra.wordpress.com/