ZOMFG, the tool that Denny introduced yesterday as a birthday gift is
unbelievably useful and fun.
Here are a few thoughts I had about it:
I went over all the pages for the Hebrew-English pair. There were only 36,
and that is suspiciously low. Were all the articles in these languages
tested by this tool or only a subset?
Even though almost all of the tool's suggestions were correct, it would be
problematic to fix these automatically. There were several types of article
pairs:
* Unrelated because one of the suggested pages was a disambiguation page
and the other was not. Sometimes there was a link to the correct related
page from the disambig page. If anybody makes a new version, this certainly
should be corrected.
* Related, but with explicit interlanguage links in the articles' source
code. This required old-style interwiki conflict resolution. There was a
surprisingly high number of these. I managed to resolve all the conflicts
manually, but it did take a few minutes for each case. Examples from
en.wikipedia: [[Bombe]], [[Bomba (cryptography)]], [[Diary of a Wimpy
* Related, with a Wikidata item for each page, but without conflicts, so
easily mergeable. This can be done by a bot once it is identified for sure.
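
For anybody who wants to build a new version of the tool, here is a rough
Python sketch of how a suggested pair could be sorted into the three types
above. It is only an illustration: the Page fields and the classify()
function are names I made up, a real tool would have to fetch this
information from the wikis and from Wikidata, and it assumes that a human
has already confirmed that the two pages are about the same topic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PairKind(Enum):
    DISAMBIG_MISMATCH = "unrelated: disambiguation page vs. regular article"
    INTERWIKI_CONFLICT = "related, but needs manual interwiki conflict resolution"
    MERGEABLE = "related, no conflicts, items can be merged"


@dataclass
class Page:
    # All fields are hypothetical; a real tool would look them up.
    title: str
    is_disambiguation: bool
    has_explicit_langlinks: bool  # old-style interlanguage links in the source
    wikidata_item: Optional[str]  # item ID, or None when the page has no item


def classify(a: Page, b: Page) -> Optional[PairKind]:
    """Sort a suggested pair into one of the three types, or None if it fits none."""
    # Type 1: exactly one side is a disambiguation page.
    if a.is_disambiguation != b.is_disambiguation:
        return PairKind.DISAMBIG_MISMATCH
    # Type 2: explicit links in the wikitext must be cleaned up by hand first.
    if a.has_explicit_langlinks or b.has_explicit_langlinks:
        return PairKind.INTERWIKI_CONFLICT
    # Type 3: each page has its own item and nothing blocks a merge.
    if a.wikidata_item and b.wikidata_item and a.wikidata_item != b.wikidata_item:
        return PairKind.MERGEABLE
    return None
```

Only pairs that come out as MERGEABLE would be candidates for a bot; the
other two types still need a human.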
Adding links to a page without any language links shows a box to write a
language and a target title, and that's it. Adding a link to a new language
to a page which already has some interlanguage links opens the whole item
page in Wikidata (a whole other website!) and requires scrolling, editing
the links, and in many cases - merging the items manually. The result is
actually the same, so it would be very nice if the second case wouldn't be
That's it for now - I hope somebody finds it useful :)
I finished with Hebrew, and I'm going on to Russian, which has over a
thousand article pairs. IT'S INSANELY FUN.
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
“We're living in pieces,
I want to live in peace.” – T. Moore
2014-10-29 19:56 GMT+02:00 Denny Vrandečić <vrandecic(a)google.com>:
as you know, many Googlers are huge fans of Wikipedia. So here’s a little
gift for Wikidata’s second birthday.
Some of my smart colleagues at Google have run a few heuristics and
algorithms in order to discover Wikipedia articles in different languages
about the same topic which are missing language links between the articles.
The results contain more than 35,000 missing links with a high confidence
according to these algorithms. We estimate a precision of about 92+% (i.e.
we assume that less than 8% of those are wrong, based on our evaluation).
The dataset covers 60 Wikipedia language editions.
Here are the missing links, available for download from the WMF labs
The data is published under CC-0.
What can you do with the data? Since it is CC-0, you can do anything you
want, obviously, but here are a few suggestions:
There’s a small tool on WMF labs that you can use to verify the links (it
displays the articles side by side from a language pair you select, and
then you can confirm or contradict the merge):
The tool does not do the change in Wikidata itself, though (we thought it
would be too invasive if we did that). Instead, the results of the human
evaluation are saved on WMF labs. You are welcome to take the tool and
extend it with the possibility to upload the change directly on Wikidata,
if you so wish, or, once the data is verified, to upload the results.
Also, Magnus Manske is already busy uploading the data to the Wikidata
game, so you can very soon also play the merge game on the data directly.
He is also creating the missing items on Wikidata. Thanks Magnus for a very
I want to call out to my colleagues at Google who created the dataset -
Jiang Bian and Si Li - and to Yicheng Huang, the intern who developed the
tool on labs.
I hope that this small data release can help a little with further
improving the quality of Wikidata and Wikipedia! Thank you all, you are
On Wed Oct 29 2014 at 10:52:05 AM Lydia Pintscher <
Hey folks :)
Today Wikidata is turning two. It amazes me what we've achieved in
just 2 years. We've built an incredible project that is set out to
change the world. Thank you everyone who has been a part of this so
We've put together some notes and opinions. And there are presents as
well! Check them out and leave your birthday wishes:
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under the number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.
Wikidata-l mailing list