What you are encountering here is a major bottleneck and time sink for any data import
into Wikidata. Correctly matching lists of concepts (names of people, places, buildings,
whatever) from external datasets with the right Wikidata items always takes me
hours and hours of work.
To solve it, we need a working and user-friendly reconciliation tool that is
integrated into a common data management platform (e.g. OpenRefine; it would also be
fantastic to have it for Google Spreadsheets).
Magnus has developed a basic API for it
<https://tools.wmflabs.org/wikidata-reconcile/>, but a working and user-friendly
interface in one of those tools mentioned above is the missing link.
I want to emphasize again that there is a bounty (money!) to be earned
<https://www.bountysource.com/issues/985941-implement-wikidata-reconciliation-was-freebase>
for those who develop this for OpenRefine.
I have outlined the task in Phabricator too.
<https://phabricator.wikimedia.org/T146740>
Just putting this out here to give it attention again. It is such an important missing
link in the workflow of anyone who wants to import data into Wikidata.
I’m so desperate for it that I’m considering collecting funding and then hiring an external
developer to build it, but of course it would be best if it were developed and
maintained from within our community ;-)
Greetings, Sandra
On 13 Oct 2016, at 11:16, Markus Bärlocher
<markus.baerlocher(a)lau-net.de> wrote:
Hi Tom,
This is a lighthouse case for my Google Sheets
add-on
Great tool - thanks!
And more great tools included there :-)
just add new terms to the "Terms"
column, everything else fills automagically.
I checked the first results by hand:
30% of the WP articles found are specifically helpful
70% of the URLs lead to content that does not match the term
My idea:
Maybe a "reliability index" could help?
(1. manually approved match between Term and WP article)
2. Term and Lemma identical
3. Term and section title identical
4. all words in Term found in Lemma
5. all words in Term found in section title
6. Term found as string in article text
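The ranking above could be sketched roughly as follows. This is only a minimal illustration in Python, not part of the add-on: the function name reliability_level, its parameters, and the idea of passing in the lemma, section titles and article text as plain strings are all assumptions made for the example. It returns the number of the first criterion that matches, so a lower number means a more reliable match.

```python
import re


def _words(text):
    """Return the set of lowercased word tokens in a string."""
    return set(re.findall(r"\w+", text.lower()))


def reliability_level(term, lemma, section_titles=(), article_text="",
                      approved=False):
    """Hypothetical scoring helper: return 1 (best) to 6, or None.

    The levels follow the numbered criteria in the list above.
    `approved` stands for a match that a person has checked by hand.
    """
    if approved:
        return 1  # 1. manually approved

    t = term.strip().lower()
    term_words = _words(term)
    sections = [s.strip().lower() for s in section_titles]

    if t == lemma.strip().lower():
        return 2  # 2. Term and Lemma identical
    if t in sections:
        return 3  # 3. Term and a section title identical
    if term_words and term_words <= _words(lemma):
        return 4  # 4. all words in Term found in Lemma
    if term_words and any(term_words <= _words(s) for s in section_titles):
        return 5  # 5. all words in Term found in a section title
    if t and t in article_text.lower():
        return 6  # 6. Term found as a string in the article text
    return None  # no criterion matched
```

Comparison here is case-insensitive and ignores punctuation when splitting into words; whether that is the right choice for real terms is an open question.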
But I have no idea how to do this myself:
https://github.com/tomayac/wikipedia-tools-for-google-spreadsheets/issues/11
Best regards,
Markus
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata