Hi Antonin,

Mix'n'match is designed to work with almost any dataset, so it matches on the one thing all datasets have in common: names.

There are mechanisms to match on other properties, but writing an interface for public consumption for this would be a task that could easily keep an entire team of programmers busy :-)

If you can give me the whole list to download, I will see what I can do in terms of auxiliary data matching. Maybe a combination of that, manual matches (or at least confirmations on name matches), and the OpenRefine approach will give us maximum coverage.

It appears Kunstenpunt has no Wikidata property yet. Maybe Romaine could start setting one up? That would help in terms of synchronisation, I believe.

Cheers,
Magnus



On Thu, Jan 26, 2017 at 4:44 PM Antonin Delpeuch (lists) <lists@antonin.delpeuch.eu> wrote:
Hi Magnus,

Mix'n'match looks great and I do have a few questions about it. I'd like
to use it to import a dataset, which looks like this (these are the 100
first lines):
http://pintoch.ulminfo.fr/34f8c4cf8a/aligned_institutions.txt

I see how to import it in Mix'n'match, but given all the columns I have
in this dataset, I think that it is a bit sad to resort to matching on
the name only.

Do you see any way to do some fuzzy-matching on, say, the URLs provided
in the dataset against the "official website" property? I think that it
would be possible with the (proposed) Wikidata interface for OpenRefine
(if I understand the UI correctly).
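
A minimal sketch of the URL matching asked about above, assuming the dataset URLs and the Wikidata "official website" (P856) values have already been exported as plain (identifier, URL) pairs. The function names and the normalisation rules (drop the scheme, a leading "www." and a trailing slash) are illustrative assumptions, not something Mix'n'match or OpenRefine provides:

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparable key: lowercase host without 'www.', path without trailing slash."""
    url = url.strip()
    if "://" not in url:
        # Assumption: bare values such as "www.example.org" are web addresses.
        url = "http://" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def match_by_url(rows, wikidata_sites):
    """Map each dataset id to the Wikidata items whose official website has the same key.

    rows: iterable of (dataset_id, url) from the dataset
    wikidata_sites: iterable of (qid, url) taken from P856 statements
    """
    index = {}
    for qid, url in wikidata_sites:
        index.setdefault(normalize_url(url), []).append(qid)
    return {dataset_id: index.get(normalize_url(url), []) for dataset_id, url in rows}
```

A dataset row with more than one candidate item, or none, would still need manual review; this only narrows the list.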

In this context, I think it might even be possible to confirm matches
automatically (when the matches are excellent on multiple columns). As
the dataset is rather large (400,000 lines) I would not really want to
validate them one after the other with the web interface. So I would
need a sort of batch edit. How would you do that?
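
One possible rule for the automatic confirmation described above, as a sketch only: accept a candidate when enough columns agree and none disagree. The column names, the `auto_confirm` helper and the threshold of two agreeing columns are hypothetical choices for illustration:

```python
def auto_confirm(comparisons, min_agreeing=2):
    """Decide whether a candidate match can be confirmed without manual review.

    comparisons: dict mapping a column name to True (values agree),
    False (values conflict) or None (nothing to compare on one side).
    """
    agreeing = sum(1 for result in comparisons.values() if result is True)
    conflicting = any(result is False for result in comparisons.values())
    return agreeing >= min_agreeing and not conflicting


# Example with made-up column names: name and website agree, country is unknown.
candidate = {"name": True, "website": True, "country": None}
confirmed = auto_confirm(candidate)
```

Candidates that fail the rule would stay in the queue for manual confirmation rather than being rejected.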

Finally, once matches are found, it would be great if statements
corresponding to the various columns could be created in the items (if
these statements don't already exist). With the appropriate reference to
the dataset, ideally.

I realise this is a lot to ask - maybe I should just write a bot.

Alina, sorry to hijack your thread. I hope my questions were general
enough to be interesting for other readers.

Cheers,
Antonin


On 26/01/2017 16:01, Magnus Manske wrote:
> If you want to match your list to Wikidata, to find which entries
> already exist, have you considered Mix'n'match?
> https://tools.wmflabs.org/mix-n-match/
>
> You can upload your names and identifiers at
> https://tools.wmflabs.org/mix-n-match/import.php
>
> There are several mechanisms in place to help with the matching. Please
> contact me if you need help!
>
> On Thu, Jan 26, 2017 at 3:58 PM Magnus Manske
> <magnusmanske@googlemail.com> wrote:
>
>     Alina, I just found your bug report, which you filed under the wrong
>     issue tracker. The git repo (source code, issue tracker etc.) is here:
>     https://bitbucket.org/magnusmanske/reconcile
>
>     The report says it "keeps hanging", which is so vague that it's
>     impossible to debug, especially since the example linked on
>     https://tools.wmflabs.org/wikidata-reconcile/
>     works perfectly fine for me.
>
>     Does it not work at all for you? Does it work for a time, but then
>     stops? Does it "break" reproducibly on specific queries, or at
>     random? Maybe it breaks for specific "types" only? At what rate are
>     you hitting the tool? Do you have an example query, preferably one
>     that breaks?
>
>     Please note that this is not an "official" WMF service, only parts
>     of the API are implemented, and there are currently other technical
>     limitations on it.
>
>     Cheers,
>     Magnus
>
>     On Thu, Jan 26, 2017 at 3:35 PM Antonin Delpeuch (lists)
>     <lists@antonin.delpeuch.eu> wrote:
>
>         Hi,
>
>         I'm also very interested in this. How did you configure your
>         OpenRefine to use Wikidata? (Even if it does not currently work,
>         I am interested in the setup.)
>
>         There is currently an open issue (with a nice bounty) to improve the
>         integration of Wikidata in OpenRefine:
>         https://github.com/OpenRefine/OpenRefine/issues/805
>
>         Best regards,
>         Antonin
>
>         On 26/01/2017 12:22, Alina Saenko wrote:
>         > Hello everyone,
>         >
>         > I have a question for people who are using the Wikidata
>         > reconciliation service:
>         > https://tools.wmflabs.org/wikidata-reconcile/ It was working
>         > perfectly in my OpenRefine in November 2016, but since December
>         > it has stopped working. I have already contacted Magnus Manske,
>         > but he hasn’t responded yet. Does anyone else experience problems
>         > with the service and know how to fix it?
>         >
>         > I’m using this service to link big lists of Belgian artists
>         > (37,000) and performance art organisations (1,000) to Wikidata
>         > as preparation for uploading contextual data about these persons
>         > and organisations to Wikidata. This data will come from the
>         > Kunstenpunt database (http://data.kunsten.be/people). Wikimedia
>         > user Romaine (https://meta.wikimedia.org/wiki/User:Romaine) is
>         > helping us with this project.
>         >
>         > Best regards,
>         > Alina
>         >
>         >
>         > --
>         > In the office Mon, Tue, Wed, Thu
>         >
>         > PACKED vzw - Expertisecentrum Digitaal Erfgoed
>         > Rue Delaunoystraat 58 bus 23
>         > B-1080 Brussel
>         > Belgium
>         >
>         > e: alina@packed.be
>         > t: +32 (0)2 217 14 05
>         > w: www.packed.be
>         >
>         >
>         >
>         > _______________________________________________
>         > Wikidata mailing list
>         > Wikidata@lists.wikimedia.org
>         > https://lists.wikimedia.org/mailman/listinfo/wikidata
>         >
>
>

