Hi Magnus,
The idea was not to search for each label/synonym separately, but to concatenate everything into one large search string and let the fulltext search do the magic.
E.g., for STW descriptor “CGE model”, search for “CGE model, CGE-Modell, ORANI model, MONASH model, Dynamic CGE model, Computable general equilibrium model, CGE analysis, Applied general equilibrium model”
When the fulltext search tries to match every word in the string, as it does in Fuseki, it may return long result lists. However, if these can be sorted by a score value, they can be limited to the 10 (or however many) best-matching results.
A corresponding example query, which works on a GND endpoint, is here: http://zbw.eu/beta/sparql-lab/?endpoint=http://zbw.eu/beta/sparql/gnd/query&... I’m pretty sure that would also work on our currently unavailable internal WD endpoint on Fuseki. Unfortunately, MWAPI fulltext search seems to work differently.
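To make the concatenation idea concrete, here is a rough Python sketch. It assumes a Fuseki endpoint with the Jena text index configured on rdfs:label; the function names and the choice of property are illustrative only, and escaping of Lucene query syntax is not handled.

```python
def build_search_string(labels):
    """Join all labels and synonyms into one comma-separated search string."""
    cleaned = [label.strip() for label in labels if label.strip()]
    return ", ".join(cleaned)


def build_query(search_string, limit=10):
    """Build a score-ranked fulltext query (assumes Jena text on rdfs:label)."""
    # Escape backslashes and double quotes for the SPARQL string literal.
    literal = search_string.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX text: <http://jena.apache.org/text#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?item ?score WHERE {\n"
        f'  (?item ?score) text:query (rdfs:label "{literal}") .\n'
        "}\n"
        "ORDER BY DESC(?score)\n"
        f"LIMIT {int(limit)}\n"
    )


labels = [
    "CGE model", "CGE-Modell", "ORANI model", "MONASH model",
    "Dynamic CGE model", "Computable general equilibrium model",
    "CGE analysis", "Applied general equilibrium model",
]
query = build_query(build_search_string(labels))
```

The ORDER BY and LIMIT clauses are what keep the long result list manageable.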
Another pattern, which I have applied in a query that looks up person names and their name variants from GND and then searches the above-mentioned custom WD instance, is here: https://github.com/zbw/sparql-queries/blob/master/wikidata/search_person_by_....
For, e.g., “John H. Dunning” (http://d-nb.info/gnd/119094665) all name variants are bound in a fulltext search expression, and a sum of scores is computed to rank the total result (http://zbw.eu/beta/sparql-lab/result?resultRef=https://api.github.com/repos/...).
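The score-summing step can be sketched outside SPARQL as follows. This is only an illustration of the ranking logic, not the linked query itself; the function name, the item IDs, and the scores are made up.

```python
from collections import defaultdict


def rank_by_summed_score(hits_per_variant, limit=10):
    """Sum the fulltext scores each item received across all name variants.

    hits_per_variant maps a name variant to a list of (item, score) pairs.
    Returns the top `limit` (item, total_score) pairs, best first.
    """
    totals = defaultdict(float)
    for hits in hits_per_variant.values():
        for item, score in hits:
            totals[item] += score
    # Highest total first; ties are broken by item ID for a stable order.
    ranked = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:limit]


# Hypothetical hits for two name variants of one person.
example_hits = {
    "John H. Dunning": [("Q-A", 2.0), ("Q-B", 1.0)],
    "Dunning, John H.": [("Q-A", 1.5)],
}
ranking = rank_by_summed_score(example_hits)
```

An item matched by several variants accumulates a higher total and so rises in the ranking.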
I have experimented a bit, but neither of these patterns seems to work with the current MWAPI implementation. Since my understanding here is very limited, and the implementation is at an early stage, I am cc'ing Stas, who can perhaps contribute ideas.
Cheers, Joachim
From: Wikidata [mailto:wikidata-bounces@lists.wikimedia.org] On behalf of Magnus Manske
Sent: Wednesday, 14 June 2017 09:33
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata] Multilingual and synonym support for M'n'm / was: Mix'n'Match with existing (indirect) mappings
On Tue, Jun 13, 2017 at 6:25 PM Neubert, Joachim <J.Neubert@zbw.eu> wrote:
Hi Magnus, Osma,
I suppose the scenario Osma pointed out is quite common for knowledge organization systems and in particular thesauri: Matching could take advantage of multilingual labels and also of synonyms, which are defined in the KOS.
For populating the STW Thesaurus for Economics ID (P3911), my preliminary plan was to use all multilingual labels and synonyms as the search string against a custom WD endpoint (Fuseki, with full text search support), and to include in the ranked SPARQL search results a column with a valid insert statement that can be copied and pasted into QuickStatements2.
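As a rough illustration of the insert-statement column, here is a Python sketch. It assumes the tab-separated QuickStatements command format with string values in double quotes; the function name and the item and descriptor IDs are placeholders, not real mappings.

```python
import re


def quickstatements_line(qid, stw_id, prop="P3911"):
    """Format one tab-separated QuickStatements command for an external ID."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    # String values are wrapped in double quotes in this format.
    return f'{qid}\t{prop}\t"{stw_id}"'


# Placeholder values only; not an actual STW mapping.
line = quickstatements_line("Q123", "12345-6")
```

In the SPARQL results, the same string could be assembled with CONCAT and shown as an extra column.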
Since Stas just announced an extension for WDQS with fulltext search (if I haven’t misunderstood his mail of 2017-06-12), it is perhaps now possible to do this kind of matching in WDQS.
It would be great if such an extended matching could be integrated into M’n’m.
To clarify, Mix'n'match already searches language-neutral, e.g. for automatch.
Storing multiple labels per entry in the Mix'n'match database, and then checking all-against-all, would require some large-scale rewiring.