Hi Magnus,
the idea was not to search for all labels/synonyms separately, but to concatenate
everything in one large search string, and let the fulltext search do the magic.
E.g., for STW descriptor “CGE model”, search for “CGE model, CGE-Modell, ORANI model,
MONASH model, Dynamic CGE model, Computable general equilibrium model, CGE analysis,
Applied general equilibrium model”
When, as in Fuseki, the fulltext search tries to match every word in the string, it may
return long lists of results. However, when these can be sorted by a score value, they can
be limited to the 10 (or however many) best-matching results.
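As a minimal sketch of this concatenate-and-limit idea (not the actual query): the function names and the scored hits below are made up for illustration, and the scores merely stand in for whatever a fulltext index would return.

```python
def build_search_string(labels):
    """Join all labels and synonyms into one comma-separated search string."""
    seen = []
    for label in labels:
        label = label.strip()
        # Skip empty entries and exact duplicates, keep the original order.
        if label and label not in seen:
            seen.append(label)
    return ", ".join(seen)


def top_matches(scored_hits, limit=10):
    """Sort (item, score) pairs by descending score and keep the best `limit`."""
    return sorted(scored_hits, key=lambda hit: hit[1], reverse=True)[:limit]


# A few of the labels of the STW descriptor "CGE model" quoted above.
labels = ["CGE model", "CGE-Modell", "ORANI model", "MONASH model"]
search_string = build_search_string(labels)

# Hypothetical hits (item, score); real scores would come from the index.
hits = [("item-a", 1.2), ("item-b", 7.5), ("item-c", 3.1)]
best = top_matches(hits, limit=2)
```

Whether the search engine treats the commas as separators or just as noise depends on its query syntax, so that part is an assumption.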
A corresponding example query, which works on a GND endpoint, is here:
http://zbw.eu/beta/sparql-lab/?endpoint=http://zbw.eu/beta/sparql/gnd/query…
I’m pretty sure that would work as well on our internal WD endpoint on Fuseki, which is
currently unavailable. Unfortunately, MWAPI fulltext search seems to work differently.
Another pattern, which I have applied in a query that looks up person names and their
name variants from GND and then searches the above-mentioned custom WD instance, is
here:
https://github.com/zbw/sparql-queries/blob/master/wikidata/search_person_by….
For, e.g., “John H. Dunning” (http://d-nb.info/gnd/119094665), all name variants are bound
in a fulltext search expression, and a sum of scores is computed to rank the total result
(http://zbw.eu/beta/sparql-lab/result?resultRef=https://api.github.com/repos…).
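The sum-of-scores ranking can be sketched roughly as follows. This is only an illustration in Python, not the SPARQL query linked above; the name variants, item identifiers and scores are invented, and the function name is hypothetical.

```python
from collections import defaultdict


def rank_by_summed_score(hits_per_variant, limit=10):
    """Sum the scores each candidate item received across all name variants
    and return the candidates ordered by that total, best first."""
    totals = defaultdict(float)
    for hits in hits_per_variant.values():
        for item, score in hits:
            totals[item] += score
    # Sort by descending total; ties are broken by item identifier.
    ranked = sorted(totals.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:limit]


# Invented example data: one list of (item, score) hits per name variant.
hits_per_variant = {
    "John H. Dunning": [("item-x", 4.0), ("item-y", 1.5)],
    "Dunning, John H.": [("item-x", 3.0)],
    "John Dunning": [("item-y", 2.0), ("item-z", 0.5)],
}
ranking = rank_by_summed_score(hits_per_variant)
```

An item that matches several variants accumulates a higher total than one that matches a single variant strongly, which is the point of summing.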
I have experimented a bit, but neither of these patterns seems to work with the current
MWAPI implementation. Since my understanding here is very limited and the implementation is
at an early stage, I am cc'ing Stas, who can perhaps contribute ideas.
Cheers, Joachim
From: Wikidata [mailto:wikidata-bounces@lists.wikimedia.org] On Behalf Of Magnus Manske
Sent: Wednesday, 14 June 2017 09:33
To: Discussion list for the Wikidata project.
Subject: Re: [Wikidata] Multilingual and synonym support for M'n'm / was:
Mix'n'Match with existing (indirect) mappings
On Tue, Jun 13, 2017 at 6:25 PM Neubert, Joachim
<J.Neubert@zbw.eu> wrote:
Hi Magnus, Osma,
I suppose the scenario Osma pointed out is quite common for knowledge organization systems
and in particular thesauri: Matching could take advantage of multilingual labels and also
of synonyms, which are defined in the KOS.
For populating STW Thesaurus for Economics ID (P3911), my preliminary plan was to
match with all multilingual labels and synonyms as the search string in a custom WD endpoint
(Fuseki, with full text search support), and to display, in the ranked SPARQL results of the
search, a column with a valid insert statement that can be copied and pasted into
QuickStatements2.
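To illustrate what such an insert-statement column might contain, here is a small sketch. It assumes the tab-separated QuickStatements command format (item, property, string value in double quotes); the item and STW identifiers are placeholders, not real matches, and the helper name is hypothetical.

```python
def quickstatements_line(item_id, property_id, external_id):
    """Build one tab-separated statement: item, property, quoted string value."""
    return f'{item_id}\t{property_id}\t"{external_id}"'


# Placeholder candidates (Wikidata item, STW ID); not real matches.
candidates = [("Q123456789", "12345-6"), ("Q987654321", "65432-1")]

# P3911 is the STW Thesaurus for Economics ID property mentioned above.
lines = [quickstatements_line(item, "P3911", stw_id) for item, stw_id in candidates]
batch = "\n".join(lines)
```

In the plan described above, the same string would be produced inside the SPARQL query itself, so that each result row carries its own ready-to-paste statement.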
Since Stas just announced an extension for WDQS with fulltext search (if I haven’t
misunderstood his mail of 2017-06-12), it is perhaps now possible to do this kind of
matching in WDQS.
It would be great if such an extended matching could be integrated into M’n’m.
To clarify, Mix'n'match already searches in a language-neutral way, e.g. for automatch.
Storing multiple labels per entry in the Mix'n'match database, and then checking
all-against-all, would require some large-scale rewiring.