Hello!
Context:
For the needs of inventaire.io, I'm working on a type-filtered
autocomplete, that is, a field with suggestions, but only suggestions
matching a given claim: typically, an "author" input where I would
like to suggest only entities matching the claim P31:Q5 (instance
of -> human).
The dream would be to have a "filter" option in the wbsearchentities
module, to be able to do things like
https://www.wikidata.org/w/api.php?action=wbsearchentities&limit=10&format=json&search=victor&filter=P31:Q5
As far as I know, this isn't possible yet. One could search without
a filter, then fetch the matched entities with their claims data, and
filter on those claims, but this is rather slow for an autocomplete
feature that needs to be snappy. So the alternative approach I have
been working on is to get a subset of a Wikidata dump and put it in
an ElasticSearch instance.
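
To give an idea, the slow approach would look roughly like this
minimal Python sketch (the module names and parameters are the real
API ones; the claim check assumes an item-valued property, which P31
is):

    import requests

    API = 'https://www.wikidata.org/w/api.php'

    def has_claim(entity, prop, target):
        # true if the entity has a claim prop -> target (an item id)
        for statement in entity.get('claims', {}).get(prop, []):
            value = statement['mainsnak'].get('datavalue', {}).get('value', {})
            if value.get('id') == target:
                return True
        return False

    def search_filtered(term, prop='P31', target='Q5', limit=10):
        # 1st round-trip: plain, unfiltered search
        hits = requests.get(API, params={
            'action': 'wbsearchentities', 'search': term,
            'language': 'en', 'limit': limit, 'format': 'json',
        }).json()['search']
        ids = [hit['id'] for hit in hits]
        if not ids:
            return []
        # 2nd round-trip: fetch the matched entities' claims
        entities = requests.get(API, params={
            'action': 'wbgetentities', 'ids': '|'.join(ids),
            'props': 'claims', 'format': 'json',
        }).json()['entities']
        # filter client-side on the claim
        return [i for i in ids if has_claim(entities[i], prop, target)]

The two sequential round-trips are what kills the latency for an
autocomplete.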
Question:
What is the best way to get all the entities matching a given
claim?
My answer so far has been to download a dump, then filter the
entities by claim (a minimal sketch of that step below), but are
there better/less resource-intensive ways?
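
The filtering step itself is simple enough; here is a Python sketch,
assuming the official JSON dump layout (one entity per line inside a
big array, each line ending with a comma; older dumps expose the
target item only as a numeric-id, hence the double check):

    import json, sys

    # reads a dump on stdin, writes matching entities as newline-delimited JSON
    for line in sys.stdin:
        line = line.strip().rstrip(',')
        if line in ('[', ']', ''):
            continue  # skip the array brackets of the official dump
        entity = json.loads(line)
        for statement in entity.get('claims', {}).get('P31', []):
            value = statement['mainsnak'].get('datavalue', {}).get('value', {})
            if value.get('id') == 'Q5' or value.get('numeric-id') == 5:
                print(json.dumps(entity))
                break

The same loop works unchanged on a plain newline-delimited JSON file,
where the bracket/comma handling is simply a no-op.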
The only other alternative I see would be a SPARQL query without a
LIMIT (which in the case of P31:Q5 probably means millions of
results(?)) to get all the desired ids, then using wbgetentities to
fetch the data 50 at a time to stay within the API limits, but those
limits are there for a reason, right?
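
In code, that plan would look something like this sketch (assuming
the public query.wikidata.org endpoint; I would expect the no-LIMIT
query to be painful, or to simply time out, for a class as big as
humans):

    import requests

    query = 'SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }'  # no LIMIT
    r = requests.get('https://query.wikidata.org/sparql',
                     params={'query': query, 'format': 'json'})
    ids = [row['item']['value'].rsplit('/', 1)[-1]  # entity URI -> Q-id
           for row in r.json()['results']['bindings']]

    # then 50 ids per wbgetentities call, the documented maximum
    for i in range(0, len(ids), 50):
        r = requests.get('https://www.wikidata.org/w/api.php', params={
            'action': 'wbgetentities', 'ids': '|'.join(ids[i:i + 50]),
            'format': 'json'})
        entities = r.json()['entities']
        # ... index entities into the search engine here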
As the people who manage the servers that would be stressed by one
approach or the other, which one seems the less painful to
recommend? ^^
Thanks in advance for any clue!
New tools:
- To make a filtered dump, I wrote a small command-line tool:
wikidata-filter (usage example after this list). It can filter a
dump, but also any set of Wikidata entities in a newline-delimited
JSON file; I hope it can be helpful to other people!
- The whole search engine setup can be found here: wikidata-subset-search-engine
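
Typical usage of wikidata-filter, to keep only humans (see the README
for the full set of options):

    cat wikidata-dump.json | wikidata-filter --claim P31:Q5 > humans.ndjson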
Clues and comments welcome!
Greetings,
Maxime