[+Ruben Verborgh]
Salut Maxime,
I wonder if this is something Ruben's Linked Data Fragments
(http://linkeddatafragments.org/) could solve in a fast enough
manner?! I'll let Ruben chime in (if he wants).
Cheers,
Tom
On Thu, Apr 28, 2016 at 12:26 PM, Maxime Lathuilière <groups(a)maxlath.eu> wrote:
Hello!
Context:
For the needs of inventaire.io, I'm working on a type-filtered autocomplete:
a field whose suggestions are restricted to entities matching a given claim.
Typically, an "author" input should suggest only entities matching the claim
P31:Q5 (instance of -> human).
The dream would be to have a "filter" option in the wbsearchentities module,
to be able to do things like
https://www.wikidata.org/w/api.php?action=wbsearchentities&limit=10&…
As far as I know, this isn't possible yet. One could search without a filter,
then fetch the matched entities with their claims data, then filter on those
claims, but this is rather slow for an autocomplete feature that needs to be
snappy. So the alternative approach I have been working on is to get a subset
of a Wikidata dump and put it in an ElasticSearch instance.
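To make the slowness concrete, the "search, then fetch claims, then filter" approach could be sketched roughly as below. The entity JSON shape matches what wbgetentities returns; `has_claim` and `search_filtered` are hypothetical helper names of mine, not existing API calls:

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"

def has_claim(entity, prop, target):
    """True if one of the entity's statements for `prop` points to `target`."""
    for statement in entity.get("claims", {}).get(prop, []):
        value = statement.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and value.get("id") == target:
            return True
    return False

def search_filtered(term, prop="P31", target="Q5", limit=10):
    """Search entities by label, then keep only those matching prop:target.
    This needs two round trips per keystroke, hence too slow for autocomplete."""
    search = json.load(urlopen(API + "?" + urlencode({
        "action": "wbsearchentities", "search": term, "language": "en",
        "limit": limit, "format": "json"})))
    ids = [hit["id"] for hit in search.get("search", [])]
    if not ids:
        return []
    entities = json.load(urlopen(API + "?" + urlencode({
        "action": "wbgetentities", "ids": "|".join(ids),
        "props": "claims", "format": "json"})))
    return [qid for qid, entity in entities["entities"].items()
            if has_claim(entity, prop, target)]
```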
Question:
What is the best way to get all the entities matching a given claim?
My answer so far was downloading a dump, then filtering the entities by
claim, but are there better/less resource-intensive ways?
The only other alternative I see would be a SPARQL query without a LIMIT
(which, in the case of P31:Q5, probably returns millions of results(?)) to get
all the desired ids, then using wbgetentities to fetch the data 50 ids at a
time to work around the API limitations, but those limitations are there for
a reason, right?
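That alternative could look something like the sketch below, assuming the public query service at query.wikidata.org; the function names are illustrative, and a real run of this query would hammer both services, which is exactly the concern:

```python
import json
from urllib.request import urlopen
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"
# No LIMIT: for P31:Q5 this would return millions of rows
QUERY = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }"

def in_batches(ids, size=50):
    """Yield successive slices of at most `size` ids (the API limit
    for anonymous wbgetentities requests)."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def fetch_all(ids):
    """Fetch entity data for arbitrarily many ids, 50 per request."""
    for batch in in_batches(ids):
        url = "https://www.wikidata.org/w/api.php?" + urlencode({
            "action": "wbgetentities", "ids": "|".join(batch),
            "format": "json"})
        yield json.load(urlopen(url))["entities"]
```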
To those who manage the servers that would be stressed by one approach or the
other: which one seems the less painful to recommend? ^^
Thanks in advance for any clue!
New tools:
- To make a filtered dump, I wrote a small command-line tool:
wikidata-filter
It can filter a dump, but also any set of Wikidata entities in a
newline-delimited JSON file. I hope it can be helpful to other people!
- The whole search engine setup can be found here:
wikidata-subset-search-engine
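For readers curious what filtering a dump by claim boils down to, here is a minimal sketch of the idea: stream a newline-delimited JSON file and keep entities matching P31:Q5. This illustrates the principle only, not wikidata-filter's actual interface:

```python
import json

def filter_dump(lines, prop="P31", target="Q5"):
    """Yield entity dicts (one JSON document per line) matching prop:target."""
    for line in lines:
        line = line.strip().rstrip(",")  # full dump lines end with a comma
        if not line or line in ("[", "]"):  # skip the array brackets
            continue
        entity = json.loads(line)
        for statement in entity.get("claims", {}).get(prop, []):
            value = statement.get("mainsnak", {}).get("datavalue", {}).get("value", {})
            if isinstance(value, dict) and value.get("id") == target:
                yield entity
                break
```

Streaming line by line keeps memory flat, which matters since a full dump does not fit in RAM.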
Clues and comments welcome!
Greetings,
Maxime
--
Maxime Lathuilière
maxlath.eu - twitter
inventaire.io - roadmap - code - twitter - facebook
wiki(pedia|data): Zorglub27
for personal emails use max(a)maxlath.eu instead
--
Dr. Thomas Steiner, Employee (
http://blog.tomayac.com,
https://twitter.com/tomayac)
Google Germany GmbH, ABC-Str. 19, 20354 Hamburg, Germany
Managing Directors: Matthew Scott Sucherman, Paul Terence Manicle
Registration office and registration number: Hamburg, HRB 86891
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.29 (GNU/Linux)
iFy0uwAntT0bE3xtRa5AfeCheCkthAtTh3reSabiGbl0ck0fjumBl3DCharaCTersAttH3b0ttom
hTtPs://xKcd.cOm/1181/
-----END PGP SIGNATURE-----