[+Ruben Verborgh]
Hi Maxime,
I wonder if this is something Ruben's Linked Data Fragments (http://linkeddatafragments.org/) could solve quickly enough. I'll let Ruben chime in (if he wants).
Cheers, Tom
On Thu, Apr 28, 2016 at 12:26 PM, Maxime Lathuilière groups@maxlath.eu wrote:
Hello!
Context: For the needs of inventaire.io, I'm working on a type-filtered autocomplete, that is, a suggestions field where the suggestions must match a given claim: typically an "author" input where I would like to suggest only entities matching the claim P31:Q5 (instance of -> human).
The dream would be to have a "filter" option in the wbsearchentities module, to be able to do things like https://www.wikidata.org/w/api.php?action=wbsearchentities&limit=10&...
As far as I know, this isn't possible yet. One could search without a filter, then fetch the related entities with their claims data, then filter on those claims, but this is rather slow for an autocomplete feature that needs to be snappy. So the alternative approach I have been working on is to get a subset of a Wikidata dump and put it in an ElasticSearch instance.
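For reference, here is roughly what that slower fallback looks like, as a minimal Python sketch using the standard wbsearchentities and wbgetentities modules (the helper name and the P31:Q5 check are just for illustration):

    # Rough sketch of the slow fallback described above: search first, then
    # fetch claims for the hits and filter them client-side.
    import requests

    API = "https://www.wikidata.org/w/api.php"

    def search_humans(term, limit=10):
        # 1. Plain prefix search, no claim filter available here
        r = requests.get(API, params={
            "action": "wbsearchentities", "search": term, "language": "en",
            "type": "item", "limit": limit, "format": "json",
        })
        ids = [hit["id"] for hit in r.json().get("search", [])]
        if not ids:
            return []
        # 2. Fetch the claims of those entities (up to 50 ids per request)
        r = requests.get(API, params={
            "action": "wbgetentities", "ids": "|".join(ids),
            "props": "claims", "format": "json",
        })
        entities = r.json().get("entities", {})
        # 3. Keep only the entities that have a P31 claim pointing to Q5
        def is_human(entity):
            for claim in entity.get("claims", {}).get("P31", []):
                value = claim["mainsnak"].get("datavalue", {}).get("value", {})
                if value.get("id") == "Q5" or value.get("numeric-id") == 5:
                    return True
            return False
        return [qid for qid in ids if qid in entities and is_human(entities[qid])]

    print(search_humans("victor hugo"))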
Question: What is the best way to get all the entities matching a given claim? My answer so far was downloading a dump, then filtering the entities by claim, but are there better/less resource-intensive ways? The only other alternative I see would be a SPARQL query without specifying a LIMIT (which in the case of P31:Q5 probably returns millions of results(?)) to get all the desired ids, then using wbgetentities to fetch the data 50 by 50 to work around the API limitations, but those limitations are there for a reason, right? As the people who manage the servers that would be stressed one way or the other, which option seems the less painful to recommend? ^^
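To make that second option concrete, here is a sketch of the SPARQL + wbgetentities route (the query is the obvious one for P31:Q5; with millions of results this is shown for the idea only, not as something to run as-is):

    # Get every ?item with P31:Q5 from the query service, then pull entity
    # data in batches of 50 (the wbgetentities limit for anonymous requests).
    # A query this broad would most likely time out if run for real.
    import requests

    SPARQL = "https://query.wikidata.org/sparql"
    API = "https://www.wikidata.org/w/api.php"

    query = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }"
    r = requests.get(SPARQL, params={"query": query, "format": "json"})
    ids = [b["item"]["value"].rsplit("/", 1)[-1]
           for b in r.json()["results"]["bindings"]]

    entities = {}
    for i in range(0, len(ids), 50):
        batch = ids[i:i + 50]
        r = requests.get(API, params={
            "action": "wbgetentities", "ids": "|".join(batch),
            "props": "labels|claims", "format": "json",
        })
        entities.update(r.json().get("entities", {}))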
Thanks in advance for any clue!
New tools:
- To make a filtered dump, I wrote a small command-line tool:
wikidata-filter. It can filter a dump, but also any set of Wikidata entities in a newline-delimited JSON file; I hope it can be helpful to other people! (A minimal sketch of the filtering idea follows below this list.)
- The whole search engine setup can be found here:
wikidata-subset-search-engine
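The core of the filtering is nothing fancy; here is a minimal sketch of the idea over a newline-delimited JSON dump (an illustration only, not the tool's actual code or command-line interface):

    # Read entities from stdin, write the ones with P31 -> Q5 to stdout.
    import json
    import sys

    def has_claim(entity, prop, target):
        for claim in entity.get("claims", {}).get(prop, []):
            value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
            if value.get("id") == target:
                return True
        return False

    for line in sys.stdin:
        line = line.strip().rstrip(",")  # full dump lines end with a comma
        if not line or line in ("[", "]"):
            continue
        entity = json.loads(line)
        if has_claim(entity, "P31", "Q5"):
            sys.stdout.write(json.dumps(entity, ensure_ascii=False) + "\n")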
Clues and comments welcome!
Greetings,
Maxime
--
Maxime Lathuilière
maxlath.eu - twitter
inventaire.io - roadmap - code - twitter - facebook
wiki(pedia|data): Zorglub27
for personal emails use max@maxlath.eu instead
Hi Maxime,
(@Tom, thanks for pinging me.)
We have created a self-describing interface for literal search, which can be used for autocompletion. More details here: http://ruben.verborgh.org/publications/vanherwegen_iswc_2015/
Let me know if we can help you!
Best,
Ruben
Hi!
feature that needs to be snappy. So the alternative approach I have been working on is to get a subset of a Wikidata dump and put it in an ElasticSearch instance.
The Linked Data Fragments implementation would probably be useful for that, and I think it would be a good idea to eventually get one for the Wikidata Query Service, but not yet. Also, we do have an ElasticSearch index for Wikidata (that's what drives search on the site), so it would be possible to integrate it with the Query Service too (there's some support for it in Blazegraph), but that's not done yet. So for now I don't think we have a ready-made solution. You could still try a prefix-search or regex-search on the query service, but depending on the query it may be too slow right now.
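For illustration, such a prefix-search attempt could look roughly like the following, combining the claim constraint with a label filter; for a class as large as Q5 this is very likely to be too slow or to time out, which is exactly the problem:

    import requests

    query = """
    SELECT ?item ?label WHERE {
      ?item wdt:P31 wd:Q5 ;
            rdfs:label ?label .
      FILTER(LANG(?label) = "en")
      FILTER(STRSTARTS(LCASE(?label), "victor hug"))
    }
    LIMIT 10
    """
    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": query, "format": "json"})
    for b in r.json()["results"]["bindings"]:
        print(b["item"]["value"], b["label"]["value"])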
*Question: *What is the best way to get all the entities matching a given claim? My answer so far was downloading a dump, then filtering the entities by claim, but are there better/less resource-intensive ways?
Probably not currently without some outside tools. When we get LDF support, then that may be the way :)
@tom thanks for the connection!
@ruben interesting! will read the full paper asap :)
@ruben @stas I'm not very familiar with Linked Data Fragments, so any additional links to get a better understanding of how this could help address this use case are welcome!
Maxime
@ruben @stas I'm not very familiar with Linked Data Fragments, so any additional links to get a better understanding of how this could help address this use case are welcome!
The best way to get started is to try it: http://client.linkeddatafragments.org/ (note that text filtering is not part of this demo)
This covers the most important topics: http://linkeddatafragments.org/in-depth/
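To give a feel for the raw interface behind those pages, here is a rough sketch of a single triple pattern request against one of the public demo datasets (the dataset URL and the parameter names below are assumptions; a proper client discovers them from the fragment's hypermedia controls rather than hardcoding them):

    import requests

    FRAGMENTS = "http://fragments.dbpedia.org/2015/en"  # assumed demo dataset URL

    r = requests.get(FRAGMENTS, params={
        "predicate": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
        "object": "http://dbpedia.org/ontology/Person",
    }, headers={"Accept": "text/turtle"})

    # The response is one page of matching triples plus hydra metadata
    # (an estimated total count and a link to the next page), which is what
    # lets a client iterate over "all entities matching a claim".
    print(r.text[:2000])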
Ruben