We currently rely on the Wikidata Query (WDQ) API to identify whether a set of claims exists for a given property. Some of our previous bot runs have created duplicates because recent additions hadn't yet made it into the WDQ API.
In our efforts to prevent the creation of duplicate entries, I am trying to better understand the WDQ API.
The WDQ API documentation states [1]: "Also, the data used here is from WikiData "dumps", so it can be a few hours old." However, when I check the database dumps, they are updated either weekly (JSON dumps) or daily (incremental XML dumps) [2].
Also, the WDQ API sometimes seems to behave instantly, in the sense that newly added claims are immediately available through it.
How often is the WDQ API really updated? Is it possible to query Wikidata live with WDQ, and if not, are there alternatives that would allow this?
Regards,
Andra Waagmeester
[1] https://wdq.wmflabs.org/api_documentation.html
[2] https://www.wikidata.org/wiki/Wikidata:Database_download
Hi there,
Are you aware of the &revision URL parameter? See the last paragraph of https://www.wikidata.org/wiki/Wikidata:Data_access#Linked_Data_interface. Hopefully this helps.
Cheers, Tom
Hi!
How often is the WDQ API really updated? Is it possible to query Wikidata live with WDQ, and if not, are there alternatives that would allow this?
We currently have a SPARQL query service in beta [1], which is updated continuously from Wikidata. Note that since it's in beta it is not yet stable, both operationally and data-model-wise, so please be aware of this; it also has timeout limits that for now won't allow you to run queries that are too complex. But if you want to check it out and see whether it fits your use case, you are most welcome.
[1] http://wdqs-beta.wmflabs.org/
Hi Stas,
I have seen the SPARQL query service, and it is indeed an interesting alternative. In terms of stability and update frequency, how different is the SPARQL query service from the Wikidata Query API?
Cheers, Andra
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hi!
I have seen the SPARQL query service, and it is indeed an interesting alternative. In terms of stability and update frequency, how different is the SPARQL query service from the Wikidata Query API?
In terms of stability: it's beta, so while we try to keep it up and running smoothly, it is not out of the question that it could be taken down at any moment, either because we found a bug or because we need to update something, and the data model can change too. We no longer expect substantial changes to the data model, and we try to keep the service up and running (it doesn't help that we are in the middle of a large Labs outage right now: https://wikitech.wikimedia.org/wiki/Incident_documentation/20150617-LabsNFSO... ) and synced continuously (i.e. no more than minutes behind Wikidata edits), but as long as it's beta we can give no guarantees. We're working hard to make it production-quality, but that will take a bit more time.
The difference between WDQ and WDQS/SPARQL is that SPARQL is a full-featured language for querying triple-based (RDF) data sets and allows very complex queries. It is also a standard in the linked-data world. You can use the translator (http://tools.wmflabs.org/wdq2sparql/w2s.php) - once the Labs outage ends, of course - to convert between WDQ syntax and SPARQL. Also check out the other links on the WDQS beta page for short intros on how things are done with SPARQL and examples of the queries you can run.
Hi Stas,
Thanks for the suggestion on the SPARQL endpoint. I have tested it a bit and I must say I am excited about its potential. I tend to run SPARQL queries directly from my desktop through an IDE (e.g. TextMate). However, I haven't managed to connect to WDQS this way; it seems that I need to push the execute button in a browser to get the actual results.
Is it possible to get the SPARQL results in an API manner?
Regards
Hi Andra,
On 14.07.2015 at 19:05, Andra Waagmeester wrote:
Is it possible to get the SPARQL results in an API manner?
Sure it is: the direct URL to the endpoint is http://wdqs-beta.wmflabs.org/bigdata/namespace/wdq/sparql, and you can submit your query via a GET request using the `query` parameter.
Best regards, Bene
Sorry, realizing only now that this is for the Query API, not the Linked Data interface. My bad, please ignore my previous reply.
On 18.06.2015 21:40, Thomas Steiner wrote:
Sorry, realizing only now that this is for the Query API, not the Linked Data interface. My bad, please ignore my previous reply.
Maybe it would still be a good idea for bots to check against the live JSON data right before the edit. Checking the live JSON is an additional step but should hardly slow down bots (which are throttled anyway).
Given the way updates work *in all systems* (polling small lists of recent changes at intervals and hoping that this leads to a complete change history), it seems quite possible that such systems will sometimes miss an update, at least in the long run and under varying conditions (high server load, network troubles, update script down for a while, whatever). Insufficient update frequency is maybe not the biggest problem here (it should be in the range of one to a few minutes for all of the services).
Regards,
Markus
Hi!
Given the way updates work *in all systems* (polling small lists of recent changes at intervals and hoping that this leads to a complete change history), it seems quite possible that such systems will sometimes miss an update, at least in the long run and under varying conditions (high server load, network troubles, update script down for a while, whatever). Insufficient update frequency is maybe not the biggest problem here (it should be in the range of one to a few minutes for all of the services).
A very important point, with which I agree - it is entirely possible that update polling misses an update; WDQS is no exception, and it usually does not treat this as a problem, since the next update can fill in what was missed. The ultimate source of truth, however, is the Wikidata site itself. Beware of caches, though - if you ask for the same data on the same URL twice, I think you can get the same result even if the underlying data changed in the meantime.
Indeed, the ultimate source of truth is the Wikidata site itself. However, I am not aware of a way to query the Wikidata site for a list of items fitting a certain condition (e.g. all Wikidata items containing a claim with the NCBI Entrez Gene (P351) property).
It is here that I need to rely on WDQ (and WDQS), and so risk missing existing items due to delays in how often WDQ (and WDQS) gets updated.
I would like to know whether I can rely on a given time frame - be it seconds, hours, days, or a week.
I currently assume a delay of a week, but I don't know how accurate this assumption is.
Regards,
Just ask WDQ for a list of item IDs, then pass them to the live API (wbgetentities: https://www.wikidata.org/w/api.php?action=help&modules=wbgetentities). You may miss some recently edited items, but at least you wouldn't base your edits on outdated revisions (and the "baserevid" argument of wbeditentity, https://www.wikidata.org/w/api.php?action=help&modules=wbeditentity, completely eliminates the risk).
Wikidata is a wiki like any other, so you can just click "What links here" on Property:P351 and pick up all items that way.