Hi all!
First of all, let me say that we all love the SPARQL endpoint, it's a great service and it has become essential to how we interact with Wikidata and run our bots. Great job by Stas and others!
I am also aware that it is still in beta mode. There is just one issue that keeps plaguing us. I filed a bug report about it in September 2015 (https://phabricator.wikimedia.org/T112397), and the issue was alleviated, but it turned out not to be fully resolved:
- Occasionally, data written to an item in Wikidata via the API does not make it into the triple store (the frequency of the issue is hard to determine).
- It is a crucial issue because it can lead to data inconsistency by creating duplicate items or incorrect properties/values on items.
- It seems to happen while the SPARQL endpoint is under high load (just my impression).
How data is affected:
- New data does not make it into the triple store.
- Updates to and merges of items do not make it into the triple store, so 'ghost items' are returned which have actually been merged, or queries show/miss results/items incorrectly because freshly added/deleted data has not been completely serialized.
Example: item https://www.wikidata.org/wiki/Q416356, a protein, recently had protein domains added via the 'has part' property. These did not show up in SPARQL queries, and a DESCRIBE query for that item confirmed that the triples were indeed missing. (The item has since been modified, so it is fine now.)
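For reference, this is the kind of check I mean: either a DESCRIBE to dump what the endpoint currently holds for the item, or an ASK against the expected statement ('has part' is P527):

    # Dump everything the endpoint currently has for the item:
    DESCRIBE <http://www.wikidata.org/entity/Q416356>

    # Or check directly whether any 'has part' (P527) triple is there:
    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    ASK { wd:Q416356 wdt:P527 ?part }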
A workaround seems to be to modify the item, as this appears to trigger re-serialization, but that is certainly not practical for larger imports. Furthermore, as long as such an item does not get modified, data could be missing from (or ghosting in) the triple store for weeks or even months. It also turns out to be quite difficult to determine how much of a given import effort actually made it into the triple store, short of iterating through all modified items and checking that everything is in the triple store, which would take a significant amount of time.
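Just to illustrate the kind of batch check I mean, here is a sketch of a query that reports which items from an import batch are still missing an expected statement (again using 'has part', P527, as the example; the VALUES list would have to be filled with the items you just edited):

    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    SELECT ?item WHERE {
      VALUES ?item { wd:Q416356 }                 # add the items from your import batch here
      FILTER NOT EXISTS { ?item wdt:P527 ?part }  # items with no 'has part' triple at all
    }

This only catches items where none of the new statements arrived; checking individual values would mean listing the expected values as well, which is exactly the kind of bookkeeping that makes verifying a large import so tedious.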
Could you maybe give us more info on the status of this issue and whether there is anything we could do to help alleviate it?
Thank you!
Sebastian (sebotic)