Hi all -
We've been using a locally installed Wikidata stand-alone query service (https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Standalone_service) for several months now. Recently the service went down for a significant amount of time, and when we ran runUpdate.sh -n wdq, instead of catching up to real time as it usually does, the update process lagged, failing even to keep parity with real time.
Example output from the log:
09:30:39.805 [main] INFO org.wikidata.query.rdf.tool.Update - Polled up to 2016-10-24T23:01:05Z at (0.0, 0.0, 0.0) updates per second and (271.8, 56.2, 18.8) milliseconds per second
This is normal when starting the update, of course, but the system never seems to find its feet, and continues to stumble and lag. Restarting both the Blazegraph process and the update process has no lasting effect.
From time to time, a message like this will appear:
INFO org.wikidata.query.rdf.tool.RdfRepository - HTTP request failed: org.apache.http.NoHttpResponseException: wikidata.cb.ntent.com:9999 failed to respond, retrying in 2175 ms.
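The odd-looking retry delay (2175 ms) is consistent with randomized exponential backoff. A minimal Python sketch of that generic pattern (an illustration only, not the updater's actual Java code; the base, cap, and jitter constants are assumptions):

```python
import random

def backoff_delay_ms(attempt: int, base_ms: int = 1000, cap_ms: int = 60000) -> int:
    """Exponential backoff with jitter: base * 2^attempt, capped,
    then perturbed by up to +/-25%. A generic pattern; the real
    updater's retry policy and constants are assumptions here."""
    raw = min(base_ms * (2 ** attempt), cap_ms)
    return int(raw * (1 + random.uniform(-0.25, 0.25)))
```

Jitter like this is why successive retry delays in the log look like arbitrary numbers rather than round ones.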
I have experienced this effect in the past, and had success replacing the old journal, which was the product of a long update process, with a new journal rebuilt from the latest dump. This time, that strategy did not work. I also rebuilt the tool from the latest git pull from origin and rebuilt the journal, again with no effect.
This problem started about 3 days ago, and we're now polling up to a point in time 18 hours earlier than real time.
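For anyone measuring the same symptom: the lag is just the gap between wall-clock time and the timestamp in the "Polled up to" log line. A quick Python sketch (the "now" value below is hypothetical, chosen to match the 18-hour figure):

```python
from datetime import datetime, timezone

def update_lag_hours(polled_up_to: str, now: datetime) -> float:
    """Hours the updater is behind real time, given the ISO timestamp
    from a 'Polled up to' log line."""
    polled = datetime.strptime(polled_up_to, "%Y-%m-%dT%H:%M:%SZ")
    polled = polled.replace(tzinfo=timezone.utc)
    return (now - polled).total_seconds() / 3600.0

# With the timestamp from the log above and a hypothetical current time:
now = datetime(2016, 10, 25, 17, 1, 5, tzinfo=timezone.utc)
print(update_lag_hours("2016-10-24T23:01:05Z", now))  # → 18.0
```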
I would appreciate any guidance.
Also: is this an appropriate list to write to with such problems? Are there more appropriate places?
Thanks,
Eric Scott
Stas: could you please have a look?
Cheers Lydia
On Oct 28, 2016 19:06, "Eric Scott" eric.d.scott@att.net wrote:
Wikidata-tech mailing list Wikidata-tech@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
Hi!
> We've been using a locally installed wikidata stand-alone service (https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual#Standalone...) for several months now. Recently the service went down for a significant amount of time, and when we ran runUpdate.sh -n wdq, instead of catching up to real time as it usually does, the update process lagged, failing even to keep parity with real time.
Hmm... This usually means that the Blazegraph install is underpowered and the queries for the update can't run in time. Try increasing the batch size, maybe, but usually that doesn't change much if the host is not performant enough to keep up with the data.
> INFO org.wikidata.query.rdf.tool.RdfRepository - HTTP request failed: org.apache.http.NoHttpResponseException: wikidata.cb.ntent.com:9999 failed to respond, retrying in 2175 ms.
Do you have any other exceptions surrounding it, or any accompanying exceptions on Blazegraph side?
> This problem started about 3 days ago, and we're now polling up to a point in time 18 hours earlier than real time.
It can also happen if the edit volume spikes, and then it should catch up when the spike passes. But if that's not the case, I'd try running Blazegraph on a stronger machine.
> Also: is this an appropriate list to write to with such problems? Are there more appropriate places?
The Blazegraph list could help too for BG-specific questions: Bigdata-developers@lists.sourceforge.net. It is a good platform to discuss performance/optimization questions.
Thanks for your response. Actually, the systems it's being run on are pretty well equipped, with multiple cores and plenty of memory.
I believe the problem arose from the fact that the rccontinue parameter was not being carried forward from previous calls to the Wikibase API. Refactoring the code to do so seems to have fixed the problem.
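For anyone who hits the same issue: the MediaWiki API returns a continuation block that has to be merged into the next request, including rccontinue. A Python sketch of the corrected polling loop (the function name and the fetch callable are illustrative, not the updater's actual Java code):

```python
def poll_recent_changes(fetch, start):
    """Page through list=recentchanges results, carrying the server's
    continuation values (including rccontinue) forward on every
    follow-up call. `fetch(params)` is any callable that performs the
    API request and returns the decoded JSON response."""
    params = {
        "action": "query", "list": "recentchanges",
        "rcstart": start, "rcdir": "newer",
        "rclimit": "500", "format": "json",
    }
    while True:
        resp = fetch(params)
        for change in resp["query"]["recentchanges"]:
            yield change
        cont = resp.get("continue")
        if cont is None:
            return
        # The fix: merge the server's continuation values into the
        # next request instead of restarting from rcstart each time.
        params.update(cont)
```

Dropping the continuation values makes every poll restart from rcstart, so under sustained edit volume the updater can keep re-reading the same head of the change stream and never advance.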
Cheers,
On 11/02/2016 11:57 AM, Stas Malyshev wrote: