On 2/21/23 4:05 PM, Guillaume Lederrey wrote:
> Hello all!
>
> TL;DR: We expect to successfully complete the recent data reload on
> Wikidata Query Service soon, but we've encountered multiple failures
> related to the size of the graph, and anticipate that this issue may
> worsen in the future. Although we succeeded this time, we cannot
> guarantee that future reload attempts will be successful given the
> current trend of the data reload process. Thank you for your
> understanding and patience.
>
> Longer version:
>
> WDQS is updated from a stream of recent changes on Wikidata, with a
> maximum delay of ~2 minutes. This process was improved as part of the
> WDQS Streaming Updater project to ensure data coherence[1]. However,
> the update process is still imperfect and can lead to data
> inconsistencies in some cases[2][3]. To address this, we reload the
> data from dumps a few times per year to reinitialize the system from a
> known good state.
>
> The recent reload of data from dumps started in mid-December and was
> initially met with some issues related to download and instabilities
> in Blazegraph, the database used by WDQS[4]. Loading the data into
> Blazegraph takes a couple of weeks due to the size of the graph, and
> we had multiple attempts where the reload failed after >90% of the
> data had been loaded. Our understanding of the issue is that a "race
> condition" in Blazegraph[5], where subtle timing changes lead to
> corruption of the journal in some rare cases, is to blame.[6]
>
> We want to reassure you that the last reload job was successful on one
> of our servers. The data still needs to be copied over to all of the
> WDQS servers, which will take a couple of weeks, but should not bring
> any additional issues. However, reloading the full data from dumps is
> becoming more complex as the data size grows, and we wanted to let you
> know why the process took longer than expected. We understand that
> data inconsistencies can be problematic, and we appreciate your
> patience and understanding while we work to ensure the quality and
> consistency of the data on WDQS.
>
> Thank you for your continued support and understanding!
>
>
> Guillaume
>
>
> [1] https://phabricator.wikimedia.org/T244590
> [2] https://phabricator.wikimedia.org/T323239
> [3] https://phabricator.wikimedia.org/T322869
> [4] https://phabricator.wikimedia.org/T323096
> [5] https://en.wikipedia.org/wiki/Race_condition#In_software
> [6] https://phabricator.wikimedia.org/T263110
>
Hi Guillaume,
Are there plans to decouple WDQS from the back-end database? Doing so
would provide a more resilient architecture for Wikidata as a whole, since
you would be able to swap and interchange SPARQL-compliant backends.
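
To illustrate the kind of decoupling meant here, below is a minimal client-side sketch in Python. It assumes every backend exposes the standard SPARQL 1.1 Protocol over HTTP. The `ENDPOINTS` registry, the `alternate` URL, the User-Agent string, and the function names are hypothetical and only there for illustration; this is not how WDQS is actually wired up.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical registry of interchangeable backends. The first URL is the
# public WDQS endpoint; the second is a placeholder for any other
# SPARQL-compliant store.
ENDPOINTS = {
    "wdqs": "https://query.wikidata.org/sparql",
    "alternate": "https://sparql.example.org/sparql",
}


def build_request(backend, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' request for a backend."""
    url = ENDPOINTS[backend] + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            # Ask for the standard JSON results format so callers do not
            # depend on any one store's default serialization.
            "Accept": "application/sparql-results+json",
            "User-Agent": "sparql-backend-sketch/0.1",  # assumed value
        },
    )


def run_query(backend, query):
    """Send the query and return the result bindings (does network I/O)."""
    request = build_request(backend, query)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]
```

With this shape, switching stores is a one-word change for the caller, e.g. `run_query("alternate", query)` instead of `run_query("wdqs", query)`, as long as the query sticks to standard SPARQL and avoids backend-specific extensions.
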
BTW -- we are going to make AWS- and even Azure-hosted instances (offered
on a pay-as-you-go basis) of our Virtuoso-hosted edition of Wikidata (which
we recently reloaded) available.
--
Regards,
Kingsley Idehen
Founder & CEO
OpenLink Software
Home Page: http://www.openlinksw.com
Community Support: https://community.openlinksw.com
Weblogs (Blogs):
Company Blog: https://medium.com/openlink-software-blog
Virtuoso Blog: https://medium.com/virtuoso-blog
Data Access Drivers Blog: https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers
Personal Weblogs (Blogs):
Medium Blog: https://medium.com/@kidehen
Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
http://kidehen.blogspot.com
Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen
Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
: http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
_______________________________________________
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/6ND7MOVXL3F73SR37MBWEIT5CCOK2EES/
To unsubscribe send an email to wikidata-leave@lists.wikimedia.org
Guillaume Lederrey (he/him), Engineering Manager, Wikimedia Foundation