Hello again.
Thank you, Dragan, for your notes. I thought I would pipe up with a little commentary about what we have been up to at Matrix with the enslaved.org project. It overlaps with some of the cases you have in your notes.
The enslaved.org project continues to move forward as we work to load larger datasets into our wikibase installation. Moving this from a testing exercise to a production system has required a small number of changes to the wikibase software and to our data loading process.
We realized early in the process that re-using the wikibase namespace as the enslaved namespace would present problems both semantically and technically. Semantically, while the URIs were now pointing at the local wikibase deployment, the system still advertised identifiers using the wikidata namespace prefixes. Confusion abounds when sample queries and lookups that work for wikidata.org just return nothing in the local environment. Discussion of the searches and relationships was also inherently muddled. Finally, we have a long-term goal to cross-link data in the enslaved.org project with data found on wikidata.org. Sharing a namespace and prefix set is not practical or useful in that case.
The problem isn't limited to wikibase, however. Not only does wikibase need to use different prefixes, so does all of the supporting tooling, including blazegraph, quickstatements, and the custom front end presentation software being developed for enslaved.org.
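To make the point concrete, here is a minimal Python sketch of what "configurable prefixes" could look like: the prefix set is derived from a base URL instead of being hard coded. This is not our actual patch set. The path layout mirrors the one used in wikidata.org's RDF export, and the assumption that a local wikibase serves its concept URIs directly under its host, the `name_prefix` parameter, and the example.org host are all hypothetical.

```python
from typing import Dict
from urllib.parse import urlsplit

# Path suffixes as used by wikidata.org's RDF export (wd:, wdt:, p:, ps:, pq:).
# Assumption: a local wikibase uses the same layout under its own host.
PREFIX_PATHS = {
    "wd": "/entity/",
    "wdt": "/prop/direct/",
    "p": "/prop/",
    "ps": "/prop/statement/",
    "pq": "/prop/qualifier/",
}


def build_prefixes(base_url: str, name_prefix: str = "") -> Dict[str, str]:
    """Derive a prefix map from a base URL instead of hard-coding wikidata.org.

    name_prefix is a hypothetical knob for giving the local instance its own
    prefix names (for example "e" turns "wd" into "ewd").
    """
    parts = urlsplit(base_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {base_url!r}")
    root = f"{parts.scheme}://{parts.netloc}"
    return {name_prefix + name: root + path for name, path in PREFIX_PATHS.items()}


def sparql_header(prefixes: Dict[str, str]) -> str:
    """Render the prefix map as PREFIX lines for the top of a SPARQL query."""
    return "\n".join(
        f"PREFIX {name}: <{uri}>" for name, uri in sorted(prefixes.items())
    )


if __name__ == "__main__":
    # Hypothetical local deployment; substitute the real base URL.
    print(sparql_header(build_prefixes("https://wikibase.example.org", "e")))
```

With something like this in every tool, a query written against the local prefixes cannot silently be read as a wikidata.org query, and both prefix sets can appear side by side once cross-linking starts.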
For enslaved.org, we have a series of patches that we apply locally to wikibase and to the wikidata-query-service, which we have found necessary to support our environment. These changes either replace a set of hard coded values in the wikibase software, or extend the software to support something beyond the default of wikidata.org. I consider these patches to be good guidance as to which elements need configuration options to work beyond the wikidata.org environment.
Additionally, we have been working on bulk data loading into wikibase, using quickstatements and multiple wikibase deployments as staging grounds. Because of the process by which we add data to the enslaved.org wikibase, open live editing of the system is not generally supported. Instead, we process moderate to large datasets from our partners, and work to make sure they will load successfully into our production environment in controlled batches. To facilitate this, Matrix regularly clones the entire wikibase deployment for enslaved.org to an alternate server and URL, and tests importing the datasets at this alternate location before executing the same operation on the production site. The cloning operation ensures that the identifiers in the system are the same in testing as in production.
The cloning process, however, again runs afoul of all of the previous problems with namespaces and URIs/URLs. Each time we clone the environment, updates to all of the supporting infrastructure must also be made. Alternate blazegraph deployments exist for this, but they must have their data set cleared and reloaded. The data set for blazegraph cannot use the production dumps, of course, because the production system URLs and staging system URLs are different. Thus, present work at Matrix is aimed at automating the startup of a new blazegraph instance: learning a base wikibase URL and, from that, loading the dumps and entering live update mode for that wikibase.
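As an illustration of the URL mismatch, here is a small Python sketch of a guard that could run before a dump is loaded into a blazegraph instance. It is not part of our tooling. It assumes an N-Triples or Turtle dump with full (unprefixed) entity URIs of the form `<base>/entity/Q123`, and the host names in the usage are made up.

```python
import re
from collections import Counter
from typing import Iterable

# Matches full entity URIs such as <https://host/entity/Q42> and captures
# the scheme and host. Assumption: entities live under "/entity/".
ENTITY_URI = re.compile(r"<(https?://[^/>\s]+)/entity/[QP]\d+>")


def entity_bases(lines: Iterable[str]) -> Counter:
    """Count how often each base URL appears in entity URIs of a dump."""
    counts: Counter = Counter()
    for line in lines:
        for base in ENTITY_URI.findall(line):
            counts[base] += 1
    return counts


def dump_matches(lines: Iterable[str], expected_base: str) -> bool:
    """True only if the dump has entity URIs and all of them use expected_base."""
    bases = entity_bases(lines)
    return bool(bases) and set(bases) == {expected_base.rstrip("/")}


if __name__ == "__main__":
    # Hypothetical one-line dump taken from a production host.
    sample = [
        "<https://prod.example.org/entity/Q1> "
        "<https://prod.example.org/prop/direct/P2> "
        "<https://prod.example.org/entity/Q3> .",
    ]
    print(dump_matches(sample, "https://prod.example.org"))
    print(dump_matches(sample, "https://staging.example.org"))
```

A check like this makes the failure explicit: a production dump pointed at a staging instance is rejected up front rather than loading triples that no staging query will ever match.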
The work being done at Matrix is very focused on the enslaved.org project, but it could be used either as guidance on what needs to be more flexible, or possibly moved to "configuration" to make the whole set of tools more federation-friendly.
________________________________________
From: Wikibaseug wikibaseug-bounces@lists.wikimedia.org on behalf of Dragan Espenschied dragan.espenschied@rhizome.org
Sent: Sunday, October 27, 2019 10:43 AM
To: wikibaseug@lists.wikimedia.org
Subject: [Wikibase] Wikibase new features
Dear Wikibase colleagues.
I have circulated these notes a little during WikidataCon already: https://etherpad.wikimedia.org/p/Wikibase-Extensions-Federation
It is a list of extensions or new features that Rhizome needs to go into full production. We're interested in helping produce these extensions and will have some funding available in 2020. Please have a look into the Etherpad and leave a note if these features would meet your needs as well, and if you have plans to implement something yourself.
My goal is to have extensions useful for most cases of running independent Wikibase instances in the new "ecosystem" approach WMDE has announced, and to discuss how they can be made a reality.
Yours,
Dragan

--
Dragan Espenschied
Preservation Director
Rhizome at the New Museum
http://rhizome.org