Hello again.
Thank you, Dragan, for your notes. I thought I would pipe up with a little commentary
about what we have been up to at Matrix with the enslaved.org project. It overlaps with
some of the cases in your notes.
The enslaved.org project continues to move forward as we work to load larger datasets
into our Wikibase installation. Moving this from a testing exercise to a production
system has required a small number of changes to the Wikibase software and to our data
loading process.
We realized early in the process that re-using the default Wikibase namespace as the
enslaved.org namespace would present problems both semantically and technically.
Semantically, while the URIs now pointed at the local Wikibase deployment, the system
still advertised identifiers using the Wikidata namespace prefixes. Confusion abounds when
sample queries and lookups that work for wikidata.org return nothing in the local
environment, and discussion of searches and relationships was inherently muddled.
Finally, we have a long-term goal to cross-link data in the enslaved.org project with
data found on wikidata.org. Sharing a namespace and prefix set is neither practical nor
useful in that case.
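To illustrate the kind of configuration this calls for, here is a minimal Python sketch that derives SPARQL prefix declarations from a single configurable base URL instead of hard-coding the wikidata.org namespaces. The path patterns (entity/, prop/direct/, and so on) follow Wikibase's default RDF mapping; the prefix names (lwd, lwdt, ...), the function names, and the example URL are hypothetical placeholders, not what enslaved.org actually uses.

```python
# Derive SPARQL PREFIX declarations from one configurable Wikibase base URL.
# Sketch only: path patterns follow Wikibase's default RDF mapping; the prefix
# names below are hypothetical placeholders chosen not to collide with wd:, wdt:.

# Namespace paths used in Wikibase RDF, relative to the concept base URI.
WIKIBASE_RDF_PATHS = {
    "entity": "entity/",
    "direct": "prop/direct/",
    "prop": "prop/",
    "statement": "prop/statement/",
    "qualifier": "prop/qualifier/",
}

# Hypothetical local prefix names (not an established convention).
LOCAL_PREFIX_NAMES = {
    "entity": "lwd",
    "direct": "lwdt",
    "prop": "lp",
    "statement": "lps",
    "qualifier": "lpq",
}


def prefix_map(base_url: str) -> dict:
    """Map each local prefix name to a full namespace IRI under base_url."""
    base = base_url.rstrip("/") + "/"
    return {
        LOCAL_PREFIX_NAMES[key]: base + path
        for key, path in WIKIBASE_RDF_PATHS.items()
    }


def sparql_prefix_header(base_url: str) -> str:
    """Render one PREFIX line per namespace, for the top of a SPARQL query."""
    return "\n".join(
        f"PREFIX {name}: <{iri}>" for name, iri in prefix_map(base_url).items()
    )


if __name__ == "__main__":
    # Hypothetical staging URL; production would pass its own base URL.
    print(sparql_prefix_header("https://staging.example.org"))
```

In this sketch, a cloned staging site would only need a different base URL value rather than a different patch, which is the sort of flexibility the patches mentioned here point toward.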
The problem isn’t limited to Wikibase, however. Not only does Wikibase need to use
different prefixes, so does all of the supporting tooling, including Blazegraph,
QuickStatements, and the custom front-end presentation software being developed for
enslaved.org.
For enslaved.org, we have a series of patches, applied locally to Wikibase and the
wikidata-query-service, that we have found necessary to support our environment. These
changes either replace a set of hard-coded values in the Wikibase software or extend the
software to support something beyond the wikidata.org defaults. I consider these patches
good guidance as to which elements need configuration options to work beyond the
wikidata.org environment.
Additionally, we have been working on bulk data loading into Wikibase, using
QuickStatements and multiple Wikibase deployments as staging grounds. Because of the
process by which we add data to the enslaved.org Wikibase, open live editing of the
system is not generally supported. Instead, we process moderate to large datasets from
our partners and make sure they will load successfully into our production environment
in controlled batches. To facilitate this, Matrix regularly clones the entire
enslaved.org Wikibase deployment to an alternate server and URL, and tests importing the
datasets at this alternate location before executing the same operation on the
production site. The cloning operation ensures that the identifiers in testing are
identical to those in production.
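As an illustration of loading in controlled batches, the Python sketch below splits QuickStatements V1 (tab-separated) commands into batches of bounded size, keeping each CREATE together with the LAST lines that refer to it so that a batch boundary never orphans them. The function names, the default batch size, and the example IDs are hypothetical; this is not Matrix's actual tooling.

```python
# Split QuickStatements V1 command lines into controlled batches.
# Sketch only: assumes tab-separated V1 lines where "LAST" refers to the most
# recent "CREATE". Function names and the default batch size are hypothetical.


def group_commands(lines: list) -> list:
    """Group each CREATE with its trailing LAST lines; other lines stand alone."""
    groups = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        first_field = line.split("\t", 1)[0].strip()
        if first_field == "LAST" and groups and groups[-1][0].strip() == "CREATE":
            groups[-1].append(line)  # keep LAST with the item it modifies
        else:
            groups.append([line])
    return groups


def batch_commands(lines: list, max_size: int = 500) -> list:
    """Pack command groups into batches of at most max_size lines.

    A single group larger than max_size becomes its own oversized batch
    rather than being split, since splitting would break LAST references.
    """
    if max_size < 1:
        raise ValueError("max_size must be positive")
    batches = []
    current = []
    for group in group_commands(lines):
        if current and len(current) + len(group) > max_size:
            batches.append(current)
            current = []
        current.extend(group)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Illustrative local IDs only (P1, Q2, ...), not real enslaved.org entities.
    commands = [
        "CREATE",
        'LAST\tLen\t"Example person"',
        "LAST\tP1\tQ2",
        "Q3\tP1\tQ2",
    ]
    for i, batch in enumerate(batch_commands(commands, max_size=3)):
        print(i, batch)
```

Each batch could then be run against the cloned staging Wikibase first and, once it loads cleanly, against production.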
The cloning process, however, again runs afoul of all of the earlier problems with
namespaces and URIs/URLs. Each time we clone the environment, all of the supporting
infrastructure must be updated as well. Alternate Blazegraph deployments exist for this,
but their datasets must be cleared and reloaded. The Blazegraph dataset cannot use the
production dumps, of course, because the production and staging system URLs differ.
Thus, Matrix is currently working to automate starting up a new Blazegraph instance,
learning a base Wikibase URL, and from that, loading the dumps and entering live update
mode for that Wikibase.
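Because the Blazegraph data is tied to a specific base URL, one simple safeguard before loading is to check that a dump actually belongs to the target Wikibase. The Python sketch below assumes an N-Triples dump (one triple per line) and Wikibase's default <base>/entity/<ID> IRI pattern, and counts entity IRIs under any other base. The function name, regex, and URLs are hypothetical, and this is only a pre-load check, not the automation itself.

```python
# Pre-load check: does an N-Triples dump belong to the expected Wikibase?
# Sketch only: assumes entity IRIs look like <base>/entity/<ID> (Wikibase's
# default pattern). Function name, regex, and URLs are hypothetical.

import re
from collections import Counter
from typing import Iterable

# Matches entity IRIs such as <https://example.org/entity/Q123>, capturing the base.
# Statement IRIs (.../entity/statement/...) and prop/ IRIs do not match.
ENTITY_IRI = re.compile(r"<(https?://[^<>\s]+?)/entity/[A-Z]\d+>")


def foreign_entity_bases(lines: Iterable, expected_base: str) -> Counter:
    """Count entity IRIs per base URL that differs from expected_base."""
    expected = expected_base.rstrip("/")
    counts = Counter()
    for line in lines:
        for base in ENTITY_IRI.findall(line):
            if base != expected:
                counts[base] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '<https://prod.example.org/entity/Q1> <http://schema.org/name> "x" .',
    ]
    foreign = foreign_entity_bases(sample, "https://staging.example.org/")
    if foreign:
        print("Refusing to load; foreign bases found:", dict(foreign))
```

A non-empty result means the dump was generated for a different base URL (for example, production data headed for a staging Blazegraph) and should be regenerated from the target Wikibase rather than loaded.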
The work being done at Matrix is very focused on the enslaved.org project, but it could
be used either as guidance on what needs to be more flexible, or possibly moved into
"configuration" to make the whole set of tools more federation-friendly.
________________________________________
From: Wikibaseug <wikibaseug-bounces(a)lists.wikimedia.org> on behalf of Dragan
Espenschied <dragan.espenschied(a)rhizome.org>
Sent: Sunday, October 27, 2019 10:43 AM
To: wikibaseug(a)lists.wikimedia.org
Subject: [Wikibase] Wikibase new features
Dear Wikibase colleagues.
I have circulated these notes a little during WikidataCon already:
https://etherpad.wikimedia.org/p/Wikibase-Extensions-Federation
It is a list of extensions or new features that Rhizome needs to go into full production.
We're interested in helping produce these extensions and will have some funding
available in 2020. Please have a look at the Etherpad and leave a note if these features
would meet your needs as well, and whether you have plans to implement something yourself.
My goal is to have extensions useful for most cases of running independent Wikibase
instances in the new "ecosystem" approach WMDE has announced, and to discuss how
they can be made a reality.
Yours,
Dragan
--
Dragan Espenschied
Preservation Director
Rhizome
at the New Museum