Thank you, Dragan, for your notes. I thought I would pipe up with a little bit of
commentary about what we have been up to at Matrix with the enslaved.org project, as it
overlaps some of the cases you have in your notes.
The enslaved.org project continues to move forward as we are working to load larger
datasets into our wikibase installation. The process of moving this from a testing
exercise to a production system has resulted in a small number of changes and alterations
that needed to be made to the wikibase software, and to our data loading process.
We realized early in the process that re-using the wikibase namespace as the enslaved
namespace would present problems both semantically and technically. Semantically, while the
URIs were now pointing at the local wikibase deployment, the system still advertised
identifiers using the wikidata namespace prefixes. Confusion abounds when sample queries
and lookups that work for wikidata.org just return nothing in the local environment.
Also, discussion of the searches and relationships was inherently muddled. Finally, we
have a long term goal to cross link data in the enslaved.org project with data found on
wikidata.org. Sharing a namespace and prefix set is not practical or useful in that case.
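To make the prefix problem concrete, here is a minimal sketch of how the same prefix names resolve to different URIs depending on the concept base URI. The path suffixes are the ones Wikidata's RDF export uses; the host wikibase.example.org is a placeholder, and the function name is hypothetical rather than anything from the wikibase code. A local deployment would normally also choose its own prefix names instead of wd/wdt.

```python
from urllib.parse import urlparse

# Path suffixes appended to the concept base URI in the Wikibase RDF export.
# Only a subset of the full prefix set is listed.
PREFIX_PATHS = {
    "wd": "entity/",
    "wdt": "prop/direct/",
    "p": "prop/",
    "ps": "prop/statement/",
    "pq": "prop/qualifier/",
}


def sparql_prefixes(concept_base: str) -> str:
    """Return SPARQL PREFIX lines for a wikibase rooted at concept_base."""
    parsed = urlparse(concept_base)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"not an absolute URL: {concept_base!r}")
    base = concept_base.rstrip("/") + "/"
    return "\n".join(
        f"PREFIX {name}: <{base}{path}>" for name, path in PREFIX_PATHS.items()
    )


if __name__ == "__main__":
    # The same query text means different things under each prefix block.
    print(sparql_prefixes("http://www.wikidata.org"))
    print(sparql_prefixes("https://wikibase.example.org"))  # placeholder host
```

A query written against the first block matches nothing in a store populated with URIs from the second, which is the "returns nothing" behaviour described above.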
The problem isn’t limited to wikibase, however. Not only does wikibase need to use
different prefixes, so does all the supporting tooling, including blazegraph,
quickstatements, and the custom front end presentation software being developed for
enslaved.org. Currently, we have a series of patches that are being locally applied to
wikibase and the wikidata-query-service, which we have found necessary to support our
environment. These changes either replace a set of hard coded values in the
wikibase software, or extend the software to support something beyond the default of
wikidata.org. I consider these patches to be good guidance as to what elements need
configuration options to work beyond the wikidata.org environment.
Additionally, we have been working on doing bulk data loading to wikibase, using
quickstatements, and multiple wikibase deployments as staging grounds. Due to the process
by which we add data to the enslaved.org wikibase, open live editing of the system is not
generally supported. Instead, we process moderate to large datasets from our partners,
and work to make sure they will load successfully to our production environment in
controlled batches. To facilitate this, Matrix regularly clones the entire wikibase
deployment for enslaved.org to an alternate server and URL and tests importing the data
sets on this alternate location before executing the same operation on the production
site. The cloning operation ensures that the identifiers in the system are identical in
testing as in production.
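As an illustration of loading in controlled batches, the sketch below splits a list of QuickStatements V1 commands (tab-separated, with CREATE followed by LAST lines) into batches of bounded size. It is not the tooling Matrix uses; the function names, the batch size, and the property and item IDs are all made up for the example. The one rule it enforces is that a CREATE and its LAST lines are never split, since LAST refers to the item created immediately before it.

```python
from typing import Iterable, Iterator, List


def group_commands(commands: Iterable[str]) -> List[List[str]]:
    """Group each command with the LAST lines that depend on it."""
    groups: List[List[str]] = []
    for raw in commands:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("LAST\t") and groups:
            groups[-1].append(line)  # belongs to the preceding CREATE
        else:
            groups.append([line])
    return groups


def split_batches(commands: Iterable[str], max_commands: int) -> Iterator[List[str]]:
    """Yield batches of at most max_commands lines, keeping groups intact.

    A single group larger than max_commands is emitted as its own batch.
    """
    batch: List[str] = []
    for group in group_commands(commands):
        if batch and len(batch) + len(group) > max_commands:
            yield batch
            batch = []
        batch.extend(group)
    if batch:
        yield batch


if __name__ == "__main__":
    # Hypothetical commands; P1, P2, Q1, Q5, Q6 are placeholder IDs.
    commands = [
        "CREATE",
        'LAST\tLen\t"Example person A"',
        "LAST\tP1\tQ1",
        "CREATE",
        'LAST\tLen\t"Example person B"',
        "Q5\tP2\tQ6",
    ]
    for number, batch in enumerate(split_batches(commands, max_commands=3), 1):
        print(f"batch {number}: {len(batch)} commands")
```

Running the same batches in the same order against the clone and then against production is what keeps newly created identifiers aligned between the two.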
The cloning process, however, again runs afoul of all of the previous problems with
namespaces and URIs/URLs. Each time we clone the environment, updates to all of the
supporting infrastructure must also be made. Alternate blazegraph deployments exist for
this, but they must have the data set cleared and reloaded. The data set for blazegraph
cannot use the production dumps, of course, because the production system URLs and
staging system URLs are different. Thus, present work is happening at Matrix to support
the automation of starting up a new blazegraph instance, learning a base wikibase URL, and
from that, loading the dumps, and entering live update mode for that wikibase.
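The URL mismatch can be checked mechanically before a dump is loaded. The sketch below scans N-Triples style lines, collects the base of every entity IRI (the part before /entity/), and reports any base that differs from the wikibase the blazegraph instance is supposed to follow. This is only an illustration of the problem, not the automation being built at Matrix; the host names and function names are placeholders.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches IRIs written in angle brackets, as in N-Triples.
IRI = re.compile(r"<(https?://[^>\s]+)>")


def entity_bases(lines: Iterable[str]) -> Counter:
    """Count entity IRIs per base URL (the text before '/entity/')."""
    counts: Counter = Counter()
    for line in lines:
        for iri in IRI.findall(line):
            base, sep, _ = iri.partition("/entity/")
            if sep:  # only IRIs that actually contain '/entity/'
                counts[base] += 1
    return counts


def mismatched_bases(lines: Iterable[str], expected_base: str) -> Dict[str, int]:
    """Return bases found in the dump that differ from expected_base."""
    expected = expected_base.rstrip("/")
    return {b: n for b, n in entity_bases(lines).items() if b != expected}


if __name__ == "__main__":
    # Placeholder hosts standing in for the production and staging sites.
    dump = [
        "<https://production.example.org/entity/Q1> "
        "<https://production.example.org/prop/direct/P1> "
        "<https://production.example.org/entity/Q2> .",
        "<https://staging.example.org/entity/Q1> "
        '<http://www.w3.org/2000/01/rdf-schema#label> "Example"@en .',
    ]
    print(mismatched_bases(dump, "https://staging.example.org"))
```

An empty result means every entity IRI in the dump already uses the expected base; anything else indicates a dump taken from a different deployment.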
The work being done at Matrix is very focused on the enslaved.org project, but could be
used either as guidance on what needs to be more flexible, or possibly moved to
"configuration" to make the whole set of tools more federation friendly.
From: Wikibaseug <wikibaseug-bounces(a)lists.wikimedia.org> on behalf of Dragan
Sent: Sunday, October 27, 2019 10:43 AM
Subject: [Wikibase] Wikibase new features
Dear Wikibase colleagues.
I have circulated these notes a little during WikidataCon already:
It is a list of extensions or new features that Rhizome needs to go into full production.
We're interested in helping produce these extensions and will have some funding
available in 2020. Please have a look into the Etherpad and leave a note if these features
would meet your needs as well, and if you have plans to implement something yourself.
My goal is to have extensions useful for most cases of running independent Wikibase
instances in the new "ecosystem" approach WMDE has announced, and discuss how
they can be made a reality.
at the New Museum