Hello everyone,
The Wikibase development team is excited to see the emergence of community-created import tooling such as RaiseWikibase and wikibase-insert, particularly because Wikibase does not yet come equipped with its own import mechanism “out of the box”. To help better support the community, we would like to offer some advice to toolmakers and provide some insight into our planned explorations into making API-based importing better and faster.
We anticipate an inherent issue with tools that inject information directly into the Wikibase database tables: the schemas that such tools rely on are subject to change. Normal development processes across Wikimedia can (and likely will) endanger the long-term health and stability of these tools, an outcome we would like to avoid as much as possible. Specifically, we can’t guarantee that the layout or content of the tables that these tools write to will remain the same. Although Wikibase does not have its own public stable interface policy, we work with an eye on Wikidata's policy https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy#Unstable_Interfaces .
We would like to offer the following advice to tool developers:
1. Wherever possible, use the HTTP APIs.
2. When using them, make sure to batch requests when possible.
3. Understand that reading from or writing to DB tables directly may break Wikibase behavior in subtle ways.
4. Understand that tools which read from or write to the DB on today’s Wikibase may not still work on tomorrow’s Wikibase.
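As a rough illustration of the first two points, here is a minimal Python sketch of one way to batch: putting several labels and descriptions for a new item into a single wbeditentity call rather than sending one request per value. The helper name build_wbeditentity_params, the sample values, and the placeholder token are assumptions made for this example only. Logging in, fetching a real CSRF token, and sending the POST to your Wikibase's api.php are left out.

```python
import json
from urllib.parse import urlencode


def build_wbeditentity_params(labels, descriptions, csrf_token):
    """Build form parameters for ONE wbeditentity call that creates an item.

    Hypothetical helper for illustration: it bundles all labels and
    descriptions into a single request instead of one request per value.
    """
    data = {
        "labels": {
            lang: {"language": lang, "value": text}
            for lang, text in labels.items()
        },
        "descriptions": {
            lang: {"language": lang, "value": text}
            for lang, text in descriptions.items()
        },
    }
    return {
        "action": "wbeditentity",
        "new": "item",
        "data": json.dumps(data, ensure_ascii=False),
        "token": csrf_token,  # placeholder here; a real CSRF token is required
        "format": "json",
    }


# Sample values are made up for the example.
params = build_wbeditentity_params(
    labels={"en": "Example item", "de": "Beispielobjekt"},
    descriptions={"en": "item created in a single API request"},
    csrf_token="PLACEHOLDER_TOKEN",
)

# URL-encoded body that would be sent as a POST to the wiki's api.php.
body = urlencode(params)
```

Statements and other parts of the entity can be added to the same data object, so an item with many values still costs one request.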
The Wikibase development roadmap https://www.wikidata.org/wiki/Wikidata:Development_plan#Wikibase_ecosystem for the year is already tightly booked. However, Wikimedia Germany intends to dedicate some resources during our next 2021 prototyping effort towards exploring ways to optimize our API -- specifically, a solution approaching something like an "import mode" which would bypass unnecessary actions when inserting a body of previously vetted information. This might, for example, include ignoring checks on the uniqueness of labels or user permissions. In addition, we plan to dedicate time toward evaluating OpenRefine’s new Wikibase reconciliation functionality https://docs.openrefine.org/next/manual/wikibase/overview on behalf of the community. We will keep you updated on our efforts related to these topics.
If you are interested, we welcome you to watch the progress of these related Phabricator tickets as well:
- https://phabricator.wikimedia.org/T287164 -- Improve bulk import via API
- https://phabricator.wikimedia.org/T285987 -- Do not generate full html parser output at the end of Wikibase edit requests
Cheers,