Hello everyone,
The Wikibase development team is excited to see the emergence of community-created import tooling such as RaiseWikibase and wikibase-insert, particularly because Wikibase does not yet come equipped with its own import mechanism “out of the box”. To better support the community, we would like to offer some advice to toolmakers and provide some insight into our planned explorations into making API-based importing better and faster.
We anticipate an inherent issue with tools that directly inject information into the Wikibase database tables: the schemas that such tools rely on are subject to change. Normal development processes across Wikimedia can (and likely will) endanger the long-term health and stability of these tools, an outcome we would like to avoid as much as possible. Specifically, we can’t guarantee that the layout or content of the tables these tools write to will not change. Although Wikibase does not have its own public stable interface policy, we work with an eye on Wikidata's policy: https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy#Unstable_Interfaces
We would like to offer the following advice to tool developers:
1. Wherever possible, use the HTTP APIs.
2. When using them, make sure to batch requests when possible.
3. Understand that by reading or writing directly to DB tables, Wikibase behavior may end up broken in subtle ways.
4. Understand that tools which read or write from the DB on today’s Wikibase may not still work on tomorrow’s Wikibase.
The Wikibase development roadmap https://www.wikidata.org/wiki/Wikidata:Development_plan#Wikibase_ecosystem for the year is already tightly booked. However, Wikimedia Germany intends to dedicate some resources during our next 2021 prototyping effort towards exploring ways to optimize our API -- specifically, a solution approaching something like an "import mode" which would bypass unnecessary actions when inserting a body of previously vetted information. This might, for example, include ignoring checks on the uniqueness of labels or user permissions. In addition, we plan to dedicate time toward evaluating OpenRefine’s new Wikibase reconciliation functionality https://docs.openrefine.org/next/manual/wikibase/overview on behalf of the community. We will keep you updated on our efforts related to these topics.
If you are interested, we welcome you to watch the progress of these related Phabricator tickets as well:
- https://phabricator.wikimedia.org/T287164 -- Improve bulk import via API
- https://phabricator.wikimedia.org/T285987 -- Do not generate full HTML parser output at the end of Wikibase edit requests
Cheers,
Hi Mohammed,
Understood regarding changes to the relational schema, and great to hear that optimising the APIs is in the works!
I'm wondering if there is some documentation relating to the relational schema used, and would it be possible to report changes to the schema across the different versions?
Would such changes to the relational schema also affect those who use the standard APIs for data upload and later wish to upgrade their version of Wikibase over the existing data?
Best, Aidan
On 2021-08-17 8:44, Mohammed Sadat Abdulai wrote:
Wikibaseug mailing list -- wikibaseug@lists.wikimedia.org To unsubscribe send an email to wikibaseug-leave@lists.wikimedia.org
Hello Aidan,
Thanks for your feedback!
Changes to the relational schema by design won’t impact those who use the standard APIs for data upload. If they later install a newer version of Wikibase and follow the documentation for upgrading, any schema changes will be applied automatically.
In terms of documentation of the relational schema, there is no single place that encompasses it. You can certainly read some details in the Wikibase Developer documentation[1]. You can also read about the MediaWiki database and tables [2].
Of course, the documentation may at some point have drifted slightly with respect to the code, which remains the source of truth. You can see, in a more declarative form, the schemas that will be used in the upcoming release for both Wikibase[3] and MediaWiki[4].
Cheers,
The Wikidata Team
[1] - https://doc.wikimedia.org/Wikibase/master/php/
[2] - https://www.mediawiki.org/wiki/Manual:Database_layout
[3] - https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Wikibase...
[4] - https://gerrit.wikimedia.org/g/mediawiki/core/%2B/HEAD/maintenance/tables.js...
On Fri, Aug 20, 2021 at 9:09 AM Aidan Hogan aidhog@gmail.com wrote:
Thanks very much for the announcement. When you say "make sure to batch requests", can you give some detail on exactly what this means? Is this something supported by Wikidata-Toolkit Java Library ( https://github.com/Wikidata/Wikidata-Toolkit )?
Hello Joe,
By batching requests, we mean minimizing the number of requests you need to make.
For example, if you're reading data with the wbgetentities endpoint, try to request the maximum number of IDs you are able to at a time.
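To make the read-batching concrete, here is a minimal sketch in Python that splits a list of entity IDs into API-sized groups and builds one wbgetentities query per group. The chunk size of 50 reflects the usual per-request ID limit for non-bot accounts; the endpoint URL and exact limit for your wiki are assumptions you should check against your own installation.

```python
# Sketch: batch entity reads so one wbgetentities call replaces up to
# 50 individual lookups. IDs are joined with "|" as the API expects.

def chunk(ids, size=50):
    """Split a list of entity IDs into API-sized batches."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def batched_params(ids):
    """Build one wbgetentities parameter dict per batch of IDs."""
    return [
        {
            "action": "wbgetentities",
            "ids": "|".join(batch),
            "format": "json",
        }
        for batch in chunk(ids)
    ]

# 120 IDs become 3 requests instead of 120.
params = batched_params([f"Q{i}" for i in range(1, 121)])
```

Each dict in `params` would then be sent as the query string of a single GET request to the wiki's api.php.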
If you’re adding data, avoid making repeated wbsetclaim requests; instead put all the claims you want to add in a single wbeditentity request.
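The write side can be sketched the same way: instead of N separate wbsetclaim calls, pack all new statements into the "claims" field of a single wbeditentity request. The property and item IDs below are placeholders, and this builds only the POST body; authentication, CSRF tokens, and the actual HTTP call are omitted.

```python
import json

def snak(prop, qid):
    """Build a wikibase-entityid value snak for one statement."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": qid},
        },
    }

def editentity_params(entity_id, statements):
    """One wbeditentity POST body carrying many claims at once."""
    claims = [
        {"mainsnak": snak(prop, qid), "type": "statement", "rank": "normal"}
        for prop, qid in statements
    ]
    return {
        "action": "wbeditentity",
        "id": entity_id,
        "data": json.dumps({"claims": claims}),
        "format": "json",
    }

# Two statements added in one edit instead of two wbsetclaim requests.
params = editentity_params("Q42", [("P31", "Q5"), ("P106", "Q36180")])
```

One wbeditentity edit also produces a single revision, whereas repeated wbsetclaim calls create one revision (and one round of post-edit processing) per claim.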
I believe that this is supported by the Wikidata-Toolkit, although I'm no expert in it. In the examples, you can see how to batch together reading a list of properties rather than reading them one by one. ( https://github.com/Wikidata/Wikidata-Toolkit-Examples/blob/master/src/exampl... )
Cheers,
The Wikidata Team
On Fri, Aug 20, 2021 at 6:47 PM Joe Wass jwass@crossref.org wrote: