Has anyone imported a subset of Wikidata into another Wikibase? How did you do it?
I am working on a project that wants to import Wikidata properties into its Wikibase. I looked into how to do this, but I am hitting roadblocks.
1) Fetching data
This part is easy; it can be done via the API or the dumps. If I only take labels, descriptions and datatypes, then I don't even need to worry about fetching any items referenced in the properties' own statements.
2) Importing data
It's possible to create properties via api.php?action=wbeditentity, but it does not allow specifying the id for new entities. We would like to keep the ids the same as in Wikidata for simplicity. I thought that I would just create all properties up to P6000 or so, edit them via the API, and then delete the unused properties. However, this won't work because the property datatype must be specified at creation time and cannot be changed afterwards.
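A minimal sketch of the two steps above, assuming the standard wbgetentities and wbeditentity API modules. The helper names (build_fetch_url, build_create_params) and the token value are hypothetical, and nothing here sends a request. It only shows which parameters are involved, and that wbeditentity has no parameter for choosing the new id.

```python
import json
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_fetch_url(property_ids):
    """Build a wbgetentities URL asking only for labels, descriptions and datatype."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(property_ids),
        "props": "labels|descriptions|datatype",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def build_create_params(entity, csrf_token):
    """Turn one fetched property record into POST parameters for wbeditentity.

    The datatype has to be part of the creation request, and the target
    wiki assigns the id itself.
    """
    data = {
        "labels": entity.get("labels", {}),
        "descriptions": entity.get("descriptions", {}),
        "datatype": entity["datatype"],
    }
    return {
        "action": "wbeditentity",
        "new": "property",
        "data": json.dumps(data),
        "token": csrf_token,  # hypothetical placeholder; a real CSRF token is needed
        "format": "json",
    }


# Hand-written sample shaped like one entry of a wbgetentities response.
sample = {
    "id": "P31",
    "datatype": "wikibase-item",
    "labels": {"en": {"language": "en", "value": "instance of"}},
    "descriptions": {},
}
create_params = build_create_params(sample, "dummy-token")
```

The API limits how many ids one wbgetentities request may contain, so a real script would fetch the properties in batches.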
Possible solutions:
a) Make it possible to specify the id of a newly created entity (likely guarded with a special right).
b) Make it possible to change the property datatype after creation.
c) Drop the requirement of using the same ids in our Wikibase.
d) Use a MediaWiki maintenance script, where it is possible to bypass restrictions and specify the id of the newly created entity. Afterwards the database must be updated manually to increase the tracker for the next free entity number to match the highest used id.
e) Avoid having to import properties (e.g. wait for support for federated Wikibases, or implement another storage mechanism that only refers to Wikidata ids without having them in place).
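For option (d), the value the id tracker has to be raised to can be computed from the imported ids. This is only a sketch: the function name is hypothetical, and where the tracker lives depends on the Wikibase version (possibly the wb_id_counters table, which is an assumption to verify against the installation before touching the database).

```python
import re

# Property ids look like "P" followed by a positive integer.
PROPERTY_ID = re.compile(r"^P([1-9][0-9]*)$")


def highest_property_number(property_ids):
    """Return the largest numeric part among the given property ids (0 if none)."""
    numbers = []
    for pid in property_ids:
        match = PROPERTY_ID.match(pid)
        if match is None:
            raise ValueError(f"not a property id: {pid!r}")
        numbers.append(int(match.group(1)))
    return max(numbers, default=0)


imported = ["P31", "P279", "P5999", "P17"]
tracker_value = highest_property_number(imported)
```

The tracker should never be lowered, so a real script would compare this value with the current one and only update when it is larger.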
I am currently dreaming of (a) and probably going for (d) for which I already have a rudimentary script. Any comments, suggestions or tips?
-Niklas
Hi Niklas!
You could go for a straight-up import of the XML dump. This is disabled by default, but can be enabled using $wgWBRepoSettings['allowEntityImport']. Populating an empty Wikibase instance this way should work fine, but things get problematic if you then create your own items and properties, and later want to import again. Then you hit ID conflicts.
An alternative is Aude's import script, which applies a mapping between your IDs and Wikidata's IDs: https://github.com/filbertkm/WikibaseImport.
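To illustrate the idea of an ID mapping (this is not code from WikibaseImport, just a sketch with hypothetical names and made-up local ids), property ids in imported data can be rewritten through a lookup table, and anything without a local counterpart can be reported:

```python
def remap_property_ids(statements, id_map):
    """Rewrite Wikidata property ids in (property, value) pairs to local ids.

    Returns the remapped pairs and a sorted list of Wikidata ids that have
    no entry in id_map, so they can be imported first.
    """
    remapped = []
    missing = set()
    for wikidata_id, value in statements:
        local_id = id_map.get(wikidata_id)
        if local_id is None:
            missing.add(wikidata_id)
            continue
        remapped.append((local_id, value))
    return remapped, sorted(missing)


# Hypothetical mapping: Wikidata id -> id assigned by the local Wikibase.
id_map = {"P31": "P1", "P279": "P2"}
statements = [("P31", "Q5"), ("P279", "Q35120"), ("P999", "example")]
remapped, missing = remap_property_ids(statements, id_map)
```

With a mapping like this, local ids are free to diverge from Wikidata's, so a later import does not collide with locally created entities.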
Finally, the "proper" solution would be Federation, but that's not ready yet...
-- daniel
Wikidata-tech mailing list Wikidata-tech@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
Hi Niklas,
On 8/10/18 11:23, Daniel Kinzler wrote:
You could go for a straight up import of the XML dump.
If you are using MediaWiki-Vagrant [1] for your Wikibase instance, I've contributed a couple of walkthroughs [2,3]. In my experience, however, the XML import left a lot of data missing, most importantly properties and labels.
I've not tried Aude's script yet, but my suggestion is to go for it first.
Cheers,
Marco
[1] https://www.mediawiki.org/wiki/MediaWiki-Vagrant
[2] https://www.mediawiki.org/wiki/MediaWiki-Vagrant#wikidata
[3] https://www.mediawiki.org/wiki/MediaWiki-Vagrant#How_to_import_a_Wikidata_du...
On Fri, Aug 10, 2018, 11:12 Niklas Laxström niklas.laxstrom@gmail.com wrote:
It's possible to create properties via api.php?action=wbeditentity, but it does not allow specifying the id for new entities. We would like to keep the ids the same as in Wikidata for simplicity. I thought that I would just create all properties up to P6000 or so, edit them via the API, and then delete the unused properties. However, this won't work because the property datatype must be specified at creation time and cannot be changed afterwards.
In general I don't think you should strive for keeping the same IDs. They are going to diverge anyway as Wikidata and your wiki evolve. A mapping with statements is likely more useful.
Cheers
Lydia