Hello -
Sorry for the crosspost/repeat - I sent a version of this to the wikidata mailing list, but it was right in the peak of the holidays. This list is probably more appropriate for it and hopefully by now the wikibase developers are back from their holidays and all caught up on email/year end/new year tasks, and can help provide some guidance.
The tl;dr version of this post: On a blank Wikibase instance, I want to be able to do:
api.php?action=wbeditentity&new=item&id=Q42&data={"labels":{"en":{"language":"en","value":"Douglas Adams"}}}
I do not want to do this on wikidata.org - I understand why it makes no sense in that context. But I would like to be able to do this on my own Wikibase instance.
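For concreteness, the request above could be assembled like this in Python. This is only a sketch of the proposed call: as described later in this message, Wikibase currently appears to reject `new` combined with `id`. The endpoint URL is a placeholder and the helper name is made up for illustration. A real edit would also have to be sent as a POST with an edit token, which is left out here.

```python
import json
from urllib.parse import urlencode


def build_create_params(entity_type, entity_id, label, language="en"):
    """Build the query parameters for the proposed wbeditentity call."""
    data = {"labels": {language: {"language": language, "value": label}}}
    return {
        "action": "wbeditentity",
        "new": entity_type,
        # Proposed behavior: today this is not accepted together with "new".
        "id": entity_id,
        "data": json.dumps(data, separators=(",", ":")),
    }


API_URL = "https://wikibase.example/w/api.php"  # placeholder, not a real instance
params = build_create_params("item", "Q42", "Douglas Adams")
print(API_URL + "?" + urlencode(params))
```

URL-encoding the `data` parameter matters in practice, since the raw JSON contains characters that are not safe in a query string.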
Beyond the whimsical, like ensuring Douglas Adams gets to be Q42, the main reason for this is data portability and identifier stability. As more hosted Wikibase providers come online and start offering services, I want to know that I can take my data with me if I need to change to a different provider. Anyone who queries my Wikibase needs to know the identifiers my Wikibase uses for instances and, more importantly, for classes, and if I change providers, those identifiers cannot change without breaking those queries.
I do not think that MySQL backups are a reliable way to be able to transition between providers. I am not confident that all providers will want to offer a service where they accept a MySQL backup to load into their Wikibase backend, and there are additional challenges moving between Wikibase versions. (Though some may - I programmatically create the contents of my Wikibase so I don't care about edit history, but if one were to care about that history and other things like wikiusers I imagine the MySQL dumps would be the preferred way to migrate?)
One possible solution is to simply create blank items in a new Wikibase, from 1 to the maximum identifier used in my old wikibase, and then repopulate each item with the claims from my old Wikibase instance. Unfortunately this is not a reliable solution because while Wikibase guarantees that item IDs will not be reused, it does not guarantee that every ID in the sequence will be created, e.g. in rare cases Wikibase may go from Q41 to Q43 and skip/never create Q42.
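The gap problem can be made concrete with a small sketch. Assuming the old instance's item IDs are available as a list (the values below are made up), the fill-the-sequence approach needs a throwaway placeholder for every number the old instance never used. It also only works if the new instance hands out IDs strictly in order, which is exactly what is not guaranteed. The function names are hypothetical.

```python
def numeric_ids(item_ids):
    """Turn ['Q1', 'Q43'] into a sorted list of integers."""
    return sorted(int(item_id[1:]) for item_id in item_ids)


def placeholders_needed(item_ids):
    """IDs the new instance must burn so real items land on the same numbers."""
    used = set(numeric_ids(item_ids))
    highest = max(used)
    return [f"Q{n}" for n in range(1, highest + 1) if n not in used]


def check_assignment(expected_id, assigned_id):
    """Fail loudly if the new instance assigned a different ID than expected."""
    if expected_id != assigned_id:
        raise ValueError(f"wanted {expected_id}, new instance assigned {assigned_id}")


old_ids = ["Q1", "Q2", "Q4", "Q7"]  # hypothetical old instance with gaps
print(placeholders_needed(old_ids))
```

If the new instance ever skips a number on its own, `check_assignment` fails and there is no way to go back and claim the skipped ID.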
I don't mind that the identifier needs to be prefixed with a 'Q' or a 'P' for a particular type, I just want to be able to set the same identifier if I set up a new wikibase instance.
I think Wikibase is awesome, but it is an odd database that does not allow you to set the keys for the data you are managing :)
In reading through the Wikibase Repo code, it seems like this scenario was considered though perhaps isn't fully implemented (or has been disabled?). The code in EntitySavingHelper.php looks like there are/were ways to call it by providing an ID while still asking for a new entity, though there is logic earlier in the ModifyEntity code to look for and explicitly reject the case where the API asks for 'new' and also provides an ID, so I'm not sure how this code path would get called. There is also code to ask the entityStores if they 'canCreateWithCustomId', but those all appear to just return 'false'?
However, if that check were skipped in the API handler and a bit of code reworked in ModifyEntity and EntitySavingHelper, along with ensuring that the next available ID in the wb_id_counters table is kept up to date so that it is always 1 beyond the maximum ID in use, it looks like it might not be that hard to enable creating entities with specific IDs?
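The counter bookkeeping described above can be sketched in a few lines of Python. This models only the rule "next available ID is one beyond the maximum in use", per entity type. It does not touch the real wb_id_counters table, and nothing is assumed here about that table's actual schema. The function name is invented for illustration.

```python
import re

# Accepts identifiers such as "Q42" or "P3"; anything else is rejected.
ID_PATTERN = re.compile(r"^([PQ])([1-9][0-9]*)$")


def next_free_ids(entity_ids):
    """Return, per prefix, the number one beyond the highest ID in use."""
    highest = {}
    for entity_id in entity_ids:
        match = ID_PATTERN.match(entity_id)
        if match is None:
            raise ValueError(f"not a P/Q identifier: {entity_id!r}")
        prefix, number = match.group(1), int(match.group(2))
        highest[prefix] = max(highest.get(prefix, 0), number)
    return {prefix: number + 1 for prefix, number in highest.items()}


print(next_free_ids(["Q1", "Q42", "P3", "Q7"]))
```

Recomputing the value from the IDs actually in use, rather than incrementing blindly, keeps the counter correct even when IDs are created out of order.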
So three questions:
1. Would the Wikibase development team ever be open to supporting something like this, behind a flag like $wgWBRepoSettings['allowUserProvidedIds'] that defaults to false?
2. Are there more complicated implications from allowing a change like this that would need to be considered? I understand why the Wikidata.org repo needs this codepath to be fast and can't allow users to provide IDs for new entities anyway, but are there other reasons this isn't supported beyond "Wikidata doesn't need it"?
3. Is this all moot with the eventual REST API? I see that there's a PUT envisioned. Could I use that to directly create an item or property and give it an ID, or does the ID have to already exist for PUT to replace it?
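The opt-in behavior asked about in the first question could be sketched roughly as follows. This is plain Python rather than Wikibase's PHP, the function and exception names are invented, and the flag is the hypothetical setting proposed above. It does not exist in Wikibase as far as this thread establishes.

```python
class RequestError(Exception):
    """Raised when a request combines parameters that are not allowed together."""


def validate_new_entity_request(params, allow_user_provided_ids=False):
    """Return the caller-chosen ID for a 'new' request, or None if there is none.

    With the flag off (the default), 'new' plus 'id' is rejected, which
    mirrors the current behavior described earlier in this message.
    """
    is_new = "new" in params
    has_id = "id" in params
    if is_new and has_id and not allow_user_provided_ids:
        raise RequestError("'new' and 'id' cannot be combined")
    return params.get("id") if is_new else None


request = {"action": "wbeditentity", "new": "item", "id": "Q42"}
print(validate_new_entity_request(request, allow_user_provided_ids=True))
```

Defaulting the flag to off means an installation that never sets it behaves exactly as before.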
I am happy to try to tackle creating a patch for this, but I'd like to get some feedback on whether there are any big lurking issues that I should know about before starting on the work - I'd rather not get deep into it only to find out it will never work or never be accepted. I'm also happy to shift this to Phabricator if that's more appropriate.
Thank you all for your work on Wikibase!
Thanks,
-Erik
Hi Erik,
Whenever I create a new Wikibase, the first property (P1) I create is "Wikidata mapping", with the URL data type. For each item that reflects a concept that exists in Wikidata, I add the URL of that Wikidata item to the related Wikibase item. That way the items are connected.
With quite some caveats it is possible to replicate the Q numbers from Wikidata in Wikibase. However, using a mapping property is IMHO a preferable and more stable solution. One of those caveats is that replicating the numbers is not 100% foolproof. For example, if you make a copy of Wikidata (or a subset) on day one, your Wikibase and Wikidata will lead separate lives from then on. If that were not the case, why bother setting up a Wikibase rather than simply relying on Wikidata? So if at some later time you want to sync with Wikidata again, you will suddenly have QIDs on your Wikibase (i.e. items you created) that are exactly the same as the QIDs of totally different Wikidata items. This means that in the long run you will not be able to sustain the Wikidata QIDs.
When I want to query my personal wikibase for a Wikidata item that I replicated in my Wikibase, I use the following query:
SELECT * WHERE { ?item wbt:P1 wd:Q42 ; wbt:Pxx ?some_extra_annotation_not_in_wikidata . }
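The same lookup can be illustrated outside SPARQL with a small Python sketch. The local item IDs and the mapping data below are made up. The only real-world detail assumed is that Wikidata entity URIs have the form http://www.wikidata.org/entity/Q42. The point is that the local ID (here Q7) does not need to match the Wikidata ID for the items to stay connected.

```python
WIKIDATA_ENTITY_BASE = "http://www.wikidata.org/entity/"


def wikidata_url(qid):
    """Build the URL stored as the value of the P1 'Wikidata mapping' property."""
    return WIKIDATA_ENTITY_BASE + qid


# Local item ID -> value of its P1 statement (hypothetical data).
local_mappings = {
    "Q7": wikidata_url("Q42"),
    "Q8": wikidata_url("Q5"),
}


def find_local_item(mappings, wikidata_qid):
    """Return the local item mapped to the given Wikidata item, or None."""
    target = wikidata_url(wikidata_qid)
    matches = [local_id for local_id, url in mappings.items() if url == target]
    return matches[0] if matches else None


print(find_local_item(local_mappings, "Q42"))
```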
Andra
On Fri, Jan 29, 2021 at 4:10 AM Erik Paulson epaulson@unit1127.com wrote:
[...]
Hi both,
I agree with Andra that trying to match exact IDs doesn't make sense.
But Erik also points out a real issue with migration: at the moment, the only way to back up a Wikibase instance is a full MySQL dump/restore, which for better or worse contains much more than just Wikibase data: it also includes information on user accounts, installed extensions, etc.
Overall, current Wikibase hosting services should offer the option of exporting, and ideally importing, SQL dumps. Otherwise I wouldn't advise committing any important data to one.
Erik, while your idea for extending the API would be a quick fix, I believe that in the long run Wikibase must learn to import RDF. The export already works. As some suggestions for the Wikibase REST API looked like they work only at the RDBMS level instead of with linked data, I'm kind of afraid of any idea that would be "easy to implement" on top of MySQL, as it will entrench the idea that linked data items are just special wiki pages and RDF is just an afterthought, which is causing a lot of issues. One of them is that you cannot define URLs for your properties and items, which is exactly the problem you want to solve.
Yours, Dragan
On Fr, Jan 29, 2021 at 10:23, Andra Waagmeester andra@micel.io wrote:
[...]
Andra/Dragan -
I'm sorry if my original message wasn't clear, but you have misunderstood one part of it: I am not trying to match IDs from Wikidata in my own Wikibase. For the purposes of my question, Wikidata doesn't even need to exist. (Though I love Wikidata.)
Wikibase is a database. Unfortunately, it does not support inserting new items and properties with a primary key chosen by the caller. Wikibase creates those IDs itself and tells you what it assigned as the primary key.
I would like to be able to create new items in a Wikibase while specifying what the primary key should be. I understand that Wikibase requires that item keys start with 'Q' and property keys start with 'P', and I can live with that. But I want to be able to set the ID that follows the P or Q when I create an entity in my own Wikibase.
Wikidata does not need to be able to support specifying an ID when an entity is created, which I assume is why Wikibase doesn't support it. I am trying to understand how hard it would be to add optional support to Wikibase for providing the ID at entity creation time, even though it will never be turned on for Wikidata.
Longer-term I think it would be awesome if I could provide my own IRI to use in place of a P<ID> or Q<ID>, but I understand that that is a bigger conversation.
-Erik
On Fri, Jan 29, 2021 at 4:30 AM Dragan Espenschied <dragan.espenschied@rhizome.org> wrote:
[...]
Wikibaseug mailing list Wikibaseug@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikibaseug
Hey Erik,
We have just updated our guide on transferring data between wikibase instances. It is now complete. https://wikibase.consulting/transferring-wikibase-data-between-wikis/
That does not allow you to create new entities with a specific ID via the API though. Wikibase definitely supports creating such entities internally. We have some scripts for our customers that do exactly that. It seems like you already figured out where changes would be needed, and I agree that introducing a new option with a default value that does not change the current behavior makes sense. Some extra work would be needed to ensure creation of new items where no ID is specified (like on Special:NewItem) still works. But overall quite feasible. Still, someone has to make it happen :)
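The "extra work" mentioned above, keeping automatic ID assignment working alongside caller-chosen IDs, might look roughly like this toy model. It is plain Python, not Wikibase code, the class and method names are invented for illustration, and it models a single in-memory counter rather than anything stored in the database.

```python
class IdAllocator:
    """Toy model of one ID counter (e.g. for items) supporting both ID styles."""

    def __init__(self, prefix="Q"):
        self.prefix = prefix
        self.last_used = 0
        self.taken = set()

    def create_with_custom_id(self, number):
        """Claim a caller-chosen number, refusing duplicates."""
        if number < 1 or number in self.taken:
            raise ValueError(f"{self.prefix}{number} is invalid or already exists")
        self.taken.add(number)
        # Keep the counter at or beyond every ID handed out so far.
        self.last_used = max(self.last_used, number)
        return f"{self.prefix}{number}"

    def create_with_next_id(self):
        """Assign the next automatic ID, as when no ID is specified."""
        self.last_used += 1
        self.taken.add(self.last_used)
        return f"{self.prefix}{self.last_used}"


allocator = IdAllocator()
print(allocator.create_with_custom_id(42))
print(allocator.create_with_next_id())
```

Because the counter never falls behind the highest ID in use, an automatically assigned ID cannot collide with a custom one, even if a lower custom ID such as Q7 is created later.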
As to custom IDs (no Q or P prefix), there now is a phabricator ticket: https://phabricator.wikimedia.org/T271723
Best
-- Jeroen De Dauw | https://EntropyWins.wtf
Professional wiki hosting and services: https://Professional.Wiki
Entrepreneur | Software Architect | Open Source | Longtermism