Hello all!
Once more, I'm taking a whack at implementing support for entity redirects - that is, we want Q1234 to become a redirect to Q2345 after we merged Q1234 into Q2345.
My question is how to best model this.
First off, a quick primer of the PHP classes involved:
* We have the Entity base class, with Item and Property deriving from it, for modeling entities.
* As "glue" for treating entities as MediaWiki page content, we have the EntityContent base class, and ItemContent and PropertyContent deriving from that (along with their handler/factory classes, EntityHandler, ItemHandler and PropertyHandler).
Presently, each type of Entity is assigned a MediaWiki namespace, and all pages in that namespace have the corresponding content type.
Now, there are various ways to model redirects. I have tried a few:
1) first I tried implementing redirects as a "mode" any Entity may have. The advantage is that redirects can seamlessly occur anywhere "real" entities can occur, e.g. in dumps, diffs, undo operations, etc. However, many operations that are defined on "real" entities are not defined on redirects (e.g. "set label"), so we would end up with a lot of "if redirect, do this, else, do that" checks all over the code base.
2) second I tried implementing redirects as a "mode" any EntityContent may have, following the idea that redirects are really a MediaWiki concept, and should be implemented outside the Wikibase data model. This still requires a lot of "if redirect, do this, else, do that" checks, but only in places dealing with EntityContent, not all places dealing with Entities.
3) third I tried using a separate content type for representing redirects: a RedirectContent points to an EntityContent, there is no "mode", no need for extra checks, no chance for confusion. However, we break a few basic assumptions, most importantly the assumption that all pages in an entity namespace contain an EntityContent of the respective type.
None of these solutions seems satisfactory, both for the reasons indicated and because of some additional considerations explained below.
4) This led me to come full circle and consider another option: make redirects a special *type* of Entity, besides Item and Property. This would again allow straightforward diffs (and thus undo operations) between the redirected and the previous, un-redirected version of a page; also, it would not compromise code that operates on Item or Property objects. But since there are still many operations defined for Entity that make no sense for redirects, we would need to insert another class into the hierarchy (let's call it LabeledEntity) which would be a base class for Item and Property, but not for Redirect.
This would require quite a bit of refactoring, and would introduce redirects as a "first-class citizen" into the Wikibase data model.
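To make option 4 a bit more concrete, here is a rough sketch of the hierarchy. It is written in Python rather than PHP purely for brevity, and it is not the actual Wikibase code: the fields, the set_label method and the shape of Redirect are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Anything that can live on an entity page; it only knows its own ID."""
    entity_id: str


@dataclass
class LabeledEntity(Entity):
    """Base class for "real" entities; label operations are defined only here."""
    labels: dict = field(default_factory=dict)

    def set_label(self, language: str, text: str) -> None:
        self.labels[language] = text


class Item(LabeledEntity):
    pass


class Property(LabeledEntity):
    pass


@dataclass
class Redirect(Entity):
    """Hypothetical redirect entity: carries nothing but the target's ID."""
    target_id: str


# Q1234 before and after the merge: both revisions are Entity objects,
# so a diff or undo between them stays within one type hierarchy.
before = Item("Q1234")
before.set_label("en", "example label")
after = Redirect("Q1234", target_id="Q2345")
```

Code that takes an Item or Property never sees a Redirect, and operations like "set label" simply do not exist on redirects, so no "if redirect" checks are needed there.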
Which of the four options do you prefer? Or can you think of another, better, option?
Below are some more points to consider when deciding on a design:
In addition to the OO design questions outlined above, one important question is how to handle redirects in the wb_entity_per_page table; that table contains a 1:1 mapping of entity_id <-> page_id. It is used, among other things, for listing all entities and for checking whether an entity exists. How should redirects be represented in that table? They could:
a) be omitted (making it hard to list redirects),
b) use the redirect's page ID (maintaining the 1:1 nature, but pointing to a page that does not actually contain an entity), or
c) use the target entity's page ID (breaking the 1:1 nature of the table, but associating the ID with the accurate place).
For b) and c), a new column would be needed to mark redirects as such.
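As an illustration only, option b) with such a marker column might look roughly like the sketch below. It uses Python with an in-memory SQLite database; the table name, the simplified column names and the redirect_target column are all hypothetical and do not reflect the real wb_entity_per_page schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE entity_per_page (
        entity_id       TEXT PRIMARY KEY,         -- one row per entity ID
        page_id         INTEGER NOT NULL UNIQUE,  -- option b): still 1:1
        redirect_target TEXT                      -- NULL for "real" entities
    )
    """
)
conn.executemany(
    "INSERT INTO entity_per_page VALUES (?, ?, ?)",
    [
        ("Q2345", 101, None),     # a real item on page 101
        ("Q1234", 100, "Q2345"),  # a redirect, stored with its own page ID
    ],
)


def list_entities(db):
    """List only "real" entities, skipping rows marked as redirects."""
    rows = db.execute(
        "SELECT entity_id FROM entity_per_page "
        "WHERE redirect_target IS NULL ORDER BY entity_id"
    )
    return [row[0] for row in rows]


def list_redirects(db):
    """List (redirect ID, target ID) pairs."""
    rows = db.execute(
        "SELECT entity_id, redirect_target FROM entity_per_page "
        "WHERE redirect_target IS NOT NULL ORDER BY entity_id"
    )
    return rows.fetchall()
```

Under option c) the UNIQUE constraint on page_id would have to go, since the redirect and its target would share a page ID.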
When deciding on how to represent redirects internally, we need to consider how they should be handled externally. Of course, when asking for an entity that got redirected, we should get the target entity's content, be it using the "regular" HTML page interface, the API, the linked data interface, etc.
In RDF dumps, we would probably want redirects to show up as owl:sameAs relations (or something similar that allows one URI to be indicated as the primary one). Should redirects also be present in JSON dumps? If so, should it also be possible to retrieve the JSON representation of entities from the API, in places where usually "real" entities would be?
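For the RDF side, a redirect could boil down to a single triple per redirected ID. A minimal sketch in Python, assuming entity URIs of the form http://www.wikidata.org/entity/Q1234 and N-Triples output; the function name is made up for illustration, and whether owl:sameAs is the right predicate is exactly the open question above.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def redirect_triple(source_id, target_id, base="http://www.wikidata.org/entity/"):
    """Format one N-Triples line stating that source_id is the same as target_id."""
    return f"<{base}{source_id}> <{OWL_SAME_AS}> <{base}{target_id}> ."


print(redirect_triple("Q1234", "Q2345"))
```

Note that owl:sameAs is symmetric, so by itself it does not say which of the two URIs is the primary one.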
-- daniel
Hey,
(Note: I am very jetlagged, so this is just my initial impression. I need to have a more thorough look at the involved things to provide a better solution.)
I agree that any "is redirect" flag in Entity is a bad idea.
> This would require quite a bit of refactoring, and would introduce
> redirects as a "first order citizen" into the Wikibase data model.
Since, as you mention, the cost of this can be quite high, I'd like to verify what is actually needed from a functionality point of view. Once that question has been settled, we can determine which route is best to take. Perhaps this is clear to you, but I'm unsure what exactly we hope to gain by having this in the data model itself.
In any case, I suggest we are cautious about which design changes and additions we make. In particular, we should not modify existing objects that suit the needs of their users into something less suitable just because we have a new use case. Instead, we should make an interface appropriate for that use case. Also, we should be careful not to mash concepts together and not to use the "is a" relationship between everything. Both of these things went wrong for Entity originally.
And when making such changes, we should keep in mind our other requirements. I already wrote down quite a few points affecting Entity, some of which need to be tackled before we create any new implementation (i.e. QueryEntity) if we want to avoid introducing more technical debt.
> a special *type* of Entity, besides Item and Property. This would again
> allow straight forward diffs (and thus undo-operations) between the redirected and the previous, un-redirected version of a page
Can you explain this further? How does having this on the data model level, implemented as an Entity, help with diffs? Perhaps this is the case for EntityContent, though I suspect it is wrong for Entity.
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. ~=[,,_,,]:3 --
Hi guys, is there any progress on this topic? I am interested in how you decide to implement the redirect feature, and I may also contribute once the current design questions have been settled.
Best regards, Bene*
Hi,
> Now, there are various ways to model redirects. I have tried a few:
> - first I tried implementing redirects as a "mode" any Entity may have. The advantage is that redirects can seamlessly occur anywhere "real" entities can occur, e.g. in dumps, diffs, undo operations, etc. However, many operations that are defined on "real" entities are not defined on redirects (e.g. "set label"), so we would end up with a lot of "if redirect, do this, else, do that" checks all over the code base.
I agree that this would bring lots of disadvantages to the current code base.
> - second I tried implementing redirects as a "mode" any EntityContent may have, following the idea that redirects are really a MediaWiki concept, and should be implemented outside the Wikibase data model. This still requires a lot of "if redirect, do this, else, do that" checks, but only in places dealing with EntityContent, not all places dealing with Entities.
In my opinion this would work better and I prefer it to #1. However, I guess the data won't be available in dumps etc.
> - third I tried using a separate content type for representing redirects: a RedirectContent points to an EntityContent, there is no "mode", no need for extra checks, no chance for confusion. However, we break a few basic assumptions, most importantly the assumption that all pages in an entity namespace contain an EntityContent of the respective type.
Agreed, this isn't the best idea.
> None of these solutions seems satisfactory, both for the reasons indicated and because of some additional considerations explained below.
> - This led me to come full circle and consider another option: make redirects a special *type* of Entity, besides Item and Property. This would again allow straightforward diffs (and thus undo operations) between the redirected and the previous, un-redirected version of a page; also, it would not compromise code that operates on Item or Property objects. But since there are still many operations defined for Entity that make no sense for redirects, we would need to insert another class into the hierarchy (let's call it LabeledEntity) which would be a base class for Item and Property, but not for Redirect.
> This would require quite a bit of refactoring, and would introduce redirects as a "first-class citizen" into the Wikibase data model.
Yes, this would require quite a lot of refactoring, but it seems reasonable. I don't know if we should change the whole DataModel for this feature, but perhaps there is no solution that works better than this. The only alternative I can see is #2, but it also has some disadvantages.
> Which of the four options do you prefer? Or can you think of another, better, option?
> Below are some more points to consider when deciding on a design:
> In addition to the OO design questions outlined above, one important question is how to handle redirects in the wb_entity_per_page table; that table contains a 1:1 mapping of entity_id <-> page_id. It is used, among other things, for listing all entities and for checking whether an entity exists. How should redirects be represented in that table? They could:
> a) be omitted (making it hard to list redirects),
> b) use the redirect's page ID (maintaining the 1:1 nature, but pointing to a page that does not actually contain an entity), or
> c) use the target entity's page ID (breaking the 1:1 nature of the table, but associating the ID with the accurate place).
> For b) and c), a new column would be needed to mark redirects as such.
I would prefer b), as it does not break the idea of the table and also makes more sense if we consider handling redirects as entities.
These are my first impressions of entity redirects. I prefer options 2 and 4 but don't have a strong opinion. Maybe I will also think of another option later.
Best regards, Bene*
I have created a page on metawiki to document this discussion and make it more available: https://meta.wikimedia.org/wiki/Wikidata/Development/Entity_redirect_after_m...
Best regards, Bene*
wikidata-tech@lists.wikimedia.org