Hello all!
Once more, I'm taking a whack at implementing support for entity redirects - that is, we want Q1234 to become a redirect to Q2345 after we merged Q1234 into Q2345.
My question is how to best model this.
First off, a quick primer on the PHP classes involved:
* We have the Entity base class, with Item and Property deriving from it, for modeling entities.
* As "glue" for treating entities as MediaWiki page content, we have the EntityContent base class, and ItemContent and PropertyContent deriving from that (along with their handler/factory classes, EntityHandler, ItemHandler and PropertyHandler).
Presently, each type of Entity is assigned a MediaWiki namespace, and all pages in that namespace have the corresponding content type.
Now, there are various ways to model redirects. I have tried a few:
1) First, I tried implementing redirects as a "mode" any Entity may have. The advantage is that redirects can seamlessly occur anywhere "real" entities can occur, e.g. in dumps, diffs, undo operations, etc. However, many operations that are defined on "real" entities are not defined on redirects (e.g. "set label", etc), so we would end up with a lot of "if redirect, do this, else, do that" checks all over the code base.
2) Second, I tried implementing redirects as a "mode" any EntityContent may have, following the idea that redirects are really a MediaWiki concept, and should be implemented outside the Wikibase data model. This still requires a lot of "if redirect, do this, else, do that" checks, but only in places dealing with EntityContent, not all places dealing with Entities.
3) Third, I tried using a separate content type for representing redirects: a RedirectContent points to an EntityContent, there is no "mode", no need for extra checks, no chance for confusion. However, we break a few basic assumptions, most importantly the assumption that all pages in an entity namespace contain an EntityContent of the respective type.
None of these solutions seems satisfactory, both for the reasons indicated and because of some additional considerations explained below.
4) This led me to come full circle and consider another option: make redirects a special *type* of Entity, besides Item and Property. This would again allow straightforward diffs (and thus undo operations) between the redirected and the previous, un-redirected version of a page; also, it would not compromise code that operates on Item or Property objects. But since there are still many operations defined for Entity that make no sense for redirects, we would need to insert another class into the hierarchy (let's call it LabeledEntity) which would be a base class for Item and Property, but not for Redirect.
This would require quite a bit of refactoring, and would introduce redirects as a "first-class citizen" into the Wikibase data model.
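To make option 4 a bit more concrete, here is a rough sketch of the class hierarchy it implies. It is written in Python rather than PHP purely to keep it short. Entity, LabeledEntity, Item, Property and Redirect are the names used above; the fields and the set_label method are made up for illustration and are not the actual Wikibase API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Base class: anything that has an entity ID, redirects included."""
    entity_id: str


@dataclass
class LabeledEntity(Entity):
    """Proposed intermediate class for operations redirects cannot support."""
    labels: dict = field(default_factory=dict)

    def set_label(self, language: str, text: str) -> None:
        # Illustrative stand-in for the "set label" style operations.
        self.labels[language] = text


@dataclass
class Item(LabeledEntity):
    pass


@dataclass
class Property(LabeledEntity):
    datatype: str = "string"  # hypothetical field, for illustration only


@dataclass
class Redirect(Entity):
    """A redirect is an Entity, but not a LabeledEntity."""
    target_id: str


redirect = Redirect(entity_id="Q1234", target_id="Q2345")
item = Item(entity_id="Q2345")
item.set_label("en", "merged item")
```

The point of the sketch is that code typed against LabeledEntity (or Item/Property) never sees a redirect, while code typed against Entity (diffs, dumps, storage) handles all three kinds without a "mode" check.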
Which of the four options do you prefer? Or can you think of another, better, option?
Below are some more points to consider when deciding on a design:
In addition to the OO design questions outlined above, one important question is how to handle redirects in the wb_entity_per_page table; that table contains a 1:1 mapping of entity_id <-> page_id. It is used, among other things, for listing all entities and for checking whether an entity exists. How should redirects be represented in that table? They could be:
a) omitted (making it hard to list redirects), b) recorded with the redirect's own page ID (maintaining the 1:1 nature, but pointing to a page that actually does not contain an entity), or c) recorded with the target entity's page ID (breaking the 1:1 nature of the table, but associating the ID with the accurate place).
For b) and c), a new column would be needed to mark redirects as such.
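As an illustration of variant b), here is a small sketch using an in-memory SQLite table. The table and column names (entity_per_page, redirect_target) and the page IDs are simplified placeholders, not the real wb_entity_per_page schema; the extra column is the "marker" mentioned above, here doubling as the pointer to the target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entity_per_page (
        entity_id       TEXT PRIMARY KEY,
        page_id         INTEGER NOT NULL UNIQUE,  -- stays 1:1 under variant b)
        redirect_target TEXT                      -- NULL for "real" entities
    )
""")
conn.executemany(
    "INSERT INTO entity_per_page VALUES (?, ?, ?)",
    [
        ("Q2345", 200, None),     # real entity
        ("Q1234", 100, "Q2345"),  # redirect, keeps its own page ID
        ("Q42", 300, None),       # real entity
    ],
)


def list_entities(conn):
    """List only real entities, skipping redirects."""
    rows = conn.execute(
        "SELECT entity_id FROM entity_per_page "
        "WHERE redirect_target IS NULL ORDER BY entity_id"
    )
    return [row[0] for row in rows]


def list_redirects(conn):
    """List (redirect, target) pairs."""
    rows = conn.execute(
        "SELECT entity_id, redirect_target FROM entity_per_page "
        "WHERE redirect_target IS NOT NULL ORDER BY entity_id"
    )
    return list(rows)


def entity_exists(conn, entity_id):
    """True only for real entities; a redirect does not count as existing."""
    row = conn.execute(
        "SELECT 1 FROM entity_per_page "
        "WHERE entity_id = ? AND redirect_target IS NULL",
        (entity_id,),
    ).fetchone()
    return row is not None
```

Under variant c) the row for Q1234 would carry page ID 200 instead of 100, so the UNIQUE constraint on page_id would have to go. Whether entity_exists should return true for a redirect is exactly the kind of question the choice between a), b) and c) forces us to answer.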
When deciding on how to represent redirects internally, we need to consider how they should be handled externally. Of course, when asking for an entity that got redirected, we should get the target entity's content, be it using the "regular" HTML page interface, the API, the linked data interface, etc.
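A minimal sketch of what "asking for a redirected entity gives you the target" could look like, again in Python and with an invented in-memory store (a dict from entity ID to either a content record or a {"redirect": target} stub). Following chains of redirects with a hop limit is my own assumption here; we might just as well decide that redirects must always point directly at a real entity.

```python
def get_entity(store, entity_id, max_hops=3):
    """Return (resolved_id, record), following redirects up to max_hops times.

    Raises KeyError for unknown IDs and RuntimeError if the hop limit is
    exceeded (e.g. because of a redirect loop).
    """
    current = entity_id
    for _ in range(max_hops + 1):
        record = store[current]
        target = record.get("redirect")
        if target is None:
            return current, record
        current = target
    raise RuntimeError(f"too many redirects starting at {entity_id}")


store = {
    "Q2345": {"labels": {"en": "merged item"}},
    "Q1234": {"redirect": "Q2345"},
}
resolved_id, record = get_entity(store, "Q1234")
```

Returning the resolved ID alongside the content lets each interface (HTML, API, linked data) decide for itself whether to tell the client that a redirect was followed.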
In RDF dumps, we would probably want redirects to show up as owl:sameAs relations (or something similar that allows one URI to be indicated as the primary one). Should redirects also be present in JSON dumps? If so, should it also be possible to retrieve the JSON representation of redirects from the API, in places where usually "real" entities would be?
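For the RDF case, a redirect could be emitted as a single N-Triples line along these lines. The entity base URI is an assumption for the sake of the example, and the function name is made up.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
ENTITY_BASE = "http://www.wikidata.org/entity/"  # assumed base URI


def redirect_triple(source_id, target_id, base=ENTITY_BASE):
    """Format a redirect as one N-Triples statement: source owl:sameAs target."""
    return f"<{base}{source_id}> <{OWL_SAME_AS}> <{base}{target_id}> ."


line = redirect_triple("Q1234", "Q2345")
```

Note that owl:sameAs is symmetric, so by itself it does not say which of the two URIs is the primary one; that is why a different or additional predicate may be needed.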
-- daniel