Hello all!
Once more, I'm taking a whack at implementing support for entity redirects:
that is, after we merge Q1234 into Q2345, we want Q1234 to become a redirect to
Q2345.
My question is how to best model this.
First off, a quick primer on the PHP classes involved:
* We have the Entity base class, with Item and Property deriving from it, for
modeling entities.
* As "glue" for treating entities as MediaWiki page content, we have the
EntityContent base class, and ItemContent and PropertyContent deriving from that
(along with their handler/factory classes, EntityHandler, ItemHandler and
PropertyHandler).
Presently, each type of Entity is assigned a MediaWiki namespace, and all pages
in that namespace have the corresponding content type.
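To make the structure above concrete, here is a rough Python mirror of it. The real classes are PHP and this is not their API. The fields, the namespace numbers (0 and 120) and the CONTENT_CLASS_BY_NAMESPACE mapping are illustrative assumptions. The point is only the "one entity namespace, one content type" invariant.

```python
from dataclasses import dataclass, field

# Hypothetical Python mirror of the PHP classes; class names follow the post,
# fields and namespace numbers are assumptions for illustration only.

@dataclass
class Entity:
    id: str
    labels: dict = field(default_factory=dict)

class Item(Entity):
    pass

class Property(Entity):
    pass

@dataclass
class EntityContent:
    # "Glue" wrapping an Entity so it can be stored as MediaWiki page content.
    entity: Entity

class ItemContent(EntityContent):
    pass

class PropertyContent(EntityContent):
    pass

# Assumed namespace numbers: each entity namespace maps to exactly one content type.
CONTENT_CLASS_BY_NAMESPACE = {
    0: ItemContent,
    120: PropertyContent,
}

def content_class_for(namespace: int) -> type:
    """Return the content class every page in the given namespace is expected to have."""
    return CONTENT_CLASS_BY_NAMESPACE[namespace]
```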
Now, there are various ways to model redirects. I have tried a few:
1) first I tried implementing redirects as a "mode" any Entity may have. The
advantage is that redirects can seamlessly occur anywhere "real" entities can
occur, e.g. in dumps, diffs, undo operations, etc. However, many operations that
are defined on "real" entities are not defined on redirects (e.g. "set label",
etc), so we would end up with a lot of "if redirect, do this, else, do that"
checks all over the code base.
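A small Python sketch of option 1 shows where the guards creep in. The redirect_target field and the set_label helper are hypothetical stand-ins for the PHP code. Every operation that only makes sense for "real" entities has to check the mode first.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entity:
    id: str
    labels: dict = field(default_factory=dict)
    # Option 1: any entity may be in "redirect mode" (hypothetical field).
    redirect_target: Optional[str] = None

    def is_redirect(self) -> bool:
        return self.redirect_target is not None

def set_label(entity: Entity, lang: str, text: str) -> None:
    # This guard has to be repeated in every entity-level operation
    # (labels, descriptions, aliases, claims, ...).
    if entity.is_redirect():
        raise ValueError(f"{entity.id} is a redirect to {entity.redirect_target}")
    entity.labels[lang] = text
```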
2) second I tried implementing redirects as a "mode" any EntityContent may have,
following the idea that redirects are really a MediaWiki concept, and should be
implemented outside the Wikibase data model. This still requires a lot of "if
redirect, do this, else, do that" checks, but only in places dealing with
EntityContent, not all places dealing with Entities.
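For comparison, a Python sketch of option 2, again with hypothetical names. Entity stays free of redirect logic. The mode flag and its check move to the content layer, so only code that unwraps an EntityContent has to care.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entity:
    # Option 2: the data model knows nothing about redirects.
    id: str
    labels: dict = field(default_factory=dict)

@dataclass
class EntityContent:
    # Exactly one of entity / redirect_target is expected to be set (assumption).
    entity: Optional[Entity] = None
    redirect_target: Optional[str] = None

    def is_redirect(self) -> bool:
        return self.redirect_target is not None

    def get_entity(self) -> Entity:
        # The "if redirect" check lives here (and in other content-level code)
        # rather than in every Entity operation.
        if self.is_redirect():
            raise ValueError(f"content is a redirect to {self.redirect_target}")
        if self.entity is None:
            raise ValueError("content holds neither an entity nor a redirect")
        return self.entity
```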
3) third I tried using a separate content type for representing redirects: a
RedirectContent points to an EntityContent; there is no "mode", no need for
extra checks, and no chance for confusion. However, we break a few basic
assumptions, most importantly the assumption that all pages in an entity
namespace contain an EntityContent of the respective type.
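The assumption that option 3 breaks can be stated as a one-line check. In this Python sketch, RedirectContent is deliberately not an EntityContent, so a redirect page in the item namespace fails the check. The namespace number and the helper name are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    id: str
    labels: dict = field(default_factory=dict)

@dataclass
class EntityContent:
    entity: Entity

class ItemContent(EntityContent):
    pass

@dataclass
class RedirectContent:
    # Option 3: a separate content type with no mode flag and no entity inside.
    target_id: str

ITEM_NAMESPACE = 0  # assumed namespace number
CONTENT_CLASS_BY_NAMESPACE = {ITEM_NAMESPACE: ItemContent}

def satisfies_namespace_invariant(namespace: int, content: object) -> bool:
    """Check the assumption that pages in an entity namespace hold that namespace's content type."""
    return isinstance(content, CONTENT_CLASS_BY_NAMESPACE[namespace])
```

Any code that relies on this invariant, for example by unwrapping `.entity` from whatever it loads from the item namespace, would need to learn about RedirectContent.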
None of these solutions seems satisfactory, for the reasons indicated above and
because of some additional considerations explained below.
4) This led me to come full circle and consider another option: make redirects
a special *type* of Entity, besides Item and Property. This would again allow
straightforward diffs (and thus undo operations) between the redirected and the
previous, un-redirected version of a page. Also, it would not compromise code
that operates on Item or Property objects. But since there are still many
operations defined for Entity that make no sense for redirects, we would need to
insert another class into the hierarchy (let's call it LabeledEntity) which
would be a base class for Item and Property, but not for Redirect.
This would require quite a bit of refactoring, and would introduce redirects as
a "first-class citizen" into the Wikibase data model.
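A rough Python sketch of the option 4 hierarchy follows. LabeledEntity and Redirect are the names proposed above. The fields and the generic diff helper are assumptions, not Wikibase's actual diff code. Because a redirect is just another Entity, turning an Item into a Redirect yields an ordinary diff. Label-setting simply does not exist on Redirect, so no runtime checks are needed.

```python
import dataclasses
from dataclasses import dataclass, field

@dataclass
class Entity:
    # Option 4: the base class holds only what redirects and real entities share.
    id: str

@dataclass
class LabeledEntity(Entity):
    # Proposed intermediate class for operations that make no sense on redirects.
    labels: dict = field(default_factory=dict)

    def set_label(self, lang: str, text: str) -> None:
        self.labels[lang] = text

class Item(LabeledEntity):
    pass

class Property(LabeledEntity):
    pass

@dataclass
class Redirect(Entity):
    target_id: str

def to_dict(entity: Entity) -> dict:
    # Include the type so that an Item -> Redirect change shows up in the diff.
    return {"type": type(entity).__name__, **dataclasses.asdict(entity)}

def diff(old: Entity, new: Entity) -> dict:
    """Map each changed key to an (old, new) pair; None means the key is absent."""
    a, b = to_dict(old), to_dict(new)
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}
```

Swapping each (old, new) pair gives the data needed to restore the un-redirected item, which is roughly what an undo would have to do.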
Which of the four options do you prefer? Or can you think of another, better,
option?
Below are some more points to consider when deciding on a design:
In addition to the OO design questions outlined above, one important question
is how to handle redirects in the wb_entity_per_page table; that table contains
a 1:1 mapping of entity_id <-> page_id. It is used, among other things, for
listing all entities and for checking whether an entity exists. How should
redirects be represented in that table? They could be:
a) omitted (making it hard to list redirects),
b) or use the redirect's page ID (maintaining the 1:1 nature, but pointing to a
page that actually does not contain an entity),
c) or use the target entity's page ID (breaking the 1:1 nature of the table, but
associating the ID with the accurate place).
For b) and c), a new column would be needed to mark redirects as such.
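To illustrate option b) together with the extra column, here is a sketch using Python's built-in sqlite3. The table and column names are simplified assumptions and do not match the real wb_entity_per_page schema. Redirects keep their own page ID, so page_id stays unique. A nullable redirect_target column (hypothetical) separates them from real entities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema; redirect_target is NULL for real entities.
conn.execute("""
    CREATE TABLE entity_per_page (
        entity_id TEXT PRIMARY KEY,
        page_id INTEGER NOT NULL UNIQUE,
        redirect_target TEXT
    )
""")
# Option b): Q1234 keeps its own page (100), so the 1:1 mapping survives.
conn.executemany(
    "INSERT INTO entity_per_page VALUES (?, ?, ?)",
    [("Q2345", 200, None), ("Q1234", 100, "Q2345")],
)

def list_entities(conn: sqlite3.Connection) -> list:
    """Real entities only: rows without a redirect target."""
    rows = conn.execute(
        "SELECT entity_id FROM entity_per_page "
        "WHERE redirect_target IS NULL ORDER BY entity_id")
    return [row[0] for row in rows]

def list_redirects(conn: sqlite3.Connection) -> list:
    """(source, target) pairs for all redirects."""
    return conn.execute(
        "SELECT entity_id, redirect_target FROM entity_per_page "
        "WHERE redirect_target IS NOT NULL ORDER BY entity_id").fetchall()
```

Under option c), the Q1234 row would carry page_id 200 instead, so the UNIQUE constraint on page_id would have to go. Under option a), the row would be absent and list_redirects would have nothing to find.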
When deciding on how to represent redirects internally, we need to consider how
they should be handled externally. Of course, when asking for an entity that got
redirected, we should get the target entity's content, be it using the "regular"
HTML page interface, the API, the linked data interface, etc.
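Whatever the storage model, the read path could be a lookup that follows the redirect and reports the resolved ID. Then a front end can show the target's content and still tell the caller that a redirect was followed. This is a minimal Python sketch. The in-memory ENTITIES/REDIRECTS dicts stand in for real storage, and the single-hop limit is an assumption modelled on MediaWiki not following double redirects.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-ins for page storage.
ENTITIES: Dict[str, dict] = {"Q2345": {"id": "Q2345", "labels": {"en": "merged item"}}}
REDIRECTS: Dict[str, str] = {"Q1234": "Q2345"}

def lookup_entity(entity_id: str, max_hops: int = 1) -> Tuple[str, dict]:
    """Return (resolved_id, entity data), following at most max_hops redirects."""
    current = entity_id
    for _ in range(max_hops + 1):
        if current in ENTITIES:
            return current, ENTITIES[current]
        if current not in REDIRECTS:
            raise KeyError(f"no such entity: {current}")
        current = REDIRECTS[current]
    # Double redirects or loops end up here instead of recursing forever.
    raise LookupError(f"too many redirect hops starting at {entity_id}")
```

Returning the resolved ID alongside the data leaves it to each interface (HTML, API, linked data) to decide whether to redirect the client or just annotate the response.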
In RDF dumps, we would probably want redirects to show up as owl:sameAs
relations (or something similar that allows one URI to be indicated as the
primary one). Should redirects also be present in JSON dumps? If so, should it
also be possible to retrieve the JSON representation of redirects from the API,
in places where "real" entities would usually appear?
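For the dump side, a Python sketch of what a redirect might look like in each format. The owl:sameAs IRI is the standard OWL one. The entity URI prefix and the {"id": ..., "redirect": ...} JSON record shape are assumptions, not a decided format.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
ENTITY_BASE = "http://www.wikidata.org/entity/"  # assumed concept URI prefix

def redirect_triple(source_id: str, target_id: str, base: str = ENTITY_BASE) -> str:
    """One N-Triples line stating that the redirected entity's URI denotes the target."""
    return f"<{base}{source_id}> <{OWL_SAME_AS}> <{base}{target_id}> ."

def redirect_record(source_id: str, target_id: str) -> dict:
    """Hypothetical stub a JSON dump could contain in place of a full entity."""
    return {"id": source_id, "redirect": target_id}
```

Note that owl:sameAs is symmetric, so the direction of the triple carries no formal meaning. Marking the target as the primary URI would need an additional or different property.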
-- daniel
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.