Resolving Redirects - Wikidata-tech

23 Jun 2014

Hi all. I'm writing to get input on a conceptual issue regarding the resolution
of redirects.
I'm currently in the process of implementing redirects for Wikibase Items
(bugzilla 66067). My present task is to add support for redirect resolution to
the EntityLookup service interface (and possibly the related
EntityRevisionLookup service interface; bugzilla 66075).
Currently, the two interfaces in question look like this (with some irrelevant
stuff omitted):
interface EntityLookup {
    public function getEntity( EntityId $entityId, $revision = 0 );
    public function hasEntity( EntityId $entityId );
}
interface EntityRevisionLookup extends EntityLookup {
    public function getEntityRevision( EntityId $entityId, $revisionId = 0 );
    public function getLatestRevisionId( EntityId $entityId );
}
Note that getEntityRevision returns an EntityRevision object (an Entity with
some revision meta data), while getEntity just returns an Entity object.
Also note that the $revision parameter in EntityLookup::getEntity is deprecated
and being removed (see patch Iafdcb5b38), while $revision in
EntityRevisionLookup::getEntityRevision is supposed to stay.
Presently, the attempt to look up an Entity via an ID that has been turned into
a redirect will result in an exception being thrown. To implement redirect
resolution, my original intention was to leave the EntityRevisionLookup as is, and
change EntityLookup like this:
interface EntityLookup {
    public function getEntity( EntityId $entityId, $resolveRedirects = 1 );
    public function hasEntity( EntityId $entityId, $resolveRedirects = 1 );
}
...with the $resolveRedirects parameter indicating how many levels of redirects
should be resolved before giving up.
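
To make the "levels" semantics concrete, here is a minimal sketch of how an in-memory implementation might count down $resolveRedirects. Everything in it is an assumption for illustration: plain strings stand in for EntityId and Entity, and UnresolvedRedirectException and InMemoryLookup are made-up names, not existing Wikibase classes.

```php
<?php
// Hypothetical exception for a redirect that was not followed.
class UnresolvedRedirectException extends Exception {
    public $target;

    public function __construct( $target ) {
        parent::__construct( "Unresolved redirect to $target" );
        $this->target = $target;
    }
}

// Toy lookup: strings stand in for EntityId and Entity objects.
class InMemoryLookup {
    private $entities;
    private $redirects;

    public function __construct( array $entities, array $redirects ) {
        $this->entities = $entities;
        $this->redirects = $redirects;
    }

    public function getEntity( $entityId, $resolveRedirects = 1 ) {
        // Follow redirects until the level budget is used up.
        while ( isset( $this->redirects[$entityId] ) ) {
            if ( $resolveRedirects <= 0 ) {
                throw new UnresolvedRedirectException( $this->redirects[$entityId] );
            }
            $entityId = $this->redirects[$entityId];
            $resolveRedirects--;
        }

        // Returning null for unknown IDs is a simplification of this sketch.
        return isset( $this->entities[$entityId] ) ? $this->entities[$entityId] : null;
    }
}

// Q1 redirects to Q2, which redirects to Q3.
$lookup = new InMemoryLookup(
    array( 'Q3' => 'Entity Q3' ),
    array( 'Q1' => 'Q2', 'Q2' => 'Q3' )
);
```

With the default of one level, looking up Q2 reaches Q3, while looking up Q1 stops at Q2 and throws; passing 2 for Q1 reaches Q3, and passing 0 reproduces the present behaviour.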
This gives us a convenient way to get the current revision of an entity,
following redirects, and it keeps the interface for requesting a specific, or
the latest, version of an Entity, with meta info attached.
However, it means we have to implement the logic for redirect resolution in
every implementation class, generally using the same code over and over (there
are currently three implementations of EntityRevisionLookup: the actual lookup,
a caching wrapper, and an in-memory fake).
Also, it does not give us a straightforward way to get the meta-data of the
current revision while following redirects. For that, we'd have to modify
EntityRevisionLookup::getEntityRevision:
public function getEntityRevision(
        EntityId $entityId,
        $revisionId = 0,
        $resolveRedirects = 0
    );
This is ugly, and annoying since we'll want to *either* resolve redirects *or*
specify a revision. We could use a special value for $revisionId to indicate
that we not only want the current revision (indicated by 0), but also want to
have redirects resolved (indicated by "follow" or -1 or whatever):
public function getEntityRevision(
        EntityId $entityId,
        $revisionIdOrRedirects = 0
    );
That's concise, but somewhat magical. Or we could add another method:
public function getEntityRevisionAfterFollowingAnyRedirects(
        EntityId $entityId,
        $resolveRedirects = 1
    );
That's not quite obvious, and the awkward name indicates that this isn't really
what we want either.
Perhaps we can get around all this mess by making redirect resolution something
the interface doesn't know about? An implementation detail? The logic for
resolving redirects could be implemented in a Proxy/Wrapper that would implement
EntityRevisionLookup (and thus also EntityLookup). The logic would have to be
implemented only once, in one implementation class, that could be wrapped around
any other implementation.
...
From the implementation's point of view, this is a lot more elegant, and removes
all the issues of how to fit the flag for redirect resolution into the method
signatures.
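
For illustration, such a wrapper could look roughly like the sketch below. It is not an actual implementation: the interface is trimmed to a single method, strings stand in for EntityId and EntityRevision, and UnresolvedRedirectException, InMemoryRevisionLookup and RedirectResolvingLookup are hypothetical names. It also assumes that the wrapped lookup reports the redirect target via the exception, and that after following a redirect we want the target's latest revision.

```php
<?php
// Hypothetical exception carrying the redirect target.
class UnresolvedRedirectException extends Exception {
    public $target;

    public function __construct( $target ) {
        parent::__construct( "Unresolved redirect to $target" );
        $this->target = $target;
    }
}

// Trimmed-down interface; string IDs stand in for EntityId.
interface EntityRevisionLookup {
    public function getEntityRevision( $entityId, $revisionId = 0 );
}

// Toy backend that knows nothing about resolving redirects.
class InMemoryRevisionLookup implements EntityRevisionLookup {
    private $revisions;
    private $redirects;

    public function __construct( array $revisions, array $redirects ) {
        $this->revisions = $revisions;
        $this->redirects = $redirects;
    }

    public function getEntityRevision( $entityId, $revisionId = 0 ) {
        if ( isset( $this->redirects[$entityId] ) ) {
            throw new UnresolvedRedirectException( $this->redirects[$entityId] );
        }
        // The toy backend ignores $revisionId and returns null for unknown IDs.
        return isset( $this->revisions[$entityId] ) ? $this->revisions[$entityId] : null;
    }
}

// Hypothetical wrapper: can be put around any other EntityRevisionLookup.
class RedirectResolvingLookup implements EntityRevisionLookup {
    private $lookup;
    private $maxLevels;

    public function __construct( EntityRevisionLookup $lookup, $maxLevels = 1 ) {
        $this->lookup = $lookup;
        $this->maxLevels = $maxLevels;
    }

    public function getEntityRevision( $entityId, $revisionId = 0 ) {
        $levels = $this->maxLevels;
        while ( true ) {
            try {
                return $this->lookup->getEntityRevision( $entityId, $revisionId );
            } catch ( UnresolvedRedirectException $ex ) {
                if ( $levels <= 0 ) {
                    throw $ex;
                }
                $entityId = $ex->target;
                $revisionId = 0; // assumption: use the target's latest revision
                $levels--;
            }
        }
    }
}

$backend = new InMemoryRevisionLookup(
    array( 'Q2' => 'Q2@latest' ),
    array( 'Q1' => 'Q2' )
);
$resolving = new RedirectResolvingLookup( $backend );
```

Here $backend still throws for Q1, while $resolving returns the revision of Q2. Which of the two a caller gets is fixed by whoever wires the objects together.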
However, this means that the caller does not have control over whether redirects
are resolved or not. It would then be the responsibility of bootstrap code to
provide an instance that does, or doesn't, do redirect resolution to the
appropriate places. That's impractical, since the decision whether redirects
should be resolved may be dynamic (e.g. depend on a parameter in a web API
call), or the caller may wish to handle redirects explicitly, by first looking
up without redirect resolution, and then with it, after some special
treatment.
So, it seems that the "ugly" variant with an extra parameter in
getEntityRevision() is the most practical, even though it's not the most elegant
from an OO design perspective.
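
As a rough sketch of what that variant buys the caller (again with strings standing in for EntityId and EntityRevision, and with UnresolvedRedirectException, InMemoryRevisionLookup, fetchRevision and fetchWithNotice being hypothetical names invented for this example):

```php
<?php
// Hypothetical exception carrying the redirect target.
class UnresolvedRedirectException extends Exception {
    public $target;

    public function __construct( $target ) {
        parent::__construct( "Unresolved redirect to $target" );
        $this->target = $target;
    }
}

// Toy lookup with the extra $resolveRedirects parameter.
class InMemoryRevisionLookup {
    private $revisions;
    private $redirects;

    public function __construct( array $revisions, array $redirects ) {
        $this->revisions = $revisions;
        $this->redirects = $redirects;
    }

    public function getEntityRevision( $entityId, $revisionId = 0, $resolveRedirects = 0 ) {
        while ( isset( $this->redirects[$entityId] ) ) {
            if ( $resolveRedirects <= 0 ) {
                throw new UnresolvedRedirectException( $this->redirects[$entityId] );
            }
            $entityId = $this->redirects[$entityId];
            $resolveRedirects--;
        }
        // The toy lookup ignores $revisionId and returns null for unknown IDs.
        return isset( $this->revisions[$entityId] ) ? $this->revisions[$entityId] : null;
    }
}

// Dynamic decision, e.g. driven by a web API parameter.
function fetchRevision( $lookup, $entityId, $followRedirects ) {
    return $lookup->getEntityRevision( $entityId, 0, $followRedirects ? 1 : 0 );
}

// Explicit handling: look up without resolution first, record the
// redirect as "special treatment", then look up again with resolution.
function fetchWithNotice( $lookup, $entityId, array &$notices ) {
    try {
        return $lookup->getEntityRevision( $entityId );
    } catch ( UnresolvedRedirectException $ex ) {
        $notices[] = "$entityId redirects to {$ex->target}";
        return $lookup->getEntityRevision( $entityId, 0, 1 );
    }
}

$lookup = new InMemoryRevisionLookup(
    array( 'Q2' => 'Q2@latest' ),
    array( 'Q1' => 'Q2' )
);
$notices = array();
$revision = fetchWithNotice( $lookup, 'Q1', $notices );
```

Both use cases from above work against a single instance, at the price of a signature where $revisionId and $resolveRedirects are not really meant to be combined.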
What's your take on this? Got any better ideas?
-- daniel
-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.