I think option #3 is best, since the table isn't *that* big (only one row
per page) and it seems the least bad option in terms of code design.
On Mon, Jul 14, 2014 at 12:30 PM, Daniel Kinzler wrote:
This is about the question of how to best store entity redirects in the
database. Below I try to describe the problem and the possible solutions.
A quick primer:
Wikibase redirects are really Entity-ID aliases. They correspond to, but
are not the same as, MediaWiki's page-based redirects.
If Q3 is a redirect to (an alias for) Q5, the page Item:Q3 is also a
redirect to Item:Q5. The JSON blob on Item:Q3 would store a redirect entry
*instead* of an entity. Entities never *are* redirects.
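To make the alias idea concrete, here is a minimal sketch in Python. It is
not Wikibase code: the class names, the `resolve` helper, and the in-memory
`pages` dict are hypothetical, and the sketch assumes a redirect is followed
for one hop only.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for an entity stored in a page's JSON blob."""
    entity_id: str


@dataclass(frozen=True)
class RedirectEntry:
    """Hypothetical stand-in for a redirect entry stored *instead* of an entity."""
    target_id: str


# A page holds either an entity or a redirect entry, never both.
PageContent = Union[Entity, RedirectEntry]


def resolve(entity_id: str, pages: Dict[str, PageContent]) -> Entity:
    """Return the entity that entity_id refers to, following one redirect hop.

    Raises KeyError if no page exists for the id (or for the redirect target).
    """
    content = pages["Item:" + entity_id]
    if isinstance(content, RedirectEntry):
        content = pages["Item:" + content.target_id]
    if not isinstance(content, Entity):
        # Assumption of this sketch: redirect chains are not followed.
        raise ValueError("redirect target is itself a redirect")
    return content


pages: Dict[str, PageContent] = {
    "Item:Q3": RedirectEntry(target_id="Q5"),  # Q3 is an alias for Q5
    "Item:Q5": Entity(entity_id="Q5"),
}
```

Here `resolve("Q3", pages)` and `resolve("Q5", pages)` yield the same
`Entity`: Q3 is just another name for Q5, and no `Entity` object is ever
itself a redirect.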
Wikibase currently stores a mapping of entity ids to page ids in the
wb_entity_per_page table (epp table for short).
MediaWiki core stores redirects as a kind of "link table", with rd_from
being the page_id, and rd_to+rd_namespace being name+namespace of the
redirect target.
* When looking up an EntityId, we need to be able to load the corresponding
blob, and for that we need to find the corresponding wiki page (either by
page ID or by name+namespace). We need to be able to do this cross-wiki, so
we may not have the repo's configuration (wrt namespaces, etc.) available
when constructing the query.
* We need an efficient way to list all entity IDs on a wiki (without
redirects). In particular, the mechanism for listing entities must support
efficient paging.
* We need an efficient way to resolve redirects in bulk, or at least, to
distinguish redirects from unknown/deleted entity ids.
1) No redirects in the epp table (current). This means we need to use the
name+namespace when loading the entity-or-redirect from a page, since we
don't know the page ID if it's a redirect. We also can't use core's redirect
table, because for that, we also need to know the page id first. In order to
use name+namespace for looking up page IDs for entities, client wikis would
need to know the namespace IDs used on the repo, in order to generate
queries against the repo's database.
2) Put redirects into the epp table as well, without any special marking.
This makes lookups easy, but gives us no efficient way to list all entities
without redirects. We'd need to check and skip redirects while iterating.
This adds complexity to several maintenance and upgrade scripts.
3) Put redirects into the epp table, with a marker (or target id) in a new
column. This would allow for both simple lookup and efficient listing, but
means adding a column (and an index) to an already large table in
production. It also means having the overhead of a column that's mostly null.
4) Put redirects into epp *and* a separate table. Provides simple lookup,
but means a potentially slow join when listing entities. This join would
happen multiple times each time we need to list all entities, because
listing is paged; compare how JsonDumpGenerator works.
5) Put redirects into a special table but not into epp. This means efficient
listing of entities, but requires a not-so-nice "try" logic when looking up
entities: if no entry is found in the epp table, we then need to go on and
check the entity-redirect table, to see whether the id is redirected or not.
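As a rough illustration of option 3, the sketch below uses Python's sqlite3
module with an in-memory database. The table layout, the column names
(`entity_num`, `page_id`, `redirect_target`), the index, and the helper
functions are all assumptions made for this example, not the actual
wb_entity_per_page schema.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the epp table with a redirect column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE epp (
            entity_num      INTEGER PRIMARY KEY,  -- 3 stands for Q3
            page_id         INTEGER NOT NULL,
            redirect_target INTEGER               -- NULL for real entities
        )
        """
    )
    # Index so that "real entities only, in id order" can be served efficiently.
    conn.execute("CREATE INDEX epp_redirect ON epp (redirect_target, entity_num)")
    conn.executemany(
        "INSERT INTO epp VALUES (?, ?, ?)",
        [(3, 103, 5), (5, 105, None), (7, 107, None), (8, 108, None)],
    )
    return conn


def lookup(conn: sqlite3.Connection, num: int) -> Optional[Tuple[int, Optional[int]]]:
    """One query answers both: which page is it, and is it a redirect?"""
    return conn.execute(
        "SELECT page_id, redirect_target FROM epp WHERE entity_num = ?", (num,)
    ).fetchone()


def list_entity_page(conn: sqlite3.Connection, after: int, limit: int) -> List[int]:
    """Return one page of real entity ids greater than `after`, skipping redirects."""
    rows = conn.execute(
        "SELECT entity_num FROM epp "
        "WHERE redirect_target IS NULL AND entity_num > ? "
        "ORDER BY entity_num LIMIT ?",
        (after, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

With the demo data, `lookup(conn, 3)` returns the page id together with the
redirect target in a single query, and repeated calls to `list_entity_page`
(passing the last id of the previous page as `after`) walk all real entities
without any join.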
1) is nasty in terms of cross-wiki configuration. It's the simplest on
the code and database levels, but seems brittle.
2) adds complexity to everything that lists entities. Big performance
impact in cases where entity blobs would otherwise not have been loaded,
but are loaded now to check whether they contain redirects.
3) is somewhat wasteful on the database level, and needs a schema change
deployment on a large table. Don't know how bad that would be, though.
4) may cause performance issues because it adds complexity to big queries
on large tables. Needs trivial schema change deployment (new table).
5) adds complexity to the code that reads entity blobs from the database,
and impacts performance for the "redirect" and "missing entity" cases by
adding a database query. Could be acceptable if these cases are rare. Needs
trivial schema change deployment (new table).
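For comparison, the "try" logic of option 5 might look roughly like the
sketch below, again using Python's sqlite3 module. The table names (`epp`,
`entity_redirect`), their columns, and the `lookup_with_fallback` helper are
hypothetical; the third element of the return value counts queries only to
show where the extra cost lands.

```python
import sqlite3
from typing import Optional, Tuple


def build_demo_db() -> sqlite3.Connection:
    """In-memory stand-in: real entities in epp, redirects in their own table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE epp (entity_num INTEGER PRIMARY KEY, page_id INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE entity_redirect (from_num INTEGER PRIMARY KEY, to_num INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO epp VALUES (?, ?)", [(5, 105), (7, 107)])
    conn.execute("INSERT INTO entity_redirect VALUES (?, ?)", (3, 5))  # Q3 -> Q5
    return conn


def lookup_with_fallback(
    conn: sqlite3.Connection, num: int
) -> Tuple[str, Optional[int], int]:
    """Return (kind, value, queries_used) for an entity id.

    kind is "entity" (value = page id), "redirect" (value = target id),
    or "missing" (value = None).
    """
    row = conn.execute("SELECT page_id FROM epp WHERE entity_num = ?", (num,)).fetchone()
    if row is not None:
        return ("entity", row[0], 1)
    # Not in epp: only now do we know a second query is needed.
    row = conn.execute(
        "SELECT to_num FROM entity_redirect WHERE from_num = ?", (num,)
    ).fetchone()
    if row is not None:
        return ("redirect", row[0], 2)
    return ("missing", None, 2)
```

Regular entities cost one query; redirects and missing ids cost two, which
is the overhead described under 5) above. Listing entities needs no filter
at all in this layout, since epp contains no redirects.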
Senior Software Developer
Gesellschaft zur Förderung Freien Wissens e.V.
Wikidata-tech mailing list