Hey,

With the recent EntityId refactoring, the assumption that these IDs consist of a prefix (e.g. Q) and a numeric part (e.g. 42) can be removed from the EntityId class. However, this assumption is still present in several other locations in the codebase.

If we want to support entity ids that do not follow the prefix+numeric id schema (for instance using a filename or word as id), then we need to get rid of the occurrences of this assumption. This is not trivial to do, however. The biggest hurdle is that we have a lot of entity ids stored in the db using 2 fields: entity type (string) and numeric id (int). The former is mapped from the id prefix, and the latter is the numeric part. As long as this is there, we need to be able to get a numeric part from the id object, and to reconstruct id objects given a type and numeric part. Removing the format assumption would mean keeping the same entity type field, but rather than having an int field holding just part of the id, there would be a string field that holds the whole serialization. While it is technically not that hard to change this in the software, the size of the tables of Wikidata.org makes doing a rebuild somewhat hard.
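
To make the difference between the two storage layouts concrete, here is a rough sketch in Python (not our actual code). The function names and the prefix-to-type mapping are made up for illustration; only the two-field layout (type + int) and the proposed alternative (type + full serialization) come from the description above.

```python
import re
from typing import Tuple

# Hypothetical mapping between id prefixes and entity types (illustration only).
PREFIX_TO_TYPE = {"Q": "item", "P": "property"}
TYPE_TO_PREFIX = {entity_type: prefix for prefix, entity_type in PREFIX_TO_TYPE.items()}


def to_legacy_row(serialization: str) -> Tuple[str, int]:
    """Current layout: (entity type, numeric id). Only works for prefix+number ids."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", serialization)
    if match is None or match.group(1).upper() not in PREFIX_TO_TYPE:
        raise ValueError(f"id does not follow the prefix+number format: {serialization!r}")
    return PREFIX_TO_TYPE[match.group(1).upper()], int(match.group(2))


def from_legacy_row(entity_type: str, numeric_id: int) -> str:
    """Rebuild the id serialization from the two stored fields."""
    return f"{TYPE_TO_PREFIX[entity_type]}{numeric_id}"


def to_string_row(entity_type: str, serialization: str) -> Tuple[str, str]:
    """Alternative layout: (entity type, whole serialization). No format assumption."""
    return entity_type, serialization
```

With the current layout, an id such as "some-file.txt" simply cannot be stored, since there is no numeric part to extract; with the string layout it is stored as-is, at the cost of rebuilding the existing tables.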

This means we essentially need to choose whether we want to be able to, at any point, have entity ids that do not follow the prefix+number format. If this is not important, we can leave everything as it is, and keep the assumption around without feeling dirty about it. In that case we also accept that if we later want an id that does not follow this format, we'll be out of luck. If, on the other hand, we want to support ids with other formats, we need to start getting rid of the remaining assumptions about the old format, and accept that we'll need to write some rebuilding code for one or a few Wikidata.org tables.

It'd be good to have this decision sooner rather than later, as code touching places where such assumptions are located currently has to awkwardly take both possible decisions into account, while the two typically suggest quite different approaches.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--