Hi all!
The Wikibase team would like to allow data from any item to be used on any client page. To do this, we need to track which item is being used where, so we can purge the appropriate pages when the item changes. We would like people with database experience to look at our proposal and let us know about any concerns, especially with respect to performance.
Here is a proposal for two database tables for tracking the usage of entities across wikis:
https://gerrit.wikimedia.org/r/#/c/158078/9/usagetracking/includes/Usage/Sql...
https://gerrit.wikimedia.org/r/#/c/158078/9/subscription/includes/Subscripti...
The "entity_usage" table would be on every client, recording which entity is used on which page (kind of like the iwlinks table). The "entity_per_client" table would be on the repo, and would track which wiki ("client") is interested in changes to which entity.
Please have a look and let me know if you have any questions or suggestions, especially with regard to the following use cases:
The following would happen when editing/re-parsing a page on a client wiki (e.g. Wikipedia):
* get all entities used on a given page from entity_usage
* delete rows based on a page id and a list of entity ids from entity_usage
* insert rows for a page / entity pair into entity_usage
* query rows for a set of entities from entity_usage (with no page id specified)
* add rows for a set of (newly used) entities to the entity_per_client table
* remove rows for a set of (no longer used) entities from the entity_per_client table
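To make the access pattern above concrete, here is a rough sketch of the six steps in Python against SQLite. It is only an illustration: the column names (eu_entity_id, eu_page_id, epc_entity_id, epc_client_id), the primary keys, and the function names are assumptions made for this example, and the real definitions are in the Gerrit change linked above.

```python
import sqlite3

# Hypothetical, simplified schemas; the real ones are in the Gerrit change.
CLIENT_SCHEMA = """
CREATE TABLE entity_usage (
    eu_entity_id TEXT NOT NULL,
    eu_page_id   INTEGER NOT NULL,
    PRIMARY KEY (eu_entity_id, eu_page_id)
)"""
REPO_SCHEMA = """
CREATE TABLE entity_per_client (
    epc_entity_id TEXT NOT NULL,
    epc_client_id TEXT NOT NULL,
    PRIMARY KEY (epc_entity_id, epc_client_id)
)"""


def pages_by_entity(client_db, entity_ids):
    """Query usage rows for a set of entities, with no page id specified."""
    ids = sorted(entity_ids)
    usage = {entity_id: set() for entity_id in ids}
    if ids:
        marks = ",".join("?" * len(ids))
        query = ("SELECT eu_entity_id, eu_page_id FROM entity_usage "
                 f"WHERE eu_entity_id IN ({marks})")
        for entity_id, page_id in client_db.execute(query, ids):
            usage[entity_id].add(page_id)
    return usage


def update_page_usage(client_db, repo_db, wiki_id, page_id, new_entities):
    """Sync entity_usage for one page, then adjust the repo subscriptions."""
    new_entities = set(new_entities)
    # 1. Get all entities currently recorded for the page.
    old_entities = {row[0] for row in client_db.execute(
        "SELECT eu_entity_id FROM entity_usage WHERE eu_page_id = ?",
        (page_id,))}
    removed = old_entities - new_entities
    added = new_entities - old_entities
    # 2. Delete rows by page id and a list of entity ids.
    client_db.executemany(
        "DELETE FROM entity_usage WHERE eu_page_id = ? AND eu_entity_id = ?",
        [(page_id, e) for e in removed])
    # 3. Insert rows for the new page / entity pairs.
    client_db.executemany(
        "INSERT INTO entity_usage (eu_entity_id, eu_page_id) VALUES (?, ?)",
        [(e, page_id) for e in added])
    # 4. Check wiki-wide usage of every entity whose status changed.
    usage = pages_by_entity(client_db, added | removed)
    subscribe = {e for e in added if usage[e] == {page_id}}
    unsubscribe = {e for e in removed if not usage[e]}
    # 5. Add rows for entities this wiki has just started using.
    repo_db.executemany(
        "INSERT OR IGNORE INTO entity_per_client "
        "(epc_entity_id, epc_client_id) VALUES (?, ?)",
        [(e, wiki_id) for e in subscribe])
    # 6. Remove rows for entities this wiki no longer uses anywhere.
    repo_db.executemany(
        "DELETE FROM entity_per_client "
        "WHERE epc_entity_id = ? AND epc_client_id = ?",
        [(e, wiki_id) for e in unsubscribe])
    client_db.commit()
    repo_db.commit()
    return subscribe, unsubscribe
```

In this sketch the query without a page id (step 4) is what decides whether the repo needs to hear about the edit at all: an entity is newly subscribed only if this page is now its sole user on the wiki, and unsubscribed only if no page on the wiki uses it any more.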
The following would happen when dispatching a change from Wikibase:
* looking up interested wikis for a list of entities from the entity_per_client table
* (notification via the job queue)
* looking up pages to be purged/updated based on a list of entity ids (and possibly an aspect id) in the entity_usage table
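The two lookups in the dispatch case could look roughly like the sketch below, again in Python against SQLite. The column names (including eu_aspect), the aspect values used in the usage note, and the function names are assumptions for illustration only; the actual schema is in the Gerrit change linked above, and the job queue step is left out.

```python
import sqlite3

# Hypothetical, simplified schemas; the real ones are in the Gerrit change.
REPO_SCHEMA = """
CREATE TABLE entity_per_client (
    epc_entity_id TEXT NOT NULL,
    epc_client_id TEXT NOT NULL,
    PRIMARY KEY (epc_entity_id, epc_client_id)
)"""
CLIENT_SCHEMA = """
CREATE TABLE entity_usage (
    eu_entity_id TEXT NOT NULL,
    eu_page_id   INTEGER NOT NULL,
    eu_aspect    TEXT NOT NULL,
    PRIMARY KEY (eu_entity_id, eu_page_id, eu_aspect)
)"""


def interested_wikis(repo_db, entity_ids):
    """On the repo: which client wikis subscribe to any of these entities?"""
    ids = list(entity_ids)
    if not ids:
        return set()
    marks = ",".join("?" * len(ids))
    query = ("SELECT DISTINCT epc_client_id FROM entity_per_client "
             f"WHERE epc_entity_id IN ({marks})")
    return {row[0] for row in repo_db.execute(query, ids)}


def pages_to_purge(client_db, entity_ids, aspect=None):
    """On a client: which pages use any of these entities?

    If an aspect is given, only usages of that aspect are matched.
    """
    ids = list(entity_ids)
    if not ids:
        return set()
    marks = ",".join("?" * len(ids))
    query = ("SELECT DISTINCT eu_page_id FROM entity_usage "
             f"WHERE eu_entity_id IN ({marks})")
    params = ids
    if aspect is not None:
        query += " AND eu_aspect = ?"
        params = ids + [aspect]
    return {row[0] for row in client_db.execute(query, params)}
```

A dispatcher would call interested_wikis() once per batch of changed entities on the repo, and each notified client would then call pages_to_purge(), e.g. pages_to_purge(db, ["Q1"], aspect="sitelinks") with a made-up aspect name. Both are IN-list lookups keyed on the entity id, which is the part whose indexing and performance I would most like feedback on.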
-- daniel