Hi everyone,
This is an important message to all the people running external tools.
About 2 years ago we said we want to get rid of the "wb_entity_per_page" table. This is still the case and is becoming more pressing now with the introduction of support for lexicographical data.
On July 12th, we are going to stop updating the "wb_entity_per_page" table from the Wikibase database and stop its replication on ToolLabs. At a later point we will remove it completely.
"wb_entity_per_page" was a secondary database table, mapping Wikibase entity IDs (e.g. "Q42") to MediaWiki page IDs (e.g. 138, which can be seen at https://www.wikidata.org/wiki/Q42?action=info). "wb_entity_per_page" stored entity IDs as numbers, while page titles are always full entity IDs.
This mapping existed because Wikibase was designed with the possibility to have entity pages where the ID does not match the title. This idea was never used, and finally removed in 2015 (documented in https://phabricator.wikimedia.org/T95685). We decided to get rid of the table because it contains outdated information that could mislead users, it costs resources and could conflict in the future with our new entity types for lexicographical data.
Please check if you are maintaining any code that accesses the "wb_entity_per_page" table, and replace it with lookups to MediaWiki's "page" and "redirect" tables.
We will drop the replica of the table on ToolLabs for test.wikidata.org on June 28th. We will do the same for wikidata.org on July 12th.
If you have any question or issue, feel free to add a comment on the ticket (http://phabricator.wikimedia.org/T95685) or to ping me. Thanks for your understanding
Cheers Lydia
So I'll be busy finding uses of this table, and changing them, for the next week or two...
Note that since now I have to use substring comparisons in queries (instead of integer comparisons), for example with wb_terms, SQL queries will run slower as a result.
On Thu, Jun 1, 2017 at 2:08 PM Lydia Pintscher lydia.pintscher@wikimedia.de wrote:
Hi everyone,
This is an important message to all the people running external tools.
About 2 years ago we said we want to get rid of the "wb_entity_per_page" table. This is still the case and is becoming more pressing now with the introduction of support for lexicographical data.
On July 12th, we are going to stop updating the "wb_entity_per_page" table from the Wikibase database and stop its replication on ToolLabs. At a later point we will remove it completely.
"wb_entity_per_page" was a secondary database table, mapping Wikibase entity IDs (e.g. "Q42") to MediaWiki page IDs (e.g. 138, which can be seen at https://www.wikidata.org/wiki/Q42?action=info). "wb_entity_per_page" stored entity IDs as numbers, while page titles are always full entity IDs.
This mapping existed because Wikibase was designed with the possibility to have entity pages where the ID does not match the title. This idea was never used, and finally removed in 2015 (documented in https://phabricator.wikimedia.org/T95685). We decided to get rid of the table because it contains outdated information that could mislead users, it costs resources and could conflict in the future with our new entity types for lexicographical data.
Please check if you are maintaining any code that accesses the "wb_entity_per_page" table, and replace it with lookups to MediaWiki's "page" and "redirect" tables.
We will drop the replica of the table on ToolLabs for test.wikidata.org on June 28th. We will do the same for wikidata.org on July 12th.
If you have any question or issue, feel free to add a comment on the ticket (http://phabricator.wikimedia.org/T95685) or to ping me. Thanks for your understanding
Cheers Lydia
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
Wikidata-tech mailing list Wikidata-tech@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
Am 01.06.2017 um 16:07 schrieb Magnus Manske:
So I'll be busy finding uses of this table, and changing them, for the next week or two...
Note that since now I have to use substring comparisons in queries (instead of integer comparisons), for example with wb_terms, SQL queries will run slower as a result.
wb_terms will also get a column with the full entity ID, and will lose the numeric ID column. We will announce this separately.
Magnus, can you give some examples of what you use wb_entity_per_page for? If it's for building links, you can now use the IDs directly, without looking at the page table. Are there other things besides the terms table that would use substrings?
Thanks!
As an example from my BEACON tool, I want all properties that have a formatter property, with English label. That SQL is now:
SELECT DISTINCT page_title,term_text FROM pagelinks,page,wb_terms WHERE page_namespace=120 AND substr(page_title,2)=term_entity_id and term_entity_type='property' and term_language='en' and term_type='label' and pl_from=page_id and pl_title='P1630' and pl_namespace=120 and pl_from_namespace=120 ORDER BY term_text
Note the "substr". My first attempt was "page_title=concat('Q',term_entity_id)", but that took forever.
If we indeed get a full entity ID=page title column for wb_terms, and for wb_items_per_site etc., that would at least fix the on-the-fly compute. I shall thus wait with code updates until I get the full story, and not just piece-by-piece...
On Thu, Jun 1, 2017 at 3:29 PM Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
Am 01.06.2017 um 16:07 schrieb Magnus Manske:
So I'll be busy finding uses of this table, and changing them, for the
next week
or two...
Note that since now I have to use substring comparisons in queries
(instead of
integer comparisons), for example with wb_terms, SQL queries will run
slower as
a result.
wb_terms will also get a column with the full entity ID, and will lose the numeric ID column. We will announce this separately.
Magnus, can you give some examples of what you use wb_entity_per_page for? If it's for building links, you can now use the IDs directly, without looking at the page table. Are there other things besides the terms table that would use substrings?
Thanks!
-- Daniel Kinzler Principal Platform Engineer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Am 01.06.2017 um 16:59 schrieb Magnus Manske:
As an example from my BEACON tool, I want all properties that have a formatter property, with English label. That SQL is now:
SELECT DISTINCT page_title,term_text FROM pagelinks,page,wb_terms WHERE page_namespace=120 AND substr(page_title,2)=term_entity_id and term_entity_type='property' and term_language='en' and term_type='label' and pl_from=page_id and pl_title='P1630' and pl_namespace=120 and pl_from_namespace=120 ORDER BY term_text
Note the "substr". My first attempt was "page_title=concat('Q',term_entity_id)", but that took forever.
If we indeed get a full entity ID=page title column for wb_terms, and for wb_items_per_site etc., that would at least fix the on-the-fly compute. I shall thus wait with code updates until I get the full story, and not just piece-by-piece...
There is currently no plan to put the full ID into wb_items_per_site or wb_property_info, because these tables are bound to a specific entity type. Whether we want to do this would be a whole new discussion.
For what you are doing there, it's probably a lot easier to use the query service. SPARQL:
SELECT DISTINCT ?property ?propertyLabel WHERE { ?property a wikibase:Property . ?property wdt:P1630 ?format . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }
The original code predated SPARQL, so I have to change it anyway. The example I gave is small enough for SPARQL, but others will not be.
On Thu, Jun 1, 2017 at 4:11 PM Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
Am 01.06.2017 um 16:59 schrieb Magnus Manske:
As an example from my BEACON tool, I want all properties that have a
formatter
property, with English label. That SQL is now:
SELECT DISTINCT page_title,term_text FROM pagelinks,page,wb_terms WHERE page_namespace=120 AND substr(page_title,2)=term_entity_id and term_entity_type='property' and term_language='en' and term_type='label'
and
pl_from=page_id and pl_title='P1630' and pl_namespace=120 and pl_from_namespace=120 ORDER BY term_text
Note the "substr". My first attempt was
"page_title=concat('Q',term_entity_id)",
but that took forever.
If we indeed get a full entity ID=page title column for wb_terms, and for wb_items_per_site etc., that would at least fix the on-the-fly
compute. I
shall thus wait with code updates until I get the full story, and not
just
piece-by-piece...
There is currently no plan to put the full ID into wb_items_per_site or wb_property_info, because these tables are bound to a specific entity type. Whether we want to do this would be a whole new discussion.
For what you are doing there, it's probably a lot easier to use the query service. SPARQL:
SELECT DISTINCT ?property ?propertyLabel WHERE { ?property a wikibase:Property . ?property wdt:P1630 ?format . SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } }
-- Daniel Kinzler Principal Platform Engineer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
wikidata-tech@lists.wikimedia.org