Wikidata-tech July 2014

wikidata-tech@lists.wikimedia.org

9 participants
6 discussions

Ensuring correctness of compatibility ranges

by Jeroen De Dauw

Hey,

By default our CI runs the tests using the latest stable dependencies of the component under test. Sometimes you also want to verify that it still works with older versions, so you don't accidentally break compatibility.

I just did this for DataModelSerialization, to be able to spot compat issues with the dev version of DataModel 1.0.

https://github.com/wmde/WikibaseDataModelSerialization/blob/master/.travis.…

So this now runs the tests with DataModel 0.7, the latest release to match ~0.7, and the latest commit on the 1.0.x branch.

Cheers

--
Jeroen De Dauw - http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
~=[,,_,,]:3
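To make "the latest release to match ~0.7" concrete, here is a minimal sketch of how such a constraint could be resolved. It assumes that ~0.7 is a Composer-style tilde constraint (at least 0.7, below 1.0); the message does not say so explicitly. The function names and the release numbers in the usage note are hypothetical, and only plain numeric versions are handled.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn '0.7.3' into (0, 7, 3). Only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def tilde_bounds(constraint: str) -> Tuple[Version, Version]:
    """Return (inclusive lower bound, exclusive upper bound) for '~X.Y[.Z]'.

    Assumed semantics: the last given segment may grow, and the segment
    before it is bumped to form the upper bound. So '~0.7' means
    >=0.7,<1.0 and '~1.2.3' means >=1.2.3,<1.3.
    """
    if not constraint.startswith("~"):
        raise ValueError("expected a constraint starting with '~'")
    lower = parse_version(constraint[1:])
    if len(lower) < 2:
        raise ValueError("this sketch needs at least two version segments")
    upper = lower[:-2] + (lower[-2] + 1,)
    return lower, upper


def latest_matching(releases: List[str], constraint: str) -> Optional[str]:
    """Pick the highest release inside the tilde range, or None if none match."""
    lower, upper = tilde_bounds(constraint)
    matching = [r for r in releases if lower <= parse_version(r) < upper]
    return max(matching, key=parse_version) if matching else None
```

With a hypothetical release list such as ["0.6.0", "0.7.0", "0.7.3", "1.0.0"], latest_matching(releases, "~0.7") picks "0.7.3". A CI matrix like the one described would then test against that release in addition to the oldest supported version and the development branch.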

9 years, 9 months

How to record redirects in the database

by Daniel Kinzler

This is about the question of how to best store entity redirects in the database. Below I try to describe the problem and the possible solutions. Any input welcome.

A quick primer: Wikibase redirects are really Entity-ID aliases. They correspond to, but are not the same as, MediaWiki's page based redirects. If Q3 is a redirect to (an alias for) Q5, the page Item:Q3 is also a redirect to Item:Q5. The JSON blob on Item:Q3 would store a redirect entry *instead* of an entity. Entities never *are* redirects.

Wikibase currently stores a mapping of entity ids to page ids in the wb_entity_per_page table (epp table for short). MediaWiki core stores redirects as a kind of "link table", with rd_from being a page_id, and rd_to+rd_namespace being name+namespace of the redirect target.

Requirements:

* When looking up an EntityId, we need to be able to load the corresponding JSON blob, and for that we need to find the corresponding wiki page (either by id, or by name+namespace). We need to be able to do this cross-wiki, so we may not have the repo's configuration (wrt namespaces, etc) available when constructing the query.
* We need an efficient way to list all entity IDs on a wiki (without redirects). In particular, the mechanism for listing entities must support efficient paging.
* We need an efficient way to resolve redirects in bulk, or at least, to discern redirects from unknown/deleted entity ids.

Options:

1) No redirects in the epp table (current). This means we need to use the name+namespace when loading the entity-or-redirect from a page, since we don't know the page ID if it's a redirect. We also can't use core's redirect table, because for that, we also need to know the page id first. In order to use name+namespace for looking up page IDs for entities, client wikis would need to know the namespace IDs used on the repo, in order to generate queries against the repo's database.
2) Put redirects into the epp table as well, without any special marking. This makes lookups easy, but gives us no efficient way to list all entities without redirects. We'd need to check and skip redirects while iterating. This would add complexity to several maintenance and upgrade scripts.
3) Put redirects into the epp table, with a marker (or target id) in a new column. This would allow for both simple lookup and efficient listing, but it means adding a column (and an index) to an already large table in production. It also means having the overhead of a column that's mostly null.
4) Put redirects into epp *and* a separate table. Provides simple lookup, but means a potentially slow join when listing entities. This join would happen multiple times each time we need to list all entities, because of paged access - compare how JsonDumpGenerator works.
5) Put redirects into a special table but not into epp. This means fast/simple listing of entities, but requires a not-so-nice "try" logic when looking up entities: if no entry is found in the epp table, we then need to go on and try the entity-redirect table, to see whether the id is redirected or unknown/deleted.

Assessment:

1) is nasty in terms of cross-wiki configuration. It's the simplest solution on the code and database levels, but seems brittle.
2) adds complexity to everything that lists entities. Big performance impact in cases where entity blobs would otherwise not have been loaded, but are loaded now to check whether they contain redirects.
3) is somewhat wasteful on the database level, and needs a schema change deployment on a large table. Don't know how bad that would be, though.
4) may cause performance issues because it adds complexity to big queries on large tables. Needs trivial schema change deployment (new table).
5) adds complexity to the code that reads entity blobs from the database, impacts performance for the "redirect" and "missing entity" cases by adding a database query. Could be acceptable if these cases are rare. Needs trivial schema change deployment (new table).

-- daniel

--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
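The "try" logic of option 5 can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database purely for demonstration. The table name wb_entity_per_page comes from the message, but the column names, the entity_redirect table and the function names are hypothetical simplifications, not the actual Wikibase schema or code.

```python
import sqlite3
from typing import List, Optional, Tuple, Union


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a simplified epp table plus a separate (hypothetical) redirect table."""
    conn.executescript(
        """
        CREATE TABLE wb_entity_per_page (
            entity_id TEXT PRIMARY KEY,
            page_id   INTEGER NOT NULL
        );
        CREATE TABLE entity_redirect (
            from_id TEXT PRIMARY KEY,
            to_id   TEXT NOT NULL
        );
        """
    )


def look_up(
    conn: sqlite3.Connection, entity_id: str
) -> Tuple[str, Optional[Union[int, str]]]:
    """Option 5 lookup: try the epp table first, then the redirect table.

    Returns ('entity', page_id), ('redirect', target_id) or ('missing', None).
    The second query only runs for redirects and unknown/deleted ids.
    """
    row = conn.execute(
        "SELECT page_id FROM wb_entity_per_page WHERE entity_id = ?",
        (entity_id,),
    ).fetchone()
    if row is not None:
        return ("entity", row[0])
    row = conn.execute(
        "SELECT to_id FROM entity_redirect WHERE from_id = ?",
        (entity_id,),
    ).fetchone()
    if row is not None:
        return ("redirect", row[0])
    return ("missing", None)


def list_entities(
    conn: sqlite3.Connection, after_id: str = "", limit: int = 100
) -> List[str]:
    """List one page of entity ids; no join or redirect filtering is needed.

    Pages are continued by passing the last id of the previous page. Ids are
    compared as plain text here, which is enough for a demonstration.
    """
    rows = conn.execute(
        "SELECT entity_id FROM wb_entity_per_page "
        "WHERE entity_id > ? ORDER BY entity_id LIMIT ?",
        (after_id, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

The trade-off described in the assessment is visible here: list_entities touches a single table and pages cheaply, while look_up pays for a second query only in the redirect and missing cases.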

9 years, 10 months

reviews needed for pubsubhubbub extension

by Lydia Pintscher

Hey folks :)

Wikimedia Germany has been working with a team of students over the past months. They have, among other things, developed a pubsubhubbub extension. The idea is that we allow 3rd parties to easily subscribe to changes made on Wikidata or any other wiki. Wikidata's changes get sent to a hub, which then notifies all subscribers who are interested in the change.

We'd like to get this extension deployed for Wikidata and possibly later other Wikimedia projects. For this, the extension needs some more eyes to review it. RobLa suggested I send an email to wikitech-l about it. The review bugs for this are https://bugzilla.wikimedia.org/show_bug.cgi?id=67117 and https://bugzilla.wikimedia.org/show_bug.cgi?id=67118

Thanks for your help getting this ready for deployment.

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
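The hub idea described in the message above can be illustrated with a tiny in-memory sketch. This is not the extension's code: in the real PubSubHubbub protocol, subscribers register HTTP callback URLs with the hub, whereas here plain Python functions stand in for them. The class name, method names and the topic string in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Change = Dict[str, str]
Callback = Callable[[Change], None]


class Hub:
    """Keeps a list of subscriber callbacks per topic and fans changes out."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register interest in one topic (for example, one wiki's change feed)."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, change: Change) -> int:
        """Notify every subscriber of the topic; return how many were notified."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(change)
        return len(callbacks)
```

A subscriber that calls hub.subscribe("wikidata-changes", handler) would have handler invoked once for every later hub.publish("wikidata-changes", change), while publishing to a topic without subscribers notifies nobody.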

9 years, 10 months

PubSubHubbub what is it all about ? - Re: Wikidata-tech Digest, Vol 15, Issue 3

by Param

Hi,

I am a new member of Wikidata and would like to know all about "PubSubHubbub", the new project. It would be great to know how to participate and get involved.

Regards,
Pramod

On Jul 9, 2014, at 8:02 AM, wikidata-tech-request(a)lists.wikimedia.org wrote:
>> PubSubHubbub

9 years, 10 months

upcoming API change (wblinktitle)

by Lydia Pintscher

Hey folks :)

Next week we will deploy an API change in preparation for support for Wikimedia Commons. wblinktitle will return a prefixed ID starting Tuesday night if there are no complications: https://bugzilla.wikimedia.org/show_bug.cgi?id=52732

This is necessary as Commons support will break the assumption that our identifiers are always of the form letter+number, for example Q1234.

Sorry for the disruption. If you need help with adapting your tools please let me know.

Cheers
Lydia

PS: Jeroen forced me to attach a cat to this email so here you go:

 /\_/\
=( °w° )=
  )   (  //
 (__ __)//

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
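For tool authors, one way to adapt is to stop taking IDs apart and to treat them as opaque strings, parsing the letter+number form only where it is really needed. The sketch below shows that idea. The message does not specify what future identifiers will look like, so the fallback branch is an assumption, and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# The old assumption: exactly one letter followed by digits, e.g. Q1234.
LEGACY_ID = re.compile(r"([A-Za-z])(\d+)")


def split_legacy_id(entity_id: str) -> Optional[Tuple[str, int]]:
    """Return (letter, number) for a letter+number id, or None for anything else."""
    match = LEGACY_ID.fullmatch(entity_id)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def storage_key(entity_id: str) -> str:
    """Use the full id string as the key instead of only its numeric part.

    This keeps working even if an id does not follow the letter+number form.
    """
    return entity_id
```

Code that previously stored only the numeric part would switch to storage_key, and would call split_legacy_id only in places that can handle a None result.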

9 years, 10 months

reviews needed for upcoming query functionality for Wikidata

by Lydia Pintscher

Hi folks,

The Wikidata dev team has been working on simple query functionality for Wikibase over the past months. It's now in a state where we would like to get it reviewed and deployed. RobLa suggested I email wikitech-l to get some more eyes on it.

Once deployed, users will be able to search for single property-value pairs and get a list of results - via the API and a special page. Example: Give me all items that have a statement "producer: J.J. Abrams".

The bug reports for performance and security review are at:

* https://bugzilla.wikimedia.org/show_bug.cgi?id=67533
* https://bugzilla.wikimedia.org/show_bug.cgi?id=67534
* https://bugzilla.wikimedia.org/show_bug.cgi?id=67535
* https://bugzilla.wikimedia.org/show_bug.cgi?id=67536

Thanks for your help getting this ready for deployment.

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
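What a single property-value query means can be shown with a small in-memory sketch. It says nothing about how the actual Wikibase query feature, its API module or its special page are implemented. The data layout, the function name and the item identifiers in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

# A statement is reduced to a (property, value) pair for this illustration.
Statement = Tuple[str, str]


def find_items(
    items: Dict[str, List[Statement]], prop: str, value: str
) -> List[str]:
    """Return the ids of all items that have a statement 'prop: value'."""
    return sorted(
        item_id
        for item_id, statements in items.items()
        if (prop, value) in statements
    )
```

Given a hypothetical data set in which "item-a" and "item-c" each carry the statement ("producer", "J.J. Abrams") and "item-b" does not, find_items(items, "producer", "J.J. Abrams") returns ["item-a", "item-c"].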

9 years, 10 months
