Hey,
By default, our CI runs the tests using the latest stable dependencies of
the component under test. Sometimes you also want to verify that it still
works with older versions, so you don't accidentally break compatibility. I
just did this for DataModelSerialization, so that we can spot compatibility
issues with the dev version of DataModel 1.0.
https://github.com/wmde/WikibaseDataModelSerialization/blob/master/.travis.…
So this now runs the tests with DataModel 0.7, the latest release to match
~0.7, and the latest commit on the 1.0.x branch.
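A note on the constraint syntax: in Composer, `~0.7` means `>=0.7.0,<1.0.0`, so "the latest release to match ~0.7" can be a later 0.x release than 0.7 itself. The Python sketch below illustrates that rule only for two-component `~X.Y` constraints, using a made-up release list; in practice Composer performs this resolution itself.

```python
# Sketch: pick the newest release matching a Composer-style "~X.Y" constraint.
# Composer reads "~0.7" as ">=0.7.0,<1.0.0". Only two-component constraints
# are handled, and the release list used below is hypothetical.

def parse(version: str) -> tuple[int, ...]:
    """Turn '0.7.2' into (0, 7, 2), padding short versions with zeros."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))

def tilde_range(constraint: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return the (inclusive lower, exclusive upper) bounds of '~X.Y'."""
    major, minor = (int(p) for p in constraint.lstrip("~").split("."))
    return (major, minor, 0), (major + 1, 0, 0)

def latest_matching(constraint: str, releases: list[str]):
    """Return the highest release inside the range, or None if none match."""
    low, high = tilde_range(constraint)
    candidates = [r for r in releases if low <= parse(r) < high]
    return max(candidates, key=parse, default=None)

releases = ["0.6.0", "0.7.0", "0.7.1", "0.8.0", "1.0.0"]  # hypothetical
print(latest_matching("~0.7", releases))
```

With this list, 0.8.0 is picked: it satisfies `~0.7`, while 1.0.0 does not.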
Cheers
--
Jeroen De Dauw - http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
~=[,,_,,]:3
This is about the question of how best to store entity redirects in the
database. Below I try to describe the problem and the possible solutions. Any
input is welcome.
A quick primer:
Wikibase redirects are really Entity-ID aliases. They correspond to, but are not
the same as, MediaWiki's page-based redirects.
If Q3 is a redirect to (an alias for) Q5, the page Item:Q3 is also a redirect to
Item:Q5. The JSON blob on Item:Q3 would store a redirect entry *instead* of an
entity. Entities never *are* redirects.
Wikibase currently stores a mapping of entity ids to page ids in the
wb_entity_per_page table (epp table for short).
MediaWiki core stores redirects as a kind of "link table", with rd_from being a
page_id, and rd_title+rd_namespace being the name+namespace of the redirect target.
Requirements:
* When looking up an EntityId, we need to be able to load the corresponding JSON
blob, and for that we need to find the corresponding wiki page (either by id, or
by name+namespace). We need to be able to do this cross-wiki, so we may not have
the repo's configuration (with respect to namespaces, etc.) available when constructing the
query.
* We need an efficient way to list all entity IDs on a wiki (without redirects).
In particular, the mechanism for listing entities must support efficient paging.
* We need an efficient way to resolve redirects in bulk, or at least, to discern
redirects from unknown/deleted entity ids.
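One common way to meet the paging requirement is keyset (seek) pagination instead of OFFSET: each batch continues strictly after the last id of the previous batch, so every batch is an index range scan. A minimal Python/sqlite3 sketch, assuming a simplified stand-in for the epp table (numeric id, entity type, page id); the real schema may differ:

```python
import sqlite3

# Simplified, assumed stand-in for wb_entity_per_page.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wb_entity_per_page ("
    " epp_entity_id INTEGER NOT NULL,"
    " epp_entity_type TEXT NOT NULL,"
    " epp_page_id INTEGER NOT NULL,"
    " PRIMARY KEY (epp_entity_type, epp_entity_id))"
)
conn.executemany(
    "INSERT INTO wb_entity_per_page VALUES (?, 'item', ?)",
    [(n, 100 + n) for n in (1, 2, 5, 8, 13, 21, 34)],
)

def list_entity_ids(conn, entity_type, batch_size):
    """Yield all ids of one entity type in batches, using keyset paging.

    Each query starts strictly after the last id already seen, so it can
    seek directly in the primary key instead of skipping OFFSET rows.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT epp_entity_id FROM wb_entity_per_page"
            " WHERE epp_entity_type = ? AND epp_entity_id > ?"
            " ORDER BY epp_entity_id LIMIT ?",
            (entity_type, last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield [row[0] for row in rows]
        last_id = rows[-1][0]

for batch in list_entity_ids(conn, "item", 3):
    print(batch)
```

This prints `[1, 2, 5]`, `[8, 13, 21]` and `[34]`. Note that if redirects were mixed into this table without a marker (option 2 below), every batch would need extra filtering.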
Options:
1) No redirects in the epp table (current). This means we need to use the
name+namespace when loading the entity-or-redirect from a page, since we don't
know the page ID if it's a redirect. We also can't use core's redirect table,
because for that, we also need to know the page id first. In order to use
name+namespace for looking up page IDs for entities, client wikis would need to
know the namespace IDs used on the repo, in order to generate queries against
the repo's database.
2) Put redirects into the epp table as well, without any special marking. This
makes lookups easy, but gives us no efficient way to list all entities without
redirects. We'd need to check and skip redirects while iterating. This would add
complexity to several maintenance and upgrade scripts.
3) Put redirects into the epp table, with a marker (or target id) in a new
column. This would allow for both simple lookup and efficient listing, but it
means adding a column (and an index) to an already large table in production. It
also means having the overhead of a column that's mostly null.
4) Put redirects into epp *and* a separate table. Provides simple lookup, but
means a potentially slow join when listing entities. This join would happen
multiple times each time we need to list all entities, because of paged access -
compare how JsonDumpGenerator works.
5) Put redirects into a special table but not into epp. This means fast/simple
listing of entities, but requires a not-so-nice "try" logic when looking up
entities: if no entry is found in the epp table, we then need to go on and try
the entity-redirect table, to see whether the id is redirected or unknown/deleted.
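To make the "try" logic of option 5 concrete, here is a minimal Python/sqlite3 sketch. The redirect table and its columns (`wb_entity_redirect`, `er_from_id`, `er_to_id`) are hypothetical, and ids are stored as strings for brevity. The common case costs one query; only redirects and unknown/deleted ids pay for a second one.

```python
import sqlite3

# Assumed, simplified schemas: string entity ids, plus a hypothetical
# redirect table mapping a redirected id to its target id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wb_entity_per_page (
        epp_entity_id TEXT PRIMARY KEY,
        epp_page_id INTEGER NOT NULL
    );
    CREATE TABLE wb_entity_redirect (
        er_from_id TEXT PRIMARY KEY,
        er_to_id TEXT NOT NULL
    );
    INSERT INTO wb_entity_per_page VALUES ('Q5', 105);
    INSERT INTO wb_entity_redirect VALUES ('Q3', 'Q5');
""")

def lookup(conn, entity_id):
    """Return ('entity', page_id), ('redirect', target_id) or ('missing', None)."""
    row = conn.execute(
        "SELECT epp_page_id FROM wb_entity_per_page WHERE epp_entity_id = ?",
        (entity_id,),
    ).fetchone()
    if row is not None:
        return ("entity", row[0])  # common case: a single query
    # Fallback query, needed only for redirects and unknown/deleted ids.
    row = conn.execute(
        "SELECT er_to_id FROM wb_entity_redirect WHERE er_from_id = ?",
        (entity_id,),
    ).fetchone()
    if row is not None:
        return ("redirect", row[0])
    return ("missing", None)

for eid in ("Q5", "Q3", "Q7"):
    print(eid, lookup(conn, eid))
```

Listing entities stays a plain scan of the epp table, which is the upside of this option.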
Assessment:
1) is nasty in terms of cross-wiki configuration. It's the simplest solution on
the code and database levels, but seems brittle.
2) adds complexity to everything that lists entities. It has a big performance
impact in cases where entity blobs would otherwise not have been loaded, but now
have to be loaded to check whether they contain redirects.
3) is somewhat wasteful on the database level, and needs a schema change
deployment on a large table. I don't know how bad that would be, though.
4) may cause performance issues because it adds complexity to big queries on
large tables. Needs trivial schema change deployment (new table).
5) adds complexity to the code that reads entity blobs from the database,
impacts performance for the "redirect" and "missing entity" cases by adding a
database query. Could be acceptable if these cases are rare. Needs trivial
schema change deployment (new table).
-- daniel
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Hey folks :)
Wikimedia Germany has been working with a team of students over the
past months. Among other things, they have developed a PubSubHubbub
extension. The idea is that we allow 3rd parties to easily subscribe
to changes made on Wikidata or any other wiki. Wikidata's changes get
sent to a hub, which then notifies all subscribers who are interested
in the change.
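Roughly, the hub's job is fan-out: take one change notification and deliver it to everyone subscribed to that topic. The Python sketch below models only that step, in process; real PubSubHubbub hubs deliver over HTTP to subscriber callback URLs and verify subscriptions, none of which is shown. Using entity ids as topic names is an assumption for illustration.

```python
from collections import defaultdict
from typing import Callable

# A subscriber callback receives (topic, payload).
Callback = Callable[[str, str], None]

class Hub:
    """Toy in-process hub: keeps subscriptions and fans out notifications."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: str) -> int:
        """Notify every subscriber of `topic`; return how many were notified."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, payload)
        return len(callbacks)

received = []
hub = Hub()
hub.subscribe("Q42", lambda topic, payload: received.append((topic, payload)))
hub.publish("Q42", "label changed")  # delivered to the one subscriber
hub.publish("Q64", "new statement")  # nobody subscribed to Q64
print(received)
```

Only the Q42 change reaches the subscriber; the wiki itself publishes once, regardless of how many subscribers there are.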
We'd like to get this extension deployed for Wikidata and possibly
later other Wikimedia projects. For this, it needs some more eyes to
review it. RobLa suggested I send an email to wikitech-l about it.
The review bugs for this are
https://bugzilla.wikimedia.org/show_bug.cgi?id=67117 and
https://bugzilla.wikimedia.org/show_bug.cgi?id=67118
Thanks for your help getting this ready for deployment.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg
under number 23855 Nz. Recognized as a charitable organization by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Hi,
I am a new member of Wikidata and would like to learn all about “PubSubHubbub”, the new project.
It would be great to know how I can participate and get involved.
Regards,
Pramod
On Jul 9, 2014, at 8:02 AM, wikidata-tech-request(a)lists.wikimedia.org wrote:
>> PubSubHubbub
Hey folks :)
Next week we will deploy an API change in preparation for supporting
Wikimedia Commons. Starting Tuesday night, if there are no complications,
wblinktitle will return a prefixed ID:
https://bugzilla.wikimedia.org/show_bug.cgi?id=52732
This is necessary because Commons support will break the assumption that
our identifiers always have the form letter+number, for example Q1234.
Sorry for the disruption. If you need help adapting your tools, please
let me know.
Cheers
Lydia
PS: Jeroen forced me to attach a cat to this email so here you go:
/\_/\
=( °w° )=
) ( //
(__ __)//
Hi folks,
The Wikidata dev team has been working on simple query functionality
for Wikibase over the past months. It's now in a state where we would
like to get it reviewed and deployed. RobLa suggested I email
wikitech-l to get some more eyes on it. Once deployed, users will be
able to search for single property-value pairs and get a list of
results - via the API and a special page. Example: Give me all items
that have a statement "producer: J.J. Abrams".
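Conceptually, a single property-value query is a lookup in an index keyed by (property, value). The toy Python sketch below illustrates only those semantics: the sample statements are made up, properties and values are plain strings rather than property and item ids, and it says nothing about how the Wikibase query functionality actually stores its data.

```python
from collections import defaultdict

# Hypothetical sample statements: (item id, property, value).
statements = [
    ("Q1", "producer", "J.J. Abrams"),
    ("Q2", "producer", "Someone Else"),
    ("Q3", "producer", "J.J. Abrams"),
    ("Q3", "director", "J.J. Abrams"),
]

# Inverted index: (property, value) -> set of item ids with that statement.
index = defaultdict(set)
for item_id, prop, value in statements:
    index[(prop, value)].add(item_id)

def items_with(prop, value):
    """Return the sorted ids of all items with a statement prop: value."""
    return sorted(index.get((prop, value), set()))

print(items_with("producer", "J.J. Abrams"))
```

Q3 matches once even though J.J. Abrams also appears in its director statement, because the lookup is by the (property, value) pair.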
The bug reports for performance and security review are at:
* https://bugzilla.wikimedia.org/show_bug.cgi?id=67533
* https://bugzilla.wikimedia.org/show_bug.cgi?id=67534
* https://bugzilla.wikimedia.org/show_bug.cgi?id=67535
* https://bugzilla.wikimedia.org/show_bug.cgi?id=67536
Thanks for your help getting this ready for deployment.
Cheers
Lydia