Hey Steffen and Andy,
Continuing what I started on Twitter here, as some more characters might be
helpful :)
It seems that both our projects (FLOW3 and Wikidata) are in a similar
situation. We are using Gerrit as CR tool, and TravisCI to run our tests.
And we both want to have Travis run tests for all patchsets submitted to
Gerrit, and then +1 or -1 on verified based on the build passing or
failing. To what extent have you gotten such a thing to work on your
project? Is there code available anywhere? If both projects can use the
same code for this, I'd be happy to contribute to what you already have.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hey,
For some time now, it has been clear that a lot of people have a use for
code that can serialize and deserialize the Wikibase DataModel. Anyone
interacting with the API or the dumps and doing non-trivial things with the
data benefits from being able to use the domain objects provided by the
DataModel. While the domain objects themselves have been reusable for some
time already, the code responsible for serialization is not. As the involved code also
suffers from serious design issues and is a sizeable chunk of our technical
debt, the idea is to create a new shiny dedicated component that is not
bound to Wikibase Repo or Wikibase Client.
As interest has been expressed in contributing to this, I'll briefly
outline the general idea, upcoming steps and relevant resources. If you are
not interested in contributing, you can stop reading here :)
I have created a new git repo for this component, which can be found here
https://github.com/wmde/WikibaseDataModelSerialization
This component should be to the DataModel component what AskSerialization
[0] is to the Ask library [1]. The approach and structure should be very
similar. A few stubs have been added to illustrate how to organize the
code, and autoloading and test bootstrap are in place, so the tests can be
run by executing "phpunit" in the root directory. The existing legacy code
for this serialization functionality can be found in Wikibase.git, in
lib/serializers.
I myself will be working on a very similar component which is aimed at
solving the technical debt around the serialization code for the format
used by the Wikibase Repo data access layer. This code resides in [2] and
follows much the same approach as the new component for the serialization
code dealing with the public format. Once the most critical issues this new
code will solve are tackled, I will likely start work on the DataModel
serialization component mentioned above.
Things to keep in mind when contributing to this component:
* https://www.mediawiki.org/wiki/Wikibase/Coding_conventions
* Almost no code should know about concrete serializer classes. Program
against the interfaces. That is, when the constructor of a serializer for a
higher level object (e.g. SnakSerializer) takes a collaborator for a lower
level one (e.g. DataValueSerializer), type hint against the interface (i.e.
"Serializer $dataValueSerializer"). A sketch follows below this list.
* Unit tests should be provided for all code, along with round trip tests
for matching serializers and deserializers, as well as high level
serialization and deserialization integration tests.
* Write clean tests with descriptive method names, e.g. as done in
https://github.com/wmde/WikibaseInternalSerialization/blob/master/tests/uni…
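To illustrate the point about programming against interfaces, here is a
minimal sketch; the interface and class names are only illustrative and may
differ from what ends up in the component:

interface Serializer {
	/**
	 * @param mixed $object
	 * @return array
	 */
	public function serialize( $object );
}

class SnakSerializer implements Serializer {

	private $dataValueSerializer;

	// The collaborator is hinted against the Serializer interface,
	// not against a concrete DataValueSerializer implementation.
	public function __construct( Serializer $dataValueSerializer ) {
		$this->dataValueSerializer = $dataValueSerializer;
	}

	public function serialize( $snak ) {
		return array(
			'property' => $snak->getPropertyId()->getSerialization(),
			'datavalue' => $this->dataValueSerializer->serialize( $snak->getDataValue() ),
		);
	}
}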
[0] https://github.com/wmde/AskSerialization
[1] https://github.com/wmde/Ask
[2] https://github.com/wmde/WikibaseInternalSerialization
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
(reposting, accidentally posted this to the internal list at first)
Hey. Here's a brief summary of what I talked to folks in SF about, what the
result was, or who we should contact to move forward.
* At the architecture summit, there seemed to be wide agreement that we need to
improve modularity in core. The TitleValue proposal was viewed as going too far
toward the "dark side of javafication", but it was generally seen to be moving in
the right direction. I will update the change soon to address some comments.
* Furthermore, we (the core developers) should seek out service interfaces that
can and should be factored out of existing classes, starting with pathological
cases like EditPage, Title, or User. Several people agreed to look into that
(and at the same time watch out to avoid "javafication"); Nik Everett
volunteered to lead the discussion.
* Gabriel Wicke has interesting plans for factoring out storage services (both
low level blob storage as well as higher level revision storage) into separate
HTTP/REST services.
* Yurik is working on a library/extension for JSON based configuration storage
for extensions. Needs review/feedback; I'm looking into that.
* I asked Aaron to provide a JobSpecification interface, so jobs can be
scheduled without having to instantiate the class that will be used to execute
the job. This makes it easier to post jobs from one wiki to another. Aaron has
already implemented this now, yay!
* Yurik wants us to rework the Wikibase API to be compatible with the core API's
"query" infrastructure. This would allow us to use item lists generated by one
module as the input for another module. See
https://www.mediawiki.org/wiki/Requests_for_comment/Wikidata_API
* After talking to Chad, I'm now pretty sure we should go for ElasticSearch for
implementing queries right away. It just seems a lot simpler than using MySQL
for the baseline implementation. This however makes ElasticSearch a dependency
of WikibaseQuery, making it harder for third parties to set up queries (though
setting up Elastic seems pretty simple).
* Brion would like to be in the loop on the PubsubHubbub project. For the
operations side, and the question whether WMF would want to run their own hub,
he pointed me to Ori and Mark Bergsma.
* I didn't make progress wrt the JSON dumps. Need to get hold of Ariel, he
wasn't around. We need to find out what makes the dumps so slow. Aaron Schulz
agreed to help with that. One problematic aspect of the current implementation
is that it tries to retrieve all entity IDs with a single DB query. We might
need to chunk that.
* For the future use of composer, we should be in touch with Markus Glaser and
Hexmode (Mark Hershberger), as well as with Hashar.
* Hashar is quite interested in switching to composer and perhaps also Travis.
He was happy to hear that Travis is Berlin based and sympathetic. The WMF might
even be ready to invest a bit into making Travis work with our workflow. Hashar
may come and visit us, poke him about it!
* For access to the new Logstash service, we should talk to Ken Snider.
* For shell access we should talk to Quim.
* I discussed allowing queries on page_props by property value with Tim as well
as Roan. Tim suggested adding a pp_sortkey column to page_props (a float, but
nullable), and indexing by pp_propname + pp_sortkey. That should cover most use
cases nicely, without big schema changes. A query sketch follows below.
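As a rough sketch of the kind of query such an index would serve (the property
name and limit are made up, and the column name follows Tim's suggestion), "the
50 pages with the highest value for a given property" would look roughly like
this with the core database abstraction:

$dbr = wfGetDB( DB_SLAVE );

$res = $dbr->select(
	'page_props',
	array( 'pp_page', 'pp_value' ),
	array( 'pp_propname' => 'population' ),
	__METHOD__,
	array(
		'ORDER BY' => 'pp_sortkey DESC',
		'LIMIT' => 50,
	)
);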
So, lots to follow up on!
Cheers
Daniel
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Hey,
It has long been clear that it is harmful to add configuration to
WikibaseLib. It is a library, not an application, and its users might well
want to use it with different config.
This means that no additional entries should be added to
WikibaseLib.default.php, and that commits that do should not be merged.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hey everyone,
For writing our PubSubHubbub extension we need to handle a special case when a page is in fact not a MediaWiki article but instead a Wikibase entity. Is there a programmatic way to figure out whether Wikibase is installed?
The idea is to then use NamespaceUtils the same way Wikibase does to figure out what type the page is. Is this the best way to do that?
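A sketch of one possible check, assuming the WB_VERSION / WBC_VERSION entry
point constants are still defined by Wikibase Repo and Wikibase Client when
they are loaded (not verified against the PubSubHubbub setup):

// Sketch: check for the entry point constants the Wikibase
// extensions define when they are loaded.
$wikibaseRepoInstalled = defined( 'WB_VERSION' );
$wikibaseClientInstalled = defined( 'WBC_VERSION' );

// Alternative: check for a central Wikibase class.
$wikibaseAvailable = class_exists( 'Wikibase\Settings' );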
BR,
Sebastian and Alexander
Hey,
The 3 components in Wikibase.git (Repo, Client and Lib) are the only ones
left with manually maintained autoload classmaps. Time to get rid of them!
I have two commits up for review, one killing the map for Client and one
for Lib.
* https://gerrit.wikimedia.org/r/#/c/108638/
* https://gerrit.wikimedia.org/r/#/c/108644/
The effect of these changes is that one has to trigger the rebuild of the
classmaps whenever a class is added or moved. This can be done by running
"composer update". This is also the case when doing a git pull that brings
such changes.
The same has been true for our other components for some time already,
though there the inconvenience is mitigated by having the PHPUnit bootstrap
file automatically run the "composer update" command.
https://github.com/wmde/WikibaseDataModel/blob/master/tests/bootstrap.php
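Such a bootstrap boils down to something like the following sketch (paths and
error handling are illustrative; the linked file is the authoritative version):

<?php

if ( PHP_SAPI !== 'cli' ) {
	die( 'Not an entry point' );
}

$rootDir = __DIR__ . '/..';

// Rebuild the Composer autoloader (and thus the classmap) on every test
// run, so newly added or moved classes are picked up automatically.
passthru( 'composer update --working-dir=' . escapeshellarg( $rootDir ), $exitCode );

if ( $exitCode !== 0 ) {
	die( 'Running "composer update" failed. Is Composer installed?' );
}

require_once $rootDir . '/vendor/autoload.php';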
Unfortunately we do not currently have a working bootstrap file in
Wikibase. I poked at this, though I got a huge pile of errors for some
reason unknown to me.
https://gerrit.wikimedia.org/r/#/c/108642/
Ultimately it would be good to have the code in Wikibase.git be properly
split and made PSR-0 compliant. If we use PSR-0 based loading, we will not
have to deal with any classmap hassle.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hello all,
When I want to remove a reference via wbremovereferences, a hash is needed
that can be obtained via wbgetclaims. However, there is no way to get the
hash of a qualifier via the API. I wanted to add this to GetClaims.php (to
return the hashes of qualifiers), but I couldn't understand the code correctly.
Am I wrong?
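A rough sketch of where such hashes could come from on the PHP side, assuming
a Claim object from the DataModel (not verified against the actual GetClaims
code): every Snak is Hashable, just like references are, so the qualifier
hashes could be collected for the output like this:

// $claim is a Wikibase DataModel Claim object.
$qualifierHashes = array();

foreach ( $claim->getQualifiers() as $qualifierSnak ) {
	// Each qualifier snak provides a hash, analogous to the reference
	// hashes that wbremovereferences expects.
	$qualifierHashes[] = $qualifierSnak->getHash();
}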
Best
--
Amir
Hi devs,
recently I had a discussion on IRC on how badges for site links should
work. After some time I realized that there is a general design issue
with their implementation.
The original idea of badges is to replace the FA/GA templates on
Wikipedias. This makes sense as it is centralized data which is used in
every Wikipedia. On Wikidata we can state that the article about XXX on
enwiki is a good/featured article and this will be shown on every
Wikipedia having a corresponding article. So basically, we are just
removing the {{Link GA|en}} or {{Link FA|en}} template and adding this
information to Wikidata. You can see that this is the last step away
from the old interwiki system.
However, there are proposals for badges that go far beyond this original
idea. Even the implementation written so far has such issues. As everybody
will easily understand, an article can be either GA or FA or neither.
However, it cannot be GA *and* FA at the same time. Thus one sitelink should
only be connected to at most one badge. In contrast, the
implementation allows more than one badge; even worse, it allows an
unlimited number of badges. Other proposals want badges to support
templates concerning meta information about the local article. However,
this information really does *not* concern other Wikipedias. Or can you
imagine a Wikipedia that wants to display a "this article needs sources"
icon next to the (probably non-existing) FA icon? This example makes
clear that meta information that has no value to any Wikipedia other than
the one it came from should not be stored on Wikidata either. Wikidata is a
place for central data, not just any data. For the Wikipedias' internal meta
information, the existing maintenance categories work fine today, and there is
no reason to force this system into Wikidata without any benefit to
Wikidata or the Wikipedias themselves.
So finally, my proposal is to only support one badge per sitelink, and
to only support badges that make sense to have at a central point
and that will be used by more than one Wikipedia. I think this makes the
UI easier to understand, and keeps the code from relying on a super-huge
config list of all the possible meta information that could be stored as
badges, which would amount to another huge statement system, only worse.
Please keep these concerns in mind when deciding how the badges should work.
Best regards,
Bene*
Hey everyone,
As we have just introduced ourselves on wikidata-l, we are currently working on a PubSubHubbub [1] extension [2] to MediaWiki. Currently, the extension only works on MediaWiki articles, not on Wikibase objects. For those articles we are using the wiki markup as exchange format (using URLs with action=raw), but currently there is no equivalent in Wikibase.
We heard there’s work on a common JSON format that – in contrast to the current JSON format – will be standardized and could thus be used for our purpose. Are there plans to implement the new format any time soon and is that format the way to go for us?
BR,
Alexander and Sebastian.
[1] http://code.google.com/p/pubsubhubbub/
[2] http://www.mediawiki.org/wiki/Extension:PubSubHubbub
Hey,
I recently got some questions about how to version components. Happily
enough someone just wrote a helpful blog post on the topic, which I
recommend you have a look at to get a basic idea.
http://blog.versioneye.com/2014/01/16/semantic-versioning/
The more verbose version, which is also linked from that blog post,
is the semver site: http://semver.org/
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--