Hey Steffen and Andy,
Continuing what I started on Twitter here, as some more characters might be
helpful :)
It seems that both our projects (FLOW3 and Wikidata) are in a similar
situation. We are using Gerrit as CR tool, and TravisCI to run our tests.
And we both want to have Travis run tests for all patchsets submitted to
Gerrit, and then +1 or -1 on verified based on the build passing or
failing. To what extent have you gotten such a thing to work on your
project? Is there code available anywhere? If both projects can use the
same code for this, I'd be happy to contribute to what you already have.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hey,
For some time now, it has been clear that a lot of people have use for the
code that can serialize and deserialize the Wikibase DataModel. Anyone
interacting with the API or the dumps and doing non-trivial things with the
data benefits from being able to use the domain objects provided by the
DataModel. While those objects themselves have been reusable for some time, the
code responsible for serialization is not. As the involved code also
suffers from serious design issues and is a sizeable chunk of our technical
debt, the idea is to create a new shiny dedicated component that is not
bound to Wikibase Repo or Wikibase Client.
As interest has been expressed in contributing to this, I'll briefly
outline the general idea, upcoming steps and relevant resources. If you are
not interested in contributing, you can stop reading here :)
I have created a new git repo for this component, which can be found here
https://github.com/wmde/WikibaseDataModelSerialization
This component should be to the DataModel component what AskSerialization
[0] is to the Ask library [1]. The approach and structure should be very
similar. A few stubs have been added to illustrate how to organize the
code, and autoloading and test bootstrap are in place, so the tests can be
run by executing "phpunit" in the root directory. The existing legacy code
for this serialization functionality can be found in Wikibase.git, in
lib/serializers.
I myself will be working on a very similar component which is aimed at
solving the technical debt around the serialization code for the format
used by the Wikibase Repo data access layer. This code resides in [2] and
follows much the same approach as the new component for the serialization
code dealing with the public format. Once the most critical issues this new
code will solve are tackled, I will likely start work on the former
component.
Things to keep in mind when contributing to this component:
* https://www.mediawiki.org/wiki/Wikibase/Coding_conventions
* Almost no code should know about concrete serializer implementations.
Program against the interfaces. That is, when the constructor of a serializer
for a higher level object (e.g. SnakSerializer) takes a collaborator for a
lower level one (e.g. DataValueSerializer), type hint against the interface
("Serializer $dataValueSerializer"). See the sketch below this list.
* Unit tests should be provided for all code, along with round trip tests for
matching serializers and deserializers, as well as high level serialization
and deserialization integration tests.
* Write clean tests with descriptive method names, e.g. as done in
https://github.com/wmde/WikibaseInternalSerialization/blob/master/tests/uni…
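To illustrate the second point, here is a rough sketch of what such a
serializer could look like (the class names come from the examples above, but
the method name and the exact array structure are just assumptions, not a
prescription for the actual component):

class SnakSerializer implements Serializer {

	private $dataValueSerializer;

	/**
	 * The collaborator is type hinted against the Serializer interface,
	 * so this class never knows which concrete serializer it gets.
	 */
	public function __construct( Serializer $dataValueSerializer ) {
		$this->dataValueSerializer = $dataValueSerializer;
	}

	public function serialize( $snak ) {
		// Delegate the data value part to the injected collaborator.
		return array(
			'snaktype' => 'value',
			'datavalue' => $this->dataValueSerializer->serialize( $snak->getDataValue() ),
		);
	}
}

Wiring the concrete implementations together then happens in a single place
(e.g. a factory), rather than being spread over the serializers themselves.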
[0] https://github.com/wmde/AskSerialization
[1] https://github.com/wmde/Ask
[2] https://github.com/wmde/WikibaseInternalSerialization
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hi there,
From the October 24th Wikidata/WMF checkin, there's this note:
== October 24 ==
* deployment from single repo - ok now? Can we get a commitment to try this
for the next deployment?
** Yes all good to go. Let's do this for the deployment on test on Oct 31st.
What is the status of this? I don't see that (single WikiData repo
manually managed by the wikidata team) in our production and labs
configs.
Why am I bringing this up now? Because there are planned changes to how
WD will do dependencies (what are dependencies where, I guess) that will
break the Beta Cluster if done without coordination. Jeroen is asking
for help so he can work on these changes.
The proposed option that I'm not happy about: turn off automatic
WikiData updates on Beta Cluster. If this has to happen, then we need
someone on WMF's side who's comfortable with Beta Cluster to be
around/help. Antoine is out on vacation (it is Dec 24th ;) ), but maybe
Timo could help? I'm not sure, honestly.
The option I want to be done: actually implement the single/manual repo
so that there aren't these breakages on Beta/Production.
I don't want to do the first option because it is a stop-gap solution
that is only necessary because the right thing hasn't been done (since
at least Oct 24th?).
What are the needed next steps from WikiData and WMF to finish this?
I see one change from Katie (thanks Katie!):
https://gerrit.wikimedia.org/r/#/c/95996/
* but unless I'm mistaken, that's just one piece of the puzzle.
Is the single repo created in gerrit? If not, we (WMF) should get that
done ASAP (considering holidays).
Thanks all, especially Jeroen for helping clarify some things.
Greg
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg A18D 1138 8E47 FAC8 1C7D |
Hey,
I’m happy to announce the 0.6 release of Wikibase DataModel. This is the
first real release of this component. The awesome thing about this release
is that it finally makes the component ready for third party usage.
You can find a short description of Wikibase DataModel, some history and
highlights for this release at
http://www.bn2vs.com/blog/2013/12/23/wikibase-datamodel-released/
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hello guys,
we have had a discussion about how entity redirects should be
implemented and there is also an overview on meta.wikimedia.org [1]. Now
the people I discussed with agreed that the second option [2] is the
best one and that we should use it. However, no further feedback has come in
over the past week, so I suppose that everyone is fine with #2.
My questions concern the further development process of this feature. If
nobody objects within 24 hours, I will add a note to the page and would like
to start discussing in more detail how the feature should be implemented.
Furthermore, I would also like to implement some first changes to speed up
getting this feature in. Please give me some feedback on what the next steps
should look like and what has to be discussed before we start writing code.
Best regards,
Bene
[1]
https://meta.wikimedia.org/wiki/Wikidata/Development/Entity_redirect_after_…
[2] Implement redirects as a "mode" any *EntityContent* may have
I'm about to sign off for the holidays, until January 6th, so here's a quick
heads up:
For investigating sporadic failures of test cases, I have created a branch of
wikibase on github, which has travis set up for testing:
https://github.com/wmde/Wikibase/tree/fixtravis
https://travis-ci.org/wmde/Wikibase
This branch contains quite a few fixes/improvements to test cases. It would be
good to have them on gerrit soon.
The following tests were identified as (probably) using hard coded entity IDs in
an unhealthy way, but I didn't get around to fixing them yet:
repo/tests/phpunit/includes/api/SetClaimTest.php
repo/tests/phpunit/includes/api/SetQualifierTest.php
repo/tests/phpunit/includes/api/SetReferenceTest.php
repo/tests/phpunit/includes/api/SetSiteLinkTest.php
They should probably be fixed along the same lines as MergeItemsTest, using the
new EntityTestHelper::injectIds method to inject "real" ids for placeholders in
the data the providers return.
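Roughly, the pattern would be something like the following (written from
memory, so check MergeItemsTest for the actual placeholder syntax and the
exact signature of injectIds):

	public function provideSetClaim() {
		// The provider returns placeholders instead of hard coded IDs;
		// the placeholder syntax here is only illustrative.
		return array(
			array( array( 'entity' => '%SomeItem%' ) ),
		);
	}

	/**
	 * @dataProvider provideSetClaim
	 */
	public function testSetClaim( array $params ) {
		// Replace the placeholders with the IDs of the entities actually
		// created for this test run, as MergeItemsTest does.
		EntityTestHelper::injectIds( $params );

		// ... the rest of the test then works on real IDs.
	}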
Cheers!
Daniel
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Hello all!
Once more, I'm taking a whack at implementing support for entity redirects -
that is, we want Q1234 to become a redirect to Q2345 after we merged Q1234 into
Q2345.
My question is how to best model this.
First off, a quick primer of the PHP classes involved:
* We have the Entity base class, with Item and Property deriving from it, for
modeling entities.
* As "glue" for treating entities as MediaWiki page content, we have the
EntityContent base class, and ItemContent and PropertyContent deriving from that
(along with their handler/factory classes, EntityHandler, ItemHandler and
PropertyHandler).
Presently, each type of Entity is assigned a MediaWiki namespace, and all pages
in that namespace have the corresponding content type.
Now, there are various ways to model redirects. I have tried a few:
1) first I tried implementing redirects as a "mode" any Entity may have. The
advantage is that redirects can seamlessly occur anywhere "real" entities can
occur, e.g. in dumps, diffs, undo operations, etc. However, many operations that
are defined on "real" entities are not defined on redirects (e.g. "set label",
etc), so we would end up with a lot of "if redirect, do this, else, do that"
checks all over the code base.
2) second I tried implementing redirects as a "mode" any EntityContent may have,
following the idea that redirects are really a MediaWiki concept, and should be
implemented outside the Wikibase data model. This still requires a lot of "if
redirect, do this, else, do that" checks, but only in places dealing with
EntityContent, not all places dealing with Entities.
3) third I tried using a separate entity type for representing redirects: a
RedirectContent points to an EntityContent, there is no "mode", no need for
extra checks, no chance for confusion. However, we break a few basic
assumptions, most importantly the assumption that all pages in an entity
namespace contain an EntityContent of the respective type.
None of these solutions seems satisfactory, for the reasons indicated above and
some additional considerations explained below.
4) This led me to come full circle and consider another option: make redirects
a special *type* of Entity, besides Item and Property. This would again allow
straightforward diffs (and thus undo operations) between the redirected and the
previous, un-redirected version of a page; also, it would not compromise code
that operates on Item or Property objects. But since there are still many
operations defined for Entity that make no sense for redirects, we would need to
insert another class into the hierarchy (let's call it LabeledEntity) which
would be a base class for Item and Property, but not for Redirect.
This would require quite a bit of refactoring, and would introduce redirects as
a "first order citizen" into the Wikibase data model.
Which of the four options do you prefer? Or can you think of another, better,
option?
Below are some more points to consider when deciding on a design:
In addition to the OO design questions outlined above, one important question
is how to handle redirects in the wb_entity_per_page table; that table contains
a 1:1 mapping of entity_id <-> page_id. It is used, among other things, for
listing all entities and for checking whether an entity exists. How should
redirects be represented in that table? They could be:
a) omitted (making it hard to list redirects),
b) or use the redirect's page ID (maintaining the 1:1 nature, but pointing to a
page that actually does not contain an entity),
c) or use the target entity's page ID (breaking the 1:1 nature of the table, but
associating the ID with the accurate place).
For b) and c), a new column would be needed to mark redirects as such.
When deciding on how to represent redirects internally, we need to consider how
they should be handled externally. Of course, when asking for an entity that got
redirected, we should get the target entity's content, be it using the "regular"
HTML page interface, the API, the linked data interface, etc.
In RDF dumps, we would probably want redirects to show up as owl:sameAs
relations (or something similar that allows one URI to be indicated as the
primary one). Should redirects also be present in JSON dumps? If so, should it
also be possible to retrieve the JSON representation of redirects from the API,
in places where usually "real" entities would be?
-- daniel
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
We just discussed how to best make sure that any pre- or post-deployment tasks
required by some core changes get done. We agreed on the following:
If a change you submit to gerrit needs some pre- or post-deployment task to be
performed (e.g. applying a database patch, changing a setting, purging some
cache, etc), mention this in the commit message, using the DEPLOYMENT keyword
(at the start of a line starting a new paragraph). E.g.:
DEPLOYMENT: frob the xyzzy when deploying this change!
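In context, a full commit message might then look something like this (made-up
change, just to show where the keyword goes):

Add the xyzzy to the frobnication module

The xyzzy is needed so that frobnication keeps working after the
flux capacitor update.

DEPLOYMENT: frob the xyzzy when deploying this change!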
I have written a brief deployment checklist that also reflects this new convention:
https://meta.wikimedia.org/wiki/Wikidata/Deployment
Please extend/modify as appropriate.
-- daniel
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Am 10.12.2013 22:38, schrieb Brad Jorsch (Anomie):
> Looking at the code, ParserCache::getOptionsKey() is used to get the
> memc key which has a list of parser option names actually used when
> parsing the page. So for example, if a page uses only math and
> thumbsize while being parsed, the value would be array( 'math',
> 'thumbsize' ).
Am 11.12.2013 02:35, schrieb Tim Starling:
> No, the set of options which fragment the cache is the same for all
> users. So if the user language is included in that set of options,
> then users with different languages will get different parser cache
> objects.
Ah, right, thanks! Got myself confused there.
The thing is: we are changing what's in the list of relevant options. Before the
deployment, there was nothing in it, while with the new code, the user language
should be there. I suppose that means we need to purge these "pointers".
Would bumping wgCacheEpoch be sufficient for that? Note that we don't care much
about purging the actual parser cache entries, we want to purge the "pointer"
entries in the cache.
>> We just tried to enable the use of the parser cache for wikidata, and it failed,
>> resulting in page content being shown in random languages.
>
> That's probably because you incorrectly used $wgLang or
> RequestContext::getLanguage(). The user language for the parser is the
> one you get from ParserOptions::getUserLangObj().
Oh, thanks for that hint! Seems our code is inconsistent about this, using the
language from the parser options in some places, the one from the context in
others. Need to fix that!
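For the record, the pattern we need to move to is roughly this (a sketch, not
actual Wikibase code; the rendering helper at the end is made up):

// Wrong: takes the language from the request context, which the parser
// cache knows nothing about:
// $lang = RequestContext::getMain()->getLanguage();

// Right: ask the ParserOptions. This also records 'userlang' as a used
// option (via the access callback mentioned below), so it ends up in the
// parser cache key.
$lang = $parserOptions->getUserLangObj();
$html = $this->getHtmlForLanguage( $entity, $lang ); // made-up helper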
> It's not necessary to call ParserOutput::recordOption().
> ParserOptions::getUserLangObj() will call it for you (via
> onAccessCallback).
Oh great, magic hidden information flow :)
Thanks for the info, I'll get hacking on it!
-- daniel
Hi.
I (rather urgently) need some input from someone who understands how parser
caching works. (Rob: please forward as appropriate).
tl;dr:
what is the intention behind the current implementation of
ParserCache::getOptionsKey()? It's based on the page ID only, not taking into
account any options. This seems to imply that all users share the same parser
cache key, ignoring all options that may impact cached content. Is that
correct/intended? If so, why all the trouble with ParserOutput::recordOption, etc?
Background:
We just tried to enable the use of the parser cache for wikidata, and it failed,
resulting in page content being shown in random languages.
I tried to split the parser cache by user language using
ParserOutput::recordOption to include userlang in the cache key. When tested
locally, and also on our test system, that seemed to work fine (which seems
strange now, looking at the code of getOptionsKey()).
On the live site however, it failed.
Judging by its name, getOptionsKey should generate a key that includes all
options relevant to caching page content in the parser cache. But it seems it
forces the same parser cache entry for all users. Is this intended?
Possible fix:
ParserCache::getOptionsKey could delegate to ContentHandler::getOptionsKey,
which could then be used to override the default behavior. Would that be a
sensible approach?
And if so, would it be feasible to push out such a change before the holidays?
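To make that idea concrete, something along these lines (the ContentHandler
method is entirely hypothetical, this is just the shape of the proposal):

// In ParserCache, replacing the current page-ID-only implementation:
function getOptionsKey( $page ) {
	// Delegate the key construction to the content handler, so content
	// types that need to fragment the cache differently can override it.
	return $page->getContentHandler()->getParserCacheOptionsKey( $page );
}

ContentHandler would provide a default implementation that keeps today's
behavior (a key based on the page ID only).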
Thanks,
Daniel
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.