Wikidata-tech February 2014

wikidata-tech@lists.wikimedia.org

12 participants
11 discussions

TravisCI integration with Gerrit
by Jeroen De Dauw 14 Jun '14

Hey Steffen and Andy,

Continuing here what I started on Twitter, as a few more characters might be helpful :)

It seems that both our projects (FLOW3 and Wikidata) are in a similar situation. We are using Gerrit as our code review tool, and TravisCI to run our tests. We both want Travis to run the tests for every patchset submitted to Gerrit, and then vote +1 or -1 on Verified depending on whether the build passes or fails.

To what extent have you gotten such a thing to work on your project? Is there code available anywhere? If both projects can use the same code for this, I'd be happy to contribute to what you already have.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3

How does Wikidata work?
by Sherlock Holmes 28 Feb '14

Hello everybody!

Glad to join the team! I'm new to Wikidata and am confused about a few things:

1. I thought Wikidata gathered its data from Wikipedia infoboxes, but I found that statements and infoboxes do not fully correspond: some infobox data is missing from the statements, and the statements also contain data that is not in the infoboxes. So how does Wikidata actually create a new item? What is the detailed process? And when articles in Wikipedia are created or updated, how does Wikidata pick up these changes?

2. I think there is more valuable information in Wikipedia that is not included in Wikidata, e.g. categories and infobox data. How can I enrich Wikidata? I know I could use and follow WikiData Bot's process; any directions?

Any suggestions or guidance are welcome!

BRs,
Sherlock

Dumpfile inconsistency
by Fredo Erxleben 27 Feb '14

Hey there,

I stumbled upon an inconsistency when parsing the dumpfile JSON:

In item Q58404 (Haaften) the aliases are an empty array, as in "aliases":[]. The same holds for Q15982 (Wernigerode), which also has no aliases and therefore an empty array.

In item Q189889 (Chicago) the aliases are an object: "aliases":{…}. The same is the case for Q42 (Douglas Adams).

Which one should it be? I suspect there is an error in the writing of the dumpfiles…

-- Fredo
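Until the format question is settled, a dump parser can simply accept both shapes. Below is a minimal Python sketch, assuming (this is not confirmed in the thread) that `[]` only appears for items without aliases and should be read as an empty object. One plausible cause is that Wikibase is written in PHP, whose `json_encode` writes an empty PHP array as `[]`. The function names are hypothetical.

```python
import json
from typing import Any, Dict


def normalize_aliases(raw: Any) -> Dict[str, Any]:
    """Return the aliases of an item as a dict keyed by language code.

    Assumption: an empty JSON array ([]) means "no aliases" and is treated
    exactly like an empty object ({}); any other non-object value is rejected.
    """
    if raw is None or raw == []:
        return {}
    if isinstance(raw, dict):
        return raw
    raise ValueError(f"unexpected aliases value: {raw!r}")


def aliases_from_item_json(item_json: str) -> Dict[str, Any]:
    """Hypothetical helper: parse one item's JSON text and normalize its aliases."""
    return normalize_aliases(json.loads(item_json).get("aliases"))
```

Rejecting unexpected shapes, instead of silently mapping them to `{}`, keeps any other inconsistency in the dumps visible.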

wbsetclaim
by JWbot 26 Feb '14

Hi!

I can't figure out how a claim can be created with the wbsetclaim API module. It would be convenient to have a single implementation for both creating and updating claims. Please help me.

Regards,
Zoltán L.
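The "single implementation" idea can be sketched as one function that builds the request either way. This minimal Python sketch rests on an assumption that should be checked against the API documentation: that wbsetclaim treats a claim whose `id` is a fresh, client-generated GUID as a new claim. The GUID shape `<entity id>$<UUID>` is taken from the dump excerpts in the statement-groups thread. The function names are hypothetical, and the request is only built here, not sent.

```python
import json
import uuid
from typing import Dict, Optional


def new_statement_guid(entity_id: str) -> str:
    """Hypothetical helper: "<entity id>$<UUID>", the shape seen in the dumps."""
    return f"{entity_id}${str(uuid.uuid4()).upper()}"


def build_wbsetclaim_params(entity_id: str, property_id: str, target_numeric_id: int,
                            token: str, guid: Optional[str] = None) -> Dict[str, str]:
    """Build (but do not send) wbsetclaim parameters for an item-valued claim.

    Passing an existing GUID is meant to update that claim; omitting it
    generates a fresh GUID, which is assumed here to create a new claim.
    """
    claim = {
        "id": guid or new_statement_guid(entity_id),
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {"entity-type": "item", "numeric-id": target_numeric_id},
            },
        },
    }
    return {
        "action": "wbsetclaim",
        "format": "json",
        "claim": json.dumps(claim),
        "token": token,
    }
```

Calling code would then send the same kind of request in both cases and differ only in whether it passes `guid`.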

Is altitude in globe coordinates deprecated?
by Fredo Erxleben 26 Feb '14

Hey there,

When I parse the JSON from the dumpfiles, I find coordinates like

    {"latitude":51.835, "longitude":10.785277777778, "altitude":null, "precision":0.00027777777777778, "globe":"http:\/\/www.wikidata.org\/entity\/Q2"}

However, the altitude key is not represented in the WDTK data model. Is this key deprecated (i.e. should I ignore it when converting dump JSON to WDTK objects)?

Cheers,
Fredo

PS. The first usable version of the dumpfile converter class is on the horizon.
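If the answer is that the key may be ignored, a converter could drop it only when it is null and fail loudly otherwise, so no data is lost silently. A minimal Python sketch under that assumption. `GlobeCoordinate` and the parse functions are hypothetical names, not WDTK classes.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobeCoordinate:
    # Hypothetical stand-in for a data model value without an altitude field
    latitude: float
    longitude: float
    precision: float
    globe: str


def parse_globe_coordinate(raw: dict) -> GlobeCoordinate:
    # Assumption: a null altitude carries no information and can be dropped;
    # a non-null one would be lost, so it is rejected instead of ignored.
    if raw.get("altitude") is not None:
        raise ValueError(f"unsupported non-null altitude: {raw['altitude']!r}")
    return GlobeCoordinate(
        latitude=float(raw["latitude"]),
        longitude=float(raw["longitude"]),
        precision=float(raw["precision"]),
        globe=raw["globe"],
    )


def parse_globe_coordinate_json(text: str) -> GlobeCoordinate:
    # json.loads also turns the escaped "\/" in the dump into a plain "/"
    return parse_globe_coordinate(json.loads(text))
```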

Version numbers, we are still doing it somewhat wrong
by Jeroen De Dauw 24 Feb '14

Hey,

We've been mostly following http://semver.org/ for our new components, which is great. One thing we have, however, not been doing well is bumping to 1.0 at the appropriate time.

> Major version zero (0.y.z) is for initial development. Anything may change
> at any time. The public API should not be considered stable.

In other words, during 0.y.z an increment of y denotes a breaking change, and an increment of z does not. This is shifted from the "normal" x.y.z with x>0, where every breaking change increments x, feature additions increment y, and fixes increment z. Hence, if we wrongly stick around at 0.y.z, then we are also using the rest of the numbering wrongly at that point.

So when should the bump to 1.0 happen?

> Version 1.0.0 defines the public API. The way in which the version number
> is incremented after this release is dependent on this public API and how
> it changes.
>
> How should I deal with revisions in the 0.y.z initial development phase?
>
> The simplest thing to do is start your initial development release at
> 0.1.0 and then increment the minor version for each subsequent release.
>
> How do I know when to release 1.0.0?
>
> If your software is being used in production, it should probably already
> be 1.0.0. If you have a stable API on which users have come to depend, you
> should be 1.0.0. If you're worrying a lot about backwards compatibility,
> you should probably already be 1.0.0.
>
> Doesn't this discourage rapid development and fast iteration?
>
> Major version zero is all about rapid development. If you're changing the
> API every day you should either still be in version 0.y.z or on a separate
> development branch working on the next major version.

Most of our components are used in production and have a relatively stable API, yet they still have a version in the 0.y.z format. Let's pay more attention to this for our new components, so they don't end up in the same situation. For the existing ones, we can increment the first number instead of the second the next time we would otherwise increment the second one.

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
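The numbering rules described above, plus the proposed one-time jump for existing components, fit into a small function. This is a minimal Python sketch. The `bump` name, the three change kinds and the `graduate` flag are hypothetical, and the function encodes this message's reading of 0.y.z (breaking changes bump y, everything else bumps z) rather than the behavior of any release tool.

```python
def bump(version: str, change: str, graduate: bool = False) -> str:
    """Return the next version for a change of kind "breaking", "feature" or "fix".

    For 0.y.z, breaking changes increment y and all other changes increment z.
    For x.y.z with x > 0: breaking -> x, feature -> y, fix -> z.
    graduate=True applies the proposal for existing 0.y.z components: the next
    breaking change goes to 1.0.0 instead of 0.(y+1).0.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change not in ("breaking", "feature", "fix"):
        raise ValueError(f"unknown change kind: {change!r}")
    if major == 0:
        if change == "breaking":
            return "1.0.0" if graduate else f"0.{minor + 1}.0"
        return f"0.{minor}.{patch + 1}"
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "feature":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

For example, `bump("0.4.2", "breaking", graduate=True)` gives "1.0.0", after which the regular x.y.z rules apply.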

Reusable serialization component
by Jeroen De Dauw 22 Feb '14

Hey,

For some time now, it has been clear that a lot of people have use for code that can serialize and deserialize the Wikibase DataModel. Anyone interacting with the API or the dumps and doing non-trivial things with the data benefits from being able to use the domain objects provided by the DataModel. While those objects have been reusable for some time already, the code responsible for serialization is not. As that code also suffers from serious design issues and is a sizeable chunk of our technical debt, the idea is to create a shiny new dedicated component that is not bound to Wikibase Repo or Wikibase Client.

As interest has been expressed in contributing to this, I'll briefly outline the general idea, upcoming steps and relevant resources. If you are not interested in contributing, you can stop reading here :)

I have created a new git repo for this component, which can be found at https://github.com/wmde/WikibaseDataModelSerialization

This component should be to the DataModel component what AskSerialization [0] is to the Ask library [1]. The approach and structure should be very similar. A few stubs have been added to illustrate how to organize the code, and autoloading and the test bootstrap are in place, so the tests can be run by executing "phpunit" in the root directory.

The existing legacy code for this serialization functionality can be found in Wikibase.git, in lib/serializers.

I myself will be working on a very similar component aimed at solving the technical debt around the serialization code for the format used by the Wikibase Repo data access layer. This code resides in [2] and follows much the same approach as the new component for the serialization code dealing with the public format. Once the most critical issues this new code will solve are tackled, I will likely start work on the public-format component.
Things to keep in mind when contributing to this component:

* https://www.mediawiki.org/wiki/Wikibase/Coding_conventions
* Almost no code should know concrete instances of serializers. Program against the interfaces. I.e., when the constructor of a serializer for a higher-level object (e.g. SnakSerializer) takes a collaborator for a lower-level one (e.g. DataValueSerializer), type hint against the interface (e.g. "Serializer $dataValueSerializer").
* Unit tests should be provided for all code, as well as round-trip tests for matching serializers and deserializers, and high-level serialization and deserialization integration tests.
* Write clean tests with descriptive method names, e.g. as done in https://github.com/wmde/WikibaseInternalSerialization/blob/master/tests/uni…

[0] https://github.com/wmde/AskSerialization
[1] https://github.com/wmde/Ask
[2] https://github.com/wmde/WikibaseInternalSerialization

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
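The "program against the interfaces" and round-trip points can be illustrated in a few lines. The component itself is PHP. What follows is a minimal Python analogue in which `Serializer` and `Deserializer` are abstract base classes and the snak and data value classes are hypothetical, simplified stand-ins, not the DataModel's real classes or serialization format.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


class Serializer(ABC):
    """Interface every serializer implements; collaborators depend on this."""

    @abstractmethod
    def serialize(self, obj: Any) -> Any: ...


class Deserializer(ABC):
    @abstractmethod
    def deserialize(self, data: Any) -> Any: ...


@dataclass(frozen=True)
class StringValue:  # hypothetical stand-in for a DataValue
    value: str


@dataclass(frozen=True)
class PropertyValueSnak:  # hypothetical, simplified snak
    property_id: str
    data_value: StringValue


class StringValueSerializer(Serializer):
    def serialize(self, obj: StringValue) -> dict:
        return {"type": "string", "value": obj.value}


class StringValueDeserializer(Deserializer):
    def deserialize(self, data: dict) -> StringValue:
        return StringValue(data["value"])


class SnakSerializer(Serializer):
    # Depends on the Serializer interface, not on a concrete class,
    # so any data value serializer can be injected.
    def __init__(self, data_value_serializer: Serializer) -> None:
        self._data_value_serializer = data_value_serializer

    def serialize(self, obj: PropertyValueSnak) -> dict:
        return {
            "snaktype": "value",
            "property": obj.property_id,
            "datavalue": self._data_value_serializer.serialize(obj.data_value),
        }


class SnakDeserializer(Deserializer):
    def __init__(self, data_value_deserializer: Deserializer) -> None:
        self._data_value_deserializer = data_value_deserializer

    def deserialize(self, data: dict) -> PropertyValueSnak:
        return PropertyValueSnak(
            data["property"],
            self._data_value_deserializer.deserialize(data["datavalue"]),
        )


def test_snak_round_trip_preserves_snak() -> None:
    snak = PropertyValueSnak("P646", StringValue("/m/0f0_k6"))
    serializer = SnakSerializer(StringValueSerializer())
    deserializer = SnakDeserializer(StringValueDeserializer())
    assert deserializer.deserialize(serializer.serialize(snak)) == snak
```

Because `SnakSerializer` only sees the interface, a test can also hand it a fake serializer in place of `StringValueSerializer`.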

Taking apart the "ref"-section of the dump file format
by Fredo Erxleben 21 Feb '14

Hello everybody,

Since I am working on the conversion from the dump files to the WDTK data model, I have to take apart the "refs" section of the JSON representing the stored items. A "refs" section most likely looks like this (formatted for readability):

    "refs": [
      [
        [ "value", 248, "wikibase-entityid", {"entity-type":"item","numeric-id":15241312} ],
        [ "value", 577, "time", {"time":"+00000002013-10-28T00:00:00Z","timezone":0,"before":0,"after":0,"precision":11,"calendarmodel":"http:\/\/www.wikidata.org\/entity\/Q1985727"} ]
      ]
    ]

So I figured out the following: the outer array groups all references. The second-level array groups the information about one reference (so if we had multiple references, we would also have multiple second-level arrays), and the inner arrays each hold one specific piece of information about that reference (as determined by the second-level array they are nested in). Am I correct so far?

The integer following the "value" string denotes what the following information is about. So if I read that the value is 577, I know that there must be a specification of time, don't I?

Is there any specific reason why this is done in array form and not as a JSON object? (Since the "value" key is always there, one could know from its value which other keys must be available.) If so, why mention the type of information (e.g. "time") again? Am I overlooking something?

Thanks so far,
-- Fredo Erxleben
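The nesting described above (all references, then one reference, then one snak array) maps directly onto two nested loops. A minimal Python sketch, assuming that the integer after the snak type is a numeric property ID (so 577 would be P577) and that snak types other than "value" (only "value" appears in the excerpt) consist of just the type and the property number. `Snak`, `parse_snak` and `parse_refs` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Snak:
    snak_type: str                    # "value" in the excerpt above
    property_id: str                  # e.g. "P577" (assumed meaning of the integer)
    value_type: Optional[str] = None  # e.g. "time"; None for non-value snaks
    value: Any = None


def parse_snak(raw: List[Any]) -> Snak:
    """Parse one inner array, e.g. ["value", 577, "time", {...}]."""
    snak_type, numeric_property = raw[0], raw[1]
    property_id = f"P{numeric_property}"
    if snak_type == "value":
        return Snak(snak_type, property_id, raw[2], raw[3])
    # Assumption: other snak types consist of just [type, property number]
    return Snak(snak_type, property_id)


def parse_refs(refs_json: str) -> List[List[Snak]]:
    """Outer array: all references; each reference: a list of snak arrays."""
    refs = json.loads(refs_json)
    return [[parse_snak(snak) for snak in reference] for reference in refs]
```

Keeping the value type next to the value, as the array form does, lets the parser pick a value class without looking up the property's datatype.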

Re: [Wikidata-tech] How would statement groups look in internal json?
by Markus Krötzsch 19 Feb '14

[Moving discussion to wikidata-tech, since it is of general interest. The question was how the grouped statements seen in the UI are computed from the flat list of statements found in the dumps.]

Hi Fredo,

StatementGroups are a new structure that we decided to introduce to the data model last week (they did not exist before). They represent statements grouped by property, as seen in the user interface. The JSON generated by the Web API also groups statements by property.

As you noticed, the internal JSON format found in the dumps does not have such groups: there is just a list of statements, which may not even have statements of the same property next to each other. The order of the groups shown in the UI and the API results is computed from this internal list. Having StatementGroups as an explicit construct will allow the order of the groups to be controlled in a saner way.

Currently the groups are created from the plain list by putting the group for a property "at the first position where a statement of this property occurs in the list". More algorithmically:

* Input: a list of statements
* Output: a list of lists of statements
* result = [] // empty list
* Iterate over all statements in the input:
** if the property of the statement has no group in result yet:
*** create a new group and append it to the result
** add the statement to its group

The statements within each group are reordered in the UI, so that the ones with preferred rank are on top, followed by the ones of normal rank, followed by the ones of deprecated rank. The relative order of statements with the same rank should be preserved in this operation. We do not have a representation for these rank-based groups in the data model (it seemed not as critical as the StatementGroups), so the statements in each group should just be kept in their original order.
This is also important for moving groups, since a group can currently only be moved by moving its first statement in the internal dump list (which may not have the highest rank). This will hopefully get easier at some point with a new API action for moving groups.

Reordering statements into groups as described above is not significant for query answering or for display; in fact, Wikibase does this internally on some edits. So this preprocessing does not really change the data. In the future, I hope that the rank-based ordering will also be maintained within statement groups.

I hope this answers the original question :-)

Cheers,
Markus

On 19/02/14 18:33, Fredo Erxleben wrote:
> Hello everybody,
>
> I encountered a question when trying to take apart statements:
>
> So far I encountered JSON looking like this:
> "claims":
> [
>  {
>   "m":["value",107,"wikibase-entityid",{"entity-type":"item","numeric-id":618123}],
>   "q":[],
>   "g":"q58404$773743FD-0D4F-48D0-ACAE-028B5D37CA25",
>   "rank":1,
>   "refs":[[["value",143,"wikibase-entityid",{"entity-type":"item","numeric-id":10000}]]]
>  },
>  … … …
>  {
>   "m":["value",646,"string","\/m\/0f0_k6"],
>   "q":[],
>   "g":"Q58404$12E30FC1-F54F-44D4-91AD-99C453935FD4",
>   "rank":1,
>   "refs":[[["value",248,"wikibase-entityid",{"entity-type":"item","numeric-id":15241312}],["value",577,"time",{"time":"+00000002013-10-28T00:00:00Z","timezone":0,"before":0,"after":0,"precision":11,"calendarmodel":"http:\/\/www.wikidata.org\/entity\/Q1985727"}]]]
>  }
> ]
>
> which is just one StatementGroup with some statements, as far as I
> understand it.
> Now, how would it look if the Item has multiple statement groups? Then
> the value of the "claim" key would not be an array of statements but
> an array of arrays of statements? Or do all Items have only one
> statement group?
>
> Kind regards
> -- Fredo
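The grouping algorithm above, together with the rank-based reordering the UI applies within each group, fits in a few lines of Python. This is a minimal sketch: `Statement` is a hypothetical, simplified stand-in, and the integer rank encoding (0 deprecated, 1 normal, 2 preferred) is an assumption consistent with the "rank":1 seen in the dump excerpt. Python dicts keep insertion order, so each group sits at the position where its property first occurs. `sorted` is stable, so statements of equal rank keep their relative order.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Statement:
    guid: str
    property_id: str
    rank: int  # assumed encoding: 0 = deprecated, 1 = normal, 2 = preferred


def group_statements(statements: List[Statement]) -> List[List[Statement]]:
    """Group by property; each group appears where its property first occurs."""
    groups: Dict[str, List[Statement]] = {}  # dicts preserve insertion order
    for statement in statements:
        # Creates and appends a new group the first time a property is seen
        groups.setdefault(statement.property_id, []).append(statement)
    return list(groups.values())


def order_group_by_rank(group: List[Statement]) -> List[Statement]:
    """Preferred first, then normal, then deprecated; stable within a rank."""
    return sorted(group, key=lambda statement: -statement.rank)
```

As the message says, only `group_statements` corresponds to something in the data model; `order_group_by_rank` mirrors display behavior and leaves the stored order untouched.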

Badges
by Bene* 06 Feb '14

Hi guys,

I added a comment on bug 40810, but I also want to send it here to reach more people.

We still need to discuss how badges should work in general. The development plan [1] says "Each sitelink can have zero or one badge attached to it." This does not match the current implementation, which allows multiple badges. So either the implementation needs to be changed to support only one badge, or we decide to keep the option of multiple badges, in which case we face some other problems.

First, the UI becomes harder to implement on Wikidata. This is not a real problem, though. A real problem, however, is that we cannot determine which badge to display on the client. The only options I see to solve this are either to create another config variable (very ugly) or to allow only one badge, as stated in the development plan anyway.

Best regards,
Bene*

[1] https://www.wikidata.org/wiki/Wikidata:Development_plan
