Hey,
I've updated the documentation on the Wikidata browser tests. See [1] for
how to set up your system and run the tests using the "new" way with
cucumber.
If you have any questions, don't hesitate to ask me!
cheers, tobi
[1]
https://meta.wikimedia.org/wiki/Wikidata/Development/Testing#Browser_Testin…
--
Tobi Gritschacher
Software Developer - Wikidata - http://www.wikidata.org
Imagine a world, in which every single human being can freely
share in the sum of all knowledge. That's our commitment.
Wikimedia Deutschland e.V. | Obentrautstraße 72 | 10963 Berlin
Phone +49 (0)30 219 158 260
http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/681/51985.
Since this is potentially a blocker on Chinmay's GSoC project, I'm
cc'ing wikidata-tech...
-andrew
On Thu, Sep 12, 2013 at 3:12 AM, Merlijn van Deen <valhallasw(a)arctus.nl> wrote:
> On 11 September 2013 20:31, Chinmay Naik <chin.naik26(a)gmail.com> wrote:
>>
>> hi,
>> I am trying to retrieve wikidata items by label through wbsearchentities()
>> using the "continue" parameter from
>> https://www.wikidata.org/wiki/Special:ApiSandbox#action=wbsearchentities&fo…
>> Can I retrieve more than 100 items using this? I notice the
>> 'search-continue' returned by the search result disappears after 50 items.
>> For example:
>> https://www.wikidata.org/wiki/Special:ApiSandbox#action=wbsearchentities&fo…
>>
>
> The api docs at https://www.wikidata.org/w/api.php explicitly state the
> highest value for 'continue' is 50:
>
> limit - Maximal number of results
> The value must be between 0 and 50
> Default: 7
> continue - Offset where to continue a search
> The value must be between 0 and 50
> Default: 0
>
> which indeed suggests there is a hard limit of 100 entries. Maybe someone in
> the Wikidata dev team can explain the reason behind this?
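For anyone hitting the same wall, here is a minimal paging sketch in Python. The parameter names come from the api.php docs quoted above; the `fetch` callback and the function name are just illustrative:

```python
def search_entities(fetch, query, language="en"):
    """Page through wbsearchentities results via the 'continue' offset.

    `fetch(params)` performs the API request and returns the decoded
    JSON. Both 'limit' and 'continue' are capped at 50 by the API, so
    at most 100 results (offsets 0 and 50) are reachable this way.
    """
    results = []
    offset = 0
    while offset <= 50:  # the API rejects continue values above 50
        data = fetch({
            "action": "wbsearchentities",
            "search": query,
            "language": language,
            "limit": 50,
            "continue": offset,
            "format": "json",
        })
        results.extend(data.get("search", []))
        if "search-continue" not in data:
            break
        offset = data["search-continue"]
    return results
```

With the requests library, `fetch` could be `lambda params: requests.get("https://www.wikidata.org/w/api.php", params=params).json()`. The cap means at most 100 hits are reachable per query, which matches the reading of the docs above.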
>
> Merlijn
>
>
> _______________________________________________
> Wikidata-l mailing list
> Wikidata-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>
Hi all.
With today's deployment, the Wikibase API modules used on wikidata.org will
change from using lower-case IDs (q12345) to upper-case IDs (Q12345). This is
done for consistency with the way IDs are shown in the UI and used in URLs.
The API will continue to accept entity IDs in lower-case as well as upper-case.
Any bot or other client that has no property or item IDs hardcoded or configured
in lower case should be fine.
If, however, your code looks for a specific item or property in the output
returned from the API and uses a lower-case ID to do so, it may now fail
to match the respective ID.
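If your client does match IDs from API output, a defensive sketch in Python is to normalize before comparing (the helper names here are mine, not part of any API):

```python
def normalize_entity_id(entity_id):
    """Upper-case a Wikibase entity ID so q12345 and Q12345 compare equal."""
    return entity_id.strip().upper()

def find_entity(search_id, entities):
    """Return the entry from a decoded API result list whose 'id'
    matches search_id, ignoring case; None if there is no match."""
    wanted = normalize_entity_id(search_id)
    for entry in entities:
        if normalize_entity_id(entry["id"]) == wanted:
            return entry
    return None
```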
There is potential for similar problems with Lua code, depending on how the data
structure is processed by Lua. We are working to minimize the impact there.
Sorry for the short notice.
Please test your code against test.wikidata.org and let us know if you find any
issues.
Thanks,
Daniel
PS: issue report on bugzilla: https://bugzilla.wikimedia.org/show_bug.cgi?id=53894
Hi all,
for my latest toy [1] I would really like to get the item IDs (Qxxx) for
the items in the dropdown box when editing a claim that points to an item.
I can get the item IDs for the search dropdown via the link hrefs, but
didn't find anything similar in the "role='menuitem'" <li>s for the edit
box.
Is there a global data structure with the ID list I could access? Are the
IDs attached to DOM nodes via jQuery data()? A hook I can add my function
to? Or something else I could use?
Alternatively, could you add an attribute (e.g. "itemid='Qxxx'") to the
<li> nodes? Shouldn't be too hard, as it couldn't really break anything
AFAICT. Could be really handy.
Thanks,
Magnus
[1] http://magnusmanske.de/wordpress/?p=64
Hey,
The commit moving the files in the DataValues repo around has finally been
merged \o/. We can thus now create the 2 new git repos that still need to
be created (the ValueView one is already there). We have, however, still not
settled on names for these. Currently we have:
* DataValues(.git) (Composer: data-values/data-values): the DataValues
interface and trivial implementations (e.g. BooleanValue).
* DataValuesInterfaces(.git) (Composer: data-values/interfaces):
ValueParser/Formatter/Validator interfaces and trivial implementations
* DataValuesCommon(.git) (Composer: data-values/common): All currently
existing non-trivial implementations of the interfaces defined by the above
two packages that are not in Wikibase. Things might be split from this at a
later point if we deem this to be of use (for instance having a
DataValuesGeo data-values/geo with GlobeCoordinateValue, LatLongValue and
all the parsing and formatting code).
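Assuming the names above stick, a consumer's composer.json would then look roughly like this (the version constraints are hypothetical):

```json
{
    "require": {
        "data-values/data-values": "~0.1",
        "data-values/interfaces": "~0.1",
        "data-values/common": "~0.1"
    }
}
```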
We will not be able to change these names without a lot of hassle later on,
so if you do not agree with one of the latter two, speak up now. Unless
there are better suggestions, I'll be requesting repos with said names this
weekend.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hey,
There has recently been quite some talk about having multiple components,
including some concerns that simply having multiple components is in itself
a bad idea. To those who are not familiar with the principles of component
design, I can recommend the "Principles of Component Design" talk by Robert
C. Martin [0]. While the whole talk is interesting and fun to watch, the
first half contains a lot of historical details and other concerns not
relevant to PHP development. So if you need to pick one half to watch, go
for the second one.
[0] https://vimeo.com/68236438
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hi!
I'm getting closer and closer to writing the UI for badges in the repo. Does
anyone have any ideas?
The only thing I came up with is an additional row beneath each site link that
behaves (and looks) like alias editing currently does, but I think that this
would take too much space.
Thanks,
Michał
We are planning to deploy URLs as data values rather soon (i.e. September
9, if all goes well).
There was a discussion on wikidata-l mailing list:
<http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg02664.html>
The current implementation for URLs uses a string data value. There was
also an IRI data value developed (for this use case), but in a previous
(internal) discussion it was decided to use the string value instead.
The above thread included a few strong arguments by Markus for using the
IRI data value. If we want to do this, we need to decide that very quickly,
and change it accordingly.
Let's see if we can make the decision here on this list. We need to make
the decision by Monday at the latest, preferably earlier.
Here are my current thoughts (check also the above-mentioned thread if you
have not done so already). Currently I have a preference for using the string
value, just to point out my current bias, but I want wider input.
* I do not see the advantage of representing
'http://www.ietf.org/rfc/rfc1738.txt' as a structured data value of the form
{ protocol : 'http', hierarchicalpart : 'www.ietf.org/rfc/rfc1738.txt',
query : '', fragment : '' }.
* If we use string value, a number of necessary features come for free,
like the diffing, displaying it in the diffs, etc. Sure, there is the
argument that we can use the getString method for these, but then what is
the use case that we actually serve by using the structured data?
* I understood the advantages of being able to *identify* whether the value
of a snak is a string or a URL, but that seems to be the same advantage as
knowing whether the value of a snak is a Commons media file name or a
string. None of the use cases, though, explains why using the
above data structure is advantageous over a simple string value.
Please let us collect the arguments for and against using the IRI data
value *structure* here (not for being able to *identify* whether a string
is an IRI or a string).
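To make the comparison concrete: the two representations of the example URL can be derived from each other, sketched here with Python's urllib.parse (the structured field names follow the example above, not any existing DataValues class):

```python
from urllib.parse import urlsplit

url = "http://www.ietf.org/rfc/rfc1738.txt"

# Plain string data value: the URL is stored and diffed as-is.
string_value = url

# Structured IRI-style value, analogous to the example above.
parts = urlsplit(url)
structured_value = {
    "protocol": parts.scheme,                        # 'http'
    "hierarchicalpart": parts.netloc + parts.path,   # 'www.ietf.org/rfc/rfc1738.txt'
    "query": parts.query,                            # ''
    "fragment": parts.fragment,                      # ''
}
```

Either form can be reconstructed from the other, so the question is which one the consumers actually need.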
Not completely independent of that, there are a few questions that need to
be answered but that are not as immediate, i.e. do not have to be decided
by next week:
* should, in the external JSON structure, for every snak the data value
type be listed (as it currently is)? I.e. should it state "string" instead
of "Commons media filename"?
* should, in the external JSON structure, for every snak the data type of
the property used be listed? This would then say URL, and this would solve
all the use cases mentioned by Markus, which rely on *identifying* this
distinction, not on the actual IRI data structure.
* should, in the internal JSON structure, something be changed?
The external JSON structure is the one used when communicating through the
API.
The internal JSON structure is the one that you get when using the dumps.
We need to have an export of the whole Wikidata knowledge base in the
external JSON format, rather sooner than later, and hopefully also in RDF.
The lack of these dumps should not influence our decision right now, imho :)
Cheers,
Denny
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Hey,
Replying to DanielK's comment on the "Improvements to the implementation of
Entity" thread, which has nothing to do with the topic of said thread,
hence a new one.
I was asked what "high risk" in my last email means.
>
> These results are using the CRAP metric [0] to determine the CRAPiness of
> the code (which is indicated by the number in brackets). You can have
> PHPUnit generate this by running the following command in the root
> directory of WikibaseDataModel and clicking "dashboard" on the generated
> index.html page: phpunit --coverage-html /some/path
>
> [0] http://googletesting.blogspot.de/2011/02/this-code-is-crap.html
>
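For context, the CRAP number combines a method's cyclomatic complexity with its test coverage. A minimal sketch of the formula from the linked post, as I understand it:

```python
def crap_score(complexity, coverage_pct):
    """CRAP(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m)

    `complexity` is the cyclomatic complexity of the method and
    `coverage_pct` its test coverage in percent (0-100). A fully
    covered method scores just its complexity; an uncovered complex
    method scores dramatically higher.
    """
    uncovered = 1.0 - coverage_pct / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity
```

For example, crap_score(5, 100) is 5, while crap_score(5, 0) is 30, which is why untested complex methods dominate the report.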
I suggest having this generated regularly and putting it online somewhere
(maybe on wikidata-docs.wikimedia.de).
I have suggested doing so many times in the past, though it turned out that
maintaining such infrastructure ourselves is too much work for our team. Our
initial go at it is still online, even though it has been broken for over
half a year:
http://wikidata-docs.wikimedia.de/testcoverage/
http://wikidata-docs.wikimedia.de/testcoverage/phpcoverage/20130320/extensi…
If we can get such a thing up and running again, and not have it break
every week, that'd be awesome. Going with a service we do not need to
maintain ourselves does seem to be the more sustainable approach though.
Coveralls.io [0] support is already a big step in that direction, though
this service unfortunately does not provide these CRAP reports at this time.
There are several other interesting metrics we could track, some of which
make good candidates for CI rules (e.g. commits with new methods with a
complexity above n get a -1). There are some people (of course I forgot
who) currently looking at PHPCS [1] integration into WMF Jenkins, which is
a promising development.
[0] https://coveralls.io/r/wikimedia
[1] http://pear.php.net/package/PHP_CodeSniffer
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hey,
Now that the design issues of EntityId have been fixed, it's Entity's turn :)
(Note: this is about domain layer implementation details. Not considering
changing anything visible to the user here.)
While working on the QueryEntity together with addshore, we ran into a
number of issues with the current implementation of Entity. The main
problem is that Entity objects are constructed from their internal
"serialization". The constructor, which is marked as protected, takes in
this "serialization" (in array form). This is rather awkward; consider how
we currently construct a Property:
$property = Property::newEmpty();
$property->setDataTypeId( $id );
(We also have a static newFromDataTypeId which wraps this.)
There is a bunch of code that assumes one can create empty Entity objects,
especially in tests. I now think it was a mistake to allow this at all for
Property, which should not be constructed without a dataTypeId. The same
goes for QueryEntity, which should not be constructed without a Ask\Query.
It'd be much nicer if people could just use the constructors of the objects
and have these enforce the list of required parameters. They'd just take
the actual objects and not serializations.
$property = new Property( $id );
$queryEntity = new QueryEntity( $askQuery );
And since these things are enforced, one now gets back a string when
calling getDataTypeId, and an Ask\Query when calling getQueryDefinition,
rather than either that type or null.
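The difference in a nutshell, sketched in Python rather than PHP for brevity (names are illustrative):

```python
class Property:
    """A property whose data type is enforced at construction time."""

    def __init__(self, data_type_id):
        if not data_type_id:
            raise ValueError("a Property requires a data type id")
        self._data_type_id = data_type_id

    def get_data_type_id(self):
        # Always a string: the constructor guarantees it was set,
        # so callers never need to handle a missing value.
        return self._data_type_id
```

Compare with the newEmpty() pattern, where the getter can return null until the setter has been called, and every caller has to account for that.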
Serialization and deserialization code can also go into dedicated service
objects. This is already done for QueryEntity, which is using the same
serializers as the ones the web API will use, saving us the implementation of
a second format, which would not be of much help here anyway (it'd save some
disk space...).
There is also a lot of room to be more strict about things. Right now you
can happily construct an Entity that has ints as aliases, or as language
code for labels. On top of that, there are currently still TODOs from the
first months of the project in Entity related to normalization and handling
of duplicates. We might want to clearly define responsibilities at this
point :)
Oh and, of course Entity, Item, Property and Query each should go into
their own git repo.
Any objections or concerns about the above rambling?
There has lately also been some talk about doing things with Entity that
we did not consider before, such as entities that contain other entities.
Is there a list of such ideas? If not, let's compile one here so they
can be taken into account.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--