Hi,
I found that Special:EntityData returns outdated JSON data that is not
in agreement with the page. I have fetched the data using wget to ensure
that no browser cache is in the way. Concretely, I have been looking at
https://www.wikidata.org/wiki/Special:EntityData/Q17444909.json
where I recently changed the P279 value from Q217594 to Q16889133. Of
course, this might no longer be a valid example when you read this email
(in case the cache gets updated at some point).
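For reference, here is roughly the check I am running, as a small Python
sketch (it assumes the usual entity JSON layout, where item values carry
a numeric-id):

import json
import urllib2

# Fetch outside the browser (as with wget) so no local cache interferes;
# any staleness left must come from the server-side caches.
url = 'https://www.wikidata.org/wiki/Special:EntityData/Q17444909.json'
data = json.load(urllib2.urlopen(url))

# Print the P279 (subclass of) values; I expect Q16889133 here, not Q217594.
for claim in data['entities']['Q17444909']['claims'].get('P279', []):
    print('Q%d' % claim['mainsnak']['datavalue']['value']['numeric-id'])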
Is this a bug in the configuration of the HTTP (or other) cache, or is
this the desired behaviour? When will the cache be cleared?
Thanks,
Markus
Some Jenkins jobs now fail for all changes to Wikibase. E.g.
<https://gerrit.wikimedia.org/r/#/c/270008/> and
<https://gerrit.wikimedia.org/r/#/c/270572/>. Errors I see:
11:28:52 PHP Strict standards: Declaration of
Capiunto\Test\BasicRowTest::testLua() should be compatible with
Scribunto_LuaEngineTestBase::testLua($key, $testName, $expected) in
/mnt/jenkins-workspace/workspace/mwext-testextension-php55-composer/src/extensions/Capiunto/tests/phpunit/output/BasicRowTest.php
on line 51
11:39:14 1) LuaSandbox:
Wikibase\Client\Tests\DataAccess\Scribunto\Scribunto_LuaWikibaseEntityLibraryTest::testRegister
11:39:14 Failed asserting that LuaSandboxFunction Object () is an instance of
class "Scribunto_LuaStandaloneInterpreterFunction".
I guess some change to Scribunto broke compatibility...
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Good morning Stas Malyshev & Magnus Manske,
Thanks for your prompt responses and for sending me the related site
resources. I have been reviewing the references you sent and am very
happy with the experience so far. I am looking forward to the geo
support that, as far as I know, will be added with the next Blazegraph
update, although I don't know much about how this update will facilitate
the retrieval of geo attributes (e.g., bounding coordinates, shapefile
data, WOEIDs, etc.).
Either way, very interesting!
Many thanks,
Jorge Hernandez
Greetings, Wiki Tinkers!
I have been looking into Wikidata SPARQL and am very much enjoying the
Wikidata Query Service <https://query.wikidata.org/> tool. After some
related research, I have managed to extract many geo attributes of the
records I request (e.g., cities in Cali, parking in the U.S., etc.),
alongside pertinent fields like the Wikipedia site, GNSID, etc.
One problem I am having, though, occurs when I request "bounding box"
(i.e., northeast and southwest) point coordinates per record feature.
I am able to get the pertinent point coordinates (lat and long), but
how do we get the bounding boxes?
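For context, this is roughly how I retrieve the point coordinates (a
Python sketch; P625 is the coordinate-location property and Q515 the
class for cities, which I use here just as an example):

import json
import urllib
import urllib2

# Point coordinates (P625) for items that are cities (Q515).
query = """
SELECT ?city ?cityLabel ?coord WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;
        wdt:P625 ?coord ;
        rdfs:label ?cityLabel .
  FILTER(LANG(?cityLabel) = "en")
}
LIMIT 10
"""
url = 'https://query.wikidata.org/sparql?' + urllib.urlencode(
    {'query': query, 'format': 'json'})
for row in json.load(urllib2.urlopen(url))['results']['bindings']:
    print('%s %s' % (row['cityLabel']['value'], row['coord']['value']))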
Is there a site where related queries are shared?
Is there a prominent, themed archive of Wikidata SPARQL queries somewhere?
Thanks All!
Kindest Regards,
J.A. H.
[Moving to wikidata-tech; previous conversation inline below]
Hi Polyglot,
ah, now I see. The Wikidata Toolkit method you call is looking for items
by Wikipedia page title, not for items by label. Labels and titles are
not related in Wikidata. The search by title is supported by the
wbgetentities API action for which we have a wrapper class, but this API
action does not support the search by label.
In fact, I am not sure that there is any API action for doing what you
want. There is only wbsearchentities, but this search will return near
matches and also look for aliases. Maybe this is not a big issue for
long strings as in your case, but for shorter strings you would get many
results and you would still need to check if they really match.
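For reference, calling that action directly looks roughly like this (a
Python sketch; the parameters are the documented ones for
wbsearchentities):

import json
import urllib
import urllib2

# Raw wbsearchentities call: matches labels *and* aliases, including near
# matches, so results still need to be checked for exact equality.
params = urllib.urlencode({
    'action': 'wbsearchentities',
    'search': 'Kasega Church of Uganda Primary School',
    'language': 'en',
    'type': 'item',
    'format': 'json',
})
url = 'https://www.wikidata.org/w/api.php?' + params
for match in json.load(urllib2.urlopen(url)).get('search', []):
    print('%s %s' % (match['id'], match.get('label', '')))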
Anyway, you are right that it would be nice if we implemented support
for the label/alias search as well. For this, we need to add a wrapper
class for wbsearchentities. I have created an issue to track this:
https://github.com/Wikidata/Wikidata-Toolkit/issues/228
Cheers,
Markus
On 13.02.2016 23:22, Jo wrote:
> Hi Markus,
>
> I'm searching for a Wikidata item with that label. It would be even
> better if it were possible to search for a label/description combination.
>
> This is the item I'm looking for:
> https://www.wikidata.org/wiki/Q22695926
>
> I mostly want to make sure that I'm not creating duplicate entries in
> Wikidata. Most of those schools are not noteworthy enough to get an
> article on Wikipedia, but since they have objects in OpenStreetMap, I
> would think they are interesting enough for Wikidata.
>
> Polyglot
>
> 2016-02-13 23:13 GMT+01:00 Markus Krötzsch <markus(a)semantic-mediawiki.org>:
>
> Hi Jo,
>
> You are searching for an item that is assigned to the article
> "Kasega Church of Uganda Primary School" on English Wikipedia.
> However, there is no article of this name on English Wikipedia.
> Maybe there is a typo? Can you tell me which Wikidata item should be
> returned here?
>
> Cheers,
>
> Markus
>
> P.S. If you agree, I would prefer to continue this discussion on
> wikidata-tech for the benefit of others who may have similar questions.
>
>
>
> On 13.02.2016 14:47, Jo wrote:
>
> Hi Markus,
>
> I had started to write my own implementation of a Wikidata bot in
> Jython, so I could use it in JOSM but still get to code in Python.
> This worked well for a while, but now apparently something was changed
> in the login API.
>
> Anyway, I can't code for all the possible things that can go wrong, so
> it makes more sense to reuse an existing framework.
>
> What I want to do is add items, but I want to check if they already
> exist first. Try as I may, I can't seem to retrieve the items I
> create
> myself, like:
>
>
> Kasega Church of Uganda Primary School
>
> Douglas Adams, on the other hand, doesn't pose a problem.
>
>
> I can't figure out why this is. Some things can be found, others
> can't.
> I tried with a few more entries from recent changes.
>
>
> In my own bot, I had more success with searchEntities than with
> getEntities. Was this implemented in WDTK?
>
> I hope you can help; I'm stuck, and it doesn't make a lot of sense to
> continue with the conversion if I can't even get a trivial thing like
> this to work.
>
> from org.wikidata.wdtk.datamodel.helpers import Datamodel
> from org.wikidata.wdtk.datamodel.helpers import ItemDocumentBuilder
> from org.wikidata.wdtk.datamodel.helpers import ReferenceBuilder
> from org.wikidata.wdtk.datamodel.helpers import StatementBuilder
> from org.wikidata.wdtk.datamodel.interfaces import DatatypeIdValue
> from org.wikidata.wdtk.datamodel.interfaces import EntityDocument
> from org.wikidata.wdtk.datamodel.interfaces import ItemDocument
> from org.wikidata.wdtk.datamodel.interfaces import ItemIdValue
> from org.wikidata.wdtk.datamodel.interfaces import PropertyDocument
> from org.wikidata.wdtk.datamodel.interfaces import PropertyIdValue
> from org.wikidata.wdtk.datamodel.interfaces import Reference
> from org.wikidata.wdtk.datamodel.interfaces import Statement
> from org.wikidata.wdtk.datamodel.interfaces import StatementDocument
> from org.wikidata.wdtk.datamodel.interfaces import StatementGroup
> from org.wikidata.wdtk.wikibaseapi import ApiConnection
> from org.wikidata.wdtk.wikibaseapi import LoginFailedException
> from org.wikidata.wdtk.wikibaseapi import WikibaseDataEditor
> from org.wikidata.wdtk.wikibaseapi import WikibaseDataFetcher
> from org.wikidata.wdtk.wikibaseapi.apierrors import MediaWikiApiErrorException
> from org.wikidata.wdtk.util import WebResourceFetcherImpl
> # print dir(ItemDocument)
> # print dir(ApiConnection)
>
> # 'connection' and 'siteIri' are set up earlier in the script.
> dataFetcher = WikibaseDataFetcher(connection, siteIri)
> # print dir(dataFetcher)
> # Tried passing the titles as a list as well:
> # itemDocuments = dataFetcher.getEntityDocumentsByTitle('enwiki',
> #     ['Kasega Church of Uganda Primary School'])
> # Fetching by entity id instead:
> # itemDocuments = dataFetcher.getEntityDocuments('Q22695926')
> # Look up by Wikipedia page title (this is the call that finds nothing
> # for my own items):
> itemDocuments = dataFetcher.getEntityDocumentsByTitle('enwiki',
>     'Kasega Church of Uganda Primary School')
> # print dir(itemDocuments)
> print str(len(itemDocuments)) + ' resulting items'
> print itemDocuments.toString()
> # for itemDocument in itemDocuments:
> #     print '=========================='
> #     print itemDocument.toString()
>
>
>
Hi all,
I am happy to announce the release of Wikidata Toolkit 0.6.0 [1], the
Java library for programming with Wikidata and Wikibase.
The most prominent new feature of this release is improved support for
writing bots (full support for maxlag and edit throttling, simpler code
through convenience methods, and a fix for a previous issue with API
access). In addition, the new version introduces support for the new
Wikidata property types "external-id" and "math".
We have also improved our documentation by creating an example project
that shows how to use Wikidata Toolkit as a library in your own,
stand-alone Java code [2].
The bot code in the examples is used in actual bots, and has already been
used for thousands of edits on Wikidata (e.g., some may have noticed that
the annoying "+-1" after population numbers and the like has become quite
rare recently ;-).
Maven users can get the library directly from Maven Central (see [1]);
this is the preferred method of installation. There is also an
all-in-one JAR on GitHub [3] and of course the sources [4] and updated
JavaDocs [5].
As usual, feedback is welcome. Developers are also invited to contribute
via github.
Cheers,
Markus
[1] https://www.mediawiki.org/wiki/Wikidata_Toolkit
[2] https://github.com/Wikidata/Wikidata-Toolkit-Examples
[3] https://github.com/Wikidata/Wikidata-Toolkit/releases
[4] https://github.com/Wikidata/Wikidata-Toolkit/
[5] http://wikidata.github.io/Wikidata-Toolkit/
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/
Hi all!
For Wikidata, it would be useful if we could serve different HTML for
desktop and mobile. In particular, we would like to avoid sending
everything that is needed for editing to mobile devices.
My understanding is that currently, mobile output uses separate web caches, but
shares the parser cache with the desktop version of the page. Is that correct?
And would it be feasible to also split the parser cache?
Thanks!
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Hi!
I've decided to split the WDQS GUI out into a separate repo, outside the
main WDQS repo, and to use it (for now) as a submodule in both source and
deployment. The code there is completely different, and it makes little
sense to run Java tests when we change JavaScript in the GUI, for
example, or to rebuild jar/war modules when a new GUI needs to be
deployed.
I've created a new repository in Gerrit - wikidata/query/gui - and
imported all the existing code there. If there are no objections, I'll
proceed to restructure the main repo and deployment to use this
repository, and to connect it to CI. I hope this will make working with
it less cumbersome.
Please tell me if you have any comments/suggestions/objections.
--
Stas Malyshev
smalyshev(a)wikimedia.org
As Lydia announced, we are going to deploy support for two new data types soon
(think of "data types" as "property types", as opposed to "value types"):
* The "math" type for formulas. This will use TeX syntax and is provided by the
same extension that implements <math> for wikitext. We plan to roll this out on
Feb 9th.
* The "external-id" type for references to external resources. We plan to roll
this out on Feb 16th. NOTE: Many of the existing properties for external
identifiers will be converted from the plain "string" data type to the new
"external-id" data type, see
<https://www.wikidata.org/wiki/User:Addshore/Identifiers>.
Both of these new types will use the "string" value type. Below are two
examples of Snaks that use the new data types, in JSON:
{
  "snaktype": "value",
  "property": "P717",
  "datavalue": {
    "value": "\\sin x^2 + \\cos_b x ^ 2 = e^{2 \\tfrac\\pi{i}}",
    "type": "string"
  },
  "datatype": "math"
}
{
  "snaktype": "value",
  "property": "P708",
  "datavalue": {
    "value": "BADWOLF",
    "type": "string"
  },
  "datatype": "external-id"
}
As you can see, the only thing that is new is the value of the "datatype" field.
Similarly, in RDF, both new data types use plain string literals for now, as you
can see from the turtle snippet below:
wd:Q2209 a wikibase:Item ;
    wdt:P717 "\\sin x^2 + \\cos_b x ^ 2 = e^{2 \\tfrac\\pi{i}}" ;
    wdt:P708 "BADWOLF" .
The datatypes themselves are declared as follows:
wd:P708 a wikibase:Property ;
    wikibase:propertyType wikibase:ExternalId .
wd:P717 a wikibase:Property ;
    wikibase:propertyType wikibase:Math .
Accordingly, the URIs of the datatypes (not the types of the literals!) are:
<http://wikiba.se/ontology-beta#ExternalId>
<http://wikiba.se/ontology-beta#Math>
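For illustration, a client could then enumerate all properties of the new
ExternalId type with a query along these lines (a Python sketch against
the query service; the wb: prefix is just shorthand for the beta ontology
namespace above):

import json
import urllib
import urllib2

# List the properties declared with the new ExternalId datatype.
query = """
PREFIX wb: <http://wikiba.se/ontology-beta#>
SELECT ?prop WHERE {
  ?prop a wb:Property ;
        wb:propertyType wb:ExternalId .
}
"""
url = 'https://query.wikidata.org/sparql?' + urllib.urlencode(
    {'query': query, 'format': 'json'})
for row in json.load(urllib2.urlopen(url))['results']['bindings']:
    print(row['prop']['value'])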
These are, for now, the only changes to the representation of Snaks. We do
however consider some additional changes for the future. To avoid confusion,
I'll put them below a big separator:
ANNOUNCEMENT ABOVE!
--------------------------------------------------------------------------------
ROUGH PLANS BELOW!
Here are some changes concerning the math and external-id data types that we are
considering or planning for the future.
* For the Math datatype, we may want to provide a type URI for the RDF string
literal that indicates that the format is indeed TeX.
Perhaps we could use <http://purl.org/xtypes/Fragment-LaTeX>.
* For the ExternalId data type, we would like to use resource URIs for external
IDs (in "direct claims"), if possible. This would only work if we know the base
URI for the property (provided by a statement on the property definition). For
properties with no base URI set, we would still use plain string literals.
In our example above, the base URI for P708 might be
<https://tardis.net/allonzy/>. The Turtle snippet would read:
wd:Q2209 a wikibase:Item ;
    wdt:P717 "\\sin x^2 + \\cos_b x ^ 2 = e^{2 \\tfrac\\pi{i}}"^^purl:Fragment-LaTeX ;
    wdt:P708 <https://tardis.net/allonzy/BADWOLF> .
However, the full representation of the statement would still use the original
string literal:
wds:Q2209-24942a17-4791-a49d-6469-54e581eade55 a wikibase:Statement,
        wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P708 "BADWOLF" .
We would also like to provide the full URI of the external resource in JSON,
making us a good citizen of the web of linked data. We plan to do this using a
mechanism we call "derived values", which we also plan to use for other kinds of
normalization in the JSON output. The idea is to include additional data values
in the JSON representation of a Snak:
{
  "snaktype": "value",
  "property": "P708",
  "datavalue": {
    "value": "BADWOLF",
    "type": "string"
  },
  "datavalue-uri": {
    "value": "https://tardis.net/allonzy/BADWOLF",
    "type": "string"
  },
  "datatype": "external-id"
}
In some cases, such as ISBNs, we would want a URL as well as a URI:
{
  "snaktype": "value",
  "property": "P708",
  "datavalue": {
    "value": "3827370191",
    "type": "string"
  },
  "datavalue-uri": {
    "value": "urn:isbn:3827370191",
    "type": "string"
  },
  "datavalue-url": {
    "value": "https://www.wikidata.org/wiki/Special:BookSources/3827370191",
    "type": "string"
  },
  "datatype": "external-id"
}
The base URL would be given as a statement on the property, just like the base URI.
We plan to use the same mechanism for giving Quantities in a standard unit,
providing thumbnail URLs for CommonsMedia values, etc.
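To sketch how a consumer might use this (purely illustrative, since the
"datavalue-uri" and "datavalue-url" keys are only proposed above, not
part of the deployed JSON format):

def best_link_for_external_id(snak):
    # Prefer the proposed derived values when present; fall back to the
    # plain string value. The derived keys are proposals, not yet deployed.
    for key in ('datavalue-url', 'datavalue-uri'):
        if key in snak:
            return snak[key]['value']
    return snak['datavalue']['value']

For the ISBN example above, this would return the Special:BookSources URL.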
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Hi all!
In the context of introducing the new "math" and "external-id" data types, the
question came up whether this introduction constitutes a breaking change to the
data model. The answer to this depends on whether you take the "English" or the
"German" approach to interpreting the format: According to
<https://en.wikipedia.org/wiki/Everything_which_is_not_forbidden_is_allowed>, in
England, "everything which is not forbidden is allowed", while, in Germany, the
opposite applies, so "everything which is not allowed is forbidden".
In my mind, the advantage of formats like JSON, XML and RDF is that they provide
good discovery by eyeballing, and that they use a mix-and-match approach. In
this context, I favour the English approach: anything not explicitly forbidden
in the JSON or RDF is allowed.
So I think clients should be written in a forward-compatible way: they should
handle unknown constructs or values gracefully.
In this vein, I would like to propose a few guiding principles for the design of
client libraries that consume Wikibase RDF and particularly JSON output:
* When encountering an unknown structure, such as an unexpected key in a JSON
encoded object, the consumer SHOULD skip that structure. Depending on context
and use case, a warning MAY be issued to alert the user that some part of the
data was not processed.
* When encountering a malformed structure, such as a missing required key
in a JSON encoded object, the consumer MAY skip that structure, but then a
warning MUST be issued to alert the user that some part of the data was
not processed. If the structure is not skipped, the consumer MUST fail
with a fatal error.
* Clients MUST make a clear distinction between data types and value
types: a Snak's data type determines the interpretation of the value,
while the type of the Snak's data value specifies the structure of the
value representation.
* Clients SHOULD be able to process a Snak about a Property of unknown data
type, as long as the value type is known. In such a case, the client SHOULD fall
back to the behaviour defined for the value type. If this is not possible, the
Snak MUST be skipped and a warning SHOULD be issued to alert the user that some
part of the data could not be interpreted.
* When encountering an unknown type of data value (value type), the client MUST
either ignore the respective Snak, or fail with a fatal error. A warning SHOULD
be issued to alert the user that some part of the data could not be processed.
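To make this concrete, here is a minimal sketch of a consumer that
follows these rules (Python; the handler tables and names are purely
illustrative, not a prescribed API, and only "value" Snaks are
considered):

import warnings

# Illustrative handlers, keyed by value type (the structure of the
# value representation in the JSON).
VALUE_TYPE_HANDLERS = {
    'string': lambda value: value,
    # 'wikibase-entityid': ..., 'time': ..., 'globecoordinate': ..., etc.
}

# Illustrative handlers, keyed by data type (how the value is interpreted).
DATA_TYPE_HANDLERS = {
    'string': lambda value: value,
    'external-id': lambda value: value,
}

def process_value_snak(snak):
    try:
        data_type = snak['datatype']
        value_type = snak['datavalue']['type']
        value = snak['datavalue']['value']
    except KeyError as missing:
        # Malformed structure: we choose to skip it, so we MUST warn.
        warnings.warn('skipping malformed snak (missing %s)' % missing)
        return None
    if value_type not in VALUE_TYPE_HANDLERS:
        # Unknown value type: MUST skip (or fail); SHOULD warn.
        warnings.warn('skipping snak with unknown value type %r' % value_type)
        return None
    # Unknown data type: fall back to the behaviour defined for the value type.
    handler = DATA_TYPE_HANDLERS.get(data_type, VALUE_TYPE_HANDLERS[value_type])
    return handler(value)

With this structure, a client written before the "math" rollout would
still yield the TeX string for a math Snak, via the "string" value type
fallback.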
Do you think these guidelines are reasonable? It seems to me that adopting them
should save everyone some trouble.
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.