Hi,
I found that Special:EntityData returns outdated JSON data that is not
in agreement with the page. I have fetched the data using wget to ensure
that no browser cache is in the way. Concretely, I have been looking at
https://www.wikidata.org/wiki/Special:EntityData/Q17444909.json
where I recently changed the P279 value from Q217594 to Q16889133. Of
course, this might no longer be a valid example when you read this email
(in case the cache gets updated at some point).
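For reference, here is roughly the check I am running, as a small Python
sketch (it assumes the usual entity JSON layout, where item values carry
a numeric-id):

import json
import urllib2

# Fetch outside the browser (as with wget) so no local cache interferes;
# any staleness left must come from the server-side caches.
url = 'https://www.wikidata.org/wiki/Special:EntityData/Q17444909.json'
data = json.load(urllib2.urlopen(url))

# Print the P279 (subclass of) values; I expect Q16889133 here, not Q217594.
for claim in data['entities']['Q17444909']['claims'].get('P279', []):
    print('Q%d' % claim['mainsnak']['datavalue']['value']['numeric-id'])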
Is this a bug in the configuration of the HTTP (or other) cache, or is
this the desired behaviour? When will the cache be cleared?
Thanks,
Markus
Some Jenkins jobs now fail for all changes to Wikibase. E.g.
<https://gerrit.wikimedia.org/r/#/c/270008/> and
<https://gerrit.wikimedia.org/r/#/c/270572/>. Errors I see:
11:28:52 PHP Strict standards: Declaration of
Capiunto\Test\BasicRowTest::testLua() should be compatible with
Scribunto_LuaEngineTestBase::testLua($key, $testName, $expected) in
/mnt/jenkins-workspace/workspace/mwext-testextension-php55-composer/src/extensions/Capiunto/tests/phpunit/output/BasicRowTest.php
on line 51
11:39:14 1) LuaSandbox:
Wikibase\Client\Tests\DataAccess\Scribunto\Scribunto_LuaWikibaseEntityLibraryTest::testRegister
11:39:14 Failed asserting that LuaSandboxFunction Object () is an instance of
class "Scribunto_LuaStandaloneInterpreterFunction".
I guess some change to Scribunto broke compatibility...
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Good morning Stas Malyshev & Magnus Manske,
Thanks for your prompt responses and for sending me the related site
resources. I have been reviewing the references you sent and am very
happy with the experience so far. I am looking forward to the geo
support that, as far as I know, will be added with the next Blazegraph
update, although I don't know much about how this update will facilitate
the retrieval of geo attributes (e.g., bounding coordinates, shapefile
data, WOEIDs, etc.).
Either way, very interesting!
Many thanks,
Jorge Hernandez
Greetings, Wiki Tinkers!
I have been looking into Wikidata SPARQL and am very much enjoying the
Wikidata Query Service <https://query.wikidata.org/> tool. After some
related research, I have managed to extract many geo attributes of the
records I request (e.g., cities in Cali, parking in the U.S., etc.),
alongside pertinent fields like the Wikipedia site, GNSID, etc.
One problem I am having, though, occurs when I request "bounding box"
(i.e., northeast and southwest) point coordinates per record feature.
I am able to get the pertinent point coordinates (lat and long), but
how do we get the bounding boxes?
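For context, this is roughly how I retrieve the point coordinates (a
Python sketch; P625 is the coordinate-location property and Q515 the
class for cities, which I use here just as an example):

import json
import urllib
import urllib2

# Point coordinates (P625) for items that are cities (Q515).
query = """
SELECT ?city ?cityLabel ?coord WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;
        wdt:P625 ?coord ;
        rdfs:label ?cityLabel .
  FILTER(LANG(?cityLabel) = "en")
}
LIMIT 10
"""
url = 'https://query.wikidata.org/sparql?' + urllib.urlencode(
    {'query': query, 'format': 'json'})
for row in json.load(urllib2.urlopen(url))['results']['bindings']:
    print('%s %s' % (row['cityLabel']['value'], row['coord']['value']))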
Is there a site where related queries are shared?
Is there a prominent, themed archive of Wikidata SPARQL queries somewhere?
Thanks All!
Kindest Regards,
J.A. H.
[Moving to wikidata-tech; previous conversation inline below]
Hi Polyglot,
ah, now I see. The Wikidata Toolkit method you call is looking for items
by Wikipedia page title, not for items by label. Labels and titles are
not related in Wikidata. The search by title is supported by the
wbgetentities API action for which we have a wrapper class, but this API
action does not support the search by label.
In fact, I am not sure that there is any API action for doing what you
want. There is only wbsearchentities, but this search will return near
matches and also look for aliases. Maybe this is not a big issue for
long strings as in your case, but for shorter strings you would get many
results and you would still need to check if they really match.
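For reference, calling that action directly looks roughly like this (a
Python sketch; the parameters are the documented ones for
wbsearchentities):

import json
import urllib
import urllib2

# Raw wbsearchentities call: matches labels *and* aliases, including near
# matches, so results still need to be checked for exact equality.
params = urllib.urlencode({
    'action': 'wbsearchentities',
    'search': 'Kasega Church of Uganda Primary School',
    'language': 'en',
    'type': 'item',
    'format': 'json',
})
url = 'https://www.wikidata.org/w/api.php?' + params
for match in json.load(urllib2.urlopen(url)).get('search', []):
    print('%s %s' % (match['id'], match.get('label', '')))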
Anyway, you are right that it would be nice if we implemented support
for the label/alias search as well. For this, we need to add a wrapper
class for wbsearchentities. I have created an issue to track this:
https://github.com/Wikidata/Wikidata-Toolkit/issues/228
Cheers,
Markus
On 13.02.2016 23:22, Jo wrote:
> Hi Markus,
>
> I'm searching for a Wikidata item with that label. It would be even
> better if it were possible to search for a label/description combination.
>
> This is the item I'm looking for:
> https://www.wikidata.org/wiki/Q22695926
>
> I mostly want to make sure that I'm not creating duplicate entries in
> Wikidata. Most of those schools are not noteworthy enough to get an
> article on Wikipedia, but since they have objects in OpenStreetMap, I
> would think they are interesting enough for Wikidata.
>
> Polyglot
>
> 2016-02-13 23:13 GMT+01:00 Markus Krötzsch <markus(a)semantic-mediawiki.org>:
>
> Hi Jo,
>
> You are searching for an item that is assigned to the article
> "Kasega Church of Uganda Primary School" on English Wikipedia.
> However, there is no article of this name on English Wikipedia.
> Maybe there is a typo? Can you tell me which Wikidata item should be
> returned here?
>
> Cheers,
>
> Markus
>
> P.S. If you agree, I would prefer to continue this discussion on
> wikidata-tech for the benefit of others who may have similar questions.
>
>
>
> On 13.02.2016 14:47, Jo wrote:
>
> Hi Markus,
>
> I had started to write my own implementation of a Wikidata bot in
> Jython, so I could use it in JOSM but still get to code in Python.
> This worked well for a while, but now apparently something was changed
> in the login API.
>
> Anyway, I can't code for all the possible things that can go wrong, so
> it makes more sense to reuse an existing framework.
>
> What I want to do is add items, but I want to check if they already
> exist first. Try as I may, I can't seem to retrieve the items I
> create
> myself, like:
>
>
> Kasega Church of Uganda Primary School
>
> Douglas Adams, on the other hand, doesn't pose a problem.
>
>
> I can't figure out why this is. Some things can be found, others
> can't.
> I tried with a few more entries from recent changes.
>
>
> In my own bot, I had more success with searchEntities than with
> getEntities. Was this implemented in WDTK?
>
> I hope you can help; I'm stuck, and it doesn't make a lot of sense to
> continue with the conversion if I can't even get a trivial thing like
> this to work.
>
> from org.wikidata.wdtk.datamodel.helpers import Datamodel
> from org.wikidata.wdtk.datamodel.helpers import ItemDocumentBuilder
> from org.wikidata.wdtk.datamodel.helpers import ReferenceBuilder
> from org.wikidata.wdtk.datamodel.helpers import StatementBuilder
> from org.wikidata.wdtk.datamodel.interfaces import DatatypeIdValue
> from org.wikidata.wdtk.datamodel.interfaces import EntityDocument
> from org.wikidata.wdtk.datamodel.interfaces import ItemDocument
> from org.wikidata.wdtk.datamodel.interfaces import ItemIdValue
> from org.wikidata.wdtk.datamodel.interfaces import PropertyDocument
> from org.wikidata.wdtk.datamodel.interfaces import PropertyIdValue
> from org.wikidata.wdtk.datamodel.interfaces import Reference
> from org.wikidata.wdtk.datamodel.interfaces import Statement
> from org.wikidata.wdtk.datamodel.interfaces import StatementDocument
> from org.wikidata.wdtk.datamodel.interfaces import StatementGroup
> from org.wikidata.wdtk.wikibaseapi import ApiConnection
> from org.wikidata.wdtk.wikibaseapi import LoginFailedException
> from org.wikidata.wdtk.wikibaseapi import WikibaseDataEditor
> from org.wikidata.wdtk.wikibaseapi import WikibaseDataFetcher
> from org.wikidata.wdtk.wikibaseapi.apierrors import MediaWikiApiErrorException
> from org.wikidata.wdtk.util import WebResourceFetcherImpl
> # print dir(ItemDocument)
> # print dir(ApiConnection)
>
> # 'connection' and 'siteIri' are set up earlier in the script.
> dataFetcher = WikibaseDataFetcher(connection, siteIri)
> # print dir(dataFetcher)
> # Tried passing the titles as a list as well:
> # itemDocuments = dataFetcher.getEntityDocumentsByTitle('enwiki',
> #     ['Kasega Church of Uganda Primary School'])
> # Fetching by entity id instead:
> # itemDocuments = dataFetcher.getEntityDocuments('Q22695926')
> # Look up by Wikipedia page title (this is the call that finds nothing
> # for my own items):
> itemDocuments = dataFetcher.getEntityDocumentsByTitle('enwiki',
>     'Kasega Church of Uganda Primary School')
> # print dir(itemDocuments)
> print str(len(itemDocuments)) + ' resulting items'
> print itemDocuments.toString()
> # for itemDocument in itemDocuments:
> #     print '=========================='
> #     print itemDocument.toString()
>
>
>
Hi all,
I am happy to announce the release of Wikidata Toolkit 0.6.0 [1], the
Java library for programming with Wikidata and Wikibase.
The most prominent new feature of this release is improved support for
writing bots (full support for maxlag and edit throttling, simpler code
through convenience methods, and a fix for a previous issue with API
access). In addition, the new version introduces support for the new
Wikidata property types "external-id" and "math".
We have also improved our documentation by creating an example project
that shows how to use Wikidata Toolkit as a library in your own,
stand-alone Java code [2].
The bot code in the examples is used in actual bots, and has already been
used for thousands of edits on Wikidata (e.g., some may have noticed that
the annoying "+-1" after population numbers and the like has become quite
rare recently ;-).
Maven users can get the library directly from Maven Central (see [1]);
this is the preferred method of installation. There is also an
all-in-one JAR on GitHub [3] and of course the sources [4] and updated
JavaDocs [5].
As usual, feedback is welcome. Developers are also invited to contribute
via github.
Cheers,
Markus
[1] https://www.mediawiki.org/wiki/Wikidata_Toolkit
[2] https://github.com/Wikidata/Wikidata-Toolkit-Examples
[3] https://github.com/Wikidata/Wikidata-Toolkit/releases
[4] https://github.com/Wikidata/Wikidata-Toolkit/
[5] http://wikidata.github.io/Wikidata-Toolkit/
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/
Hi all!
For Wikidata, it would be useful if we could serve different HTML for
desktop and mobile. In particular, we would like to avoid sending
everything that is needed for editing to mobile devices.
My understanding is that currently, mobile output uses separate web caches, but
shares the parser cache with the desktop version of the page. Is that correct?
And would it be feasible to also split the parser cache?
Thanks!
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Hi!
I've decided to split the WDQS GUI out into a separate repo, outside the
main WDQS repo, and to use it (for now) as a submodule in both source and
deployment. The code there is completely different, and it makes little
sense to run Java tests when we change JavaScript in the GUI, for
example, or to rebuild jar/war modules when a new GUI needs to be
deployed.
I've created a new repository in Gerrit - wikidata/query/gui - and
imported all the existing code there. If there are no objections, I'll
proceed to restructure the main repo and deployment to use this
repository, and to connect it to CI. I hope this will make working with
it less cumbersome.
Please tell me if you have any comments/suggestions/objections.
--
Stas Malyshev
smalyshev(a)wikimedia.org
As Lydia announced, we are going to deploy support for two new data types soon
(think of "data types" as "property types", as opposed to "value types"):
* The "math" type for formulas. This will use TeX syntax and is provided by the
same extension that implements <math> for wikitext. We plan to roll this out on
Feb 9th.
* The "external-id" type for references to external resources. We plan to roll
this out on Feb 16th. NOTE: Many of the existing properties for external
identifiers will be converted from the plain "string" data type to the new
"external-id" data type, see
<https://www.wikidata.org/wiki/User:Addshore/Identifiers>.
Both of these new types will use the "string" value type. Below are two
examples of Snaks that use the new data types, in JSON:
{
  "snaktype": "value",
  "property": "P717",
  "datavalue": {
    "value": "\\sin x^2 + \\cos_b x ^ 2 = e^{2 \\tfrac\\pi{i}}",
    "type": "string"
  },
  "datatype": "math"
}
{
  "snaktype": "value",
  "property": "P708",
  "datavalue": {
    "value": "BADWOLF",
    "type": "string"
  },
  "datatype": "external-id"
}
As you can see, the only thing that is new is the value of the "datatype" field.
Similarly, in RDF, both new data types use plain string literals for now, as you
can see from the turtle snippet below:
wd:Q2209 a wikibase:Item ;
    wdt:P717 "\\sin x^2 + \\cos_b x ^ 2 = e^{2 \\tfrac\\pi{i}}" ;
    wdt:P708 "BADWOLF" .
The datatypes themselves are declared as follows:
wd:P708 a wikibase:Property ;
    wikibase:propertyType wikibase:ExternalId .
wd:P717 a wikibase:Property ;
    wikibase:propertyType wikibase:Math .
Accordingly, the URIs of the datatypes (not the types of the literals!) are:
<http://wikiba.se/ontology-beta#ExternalId>
<http://wikiba.se/ontology-beta#Math>
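For illustration, a client could then enumerate all properties of the new
ExternalId type with a query along these lines (a Python sketch against
the query service; the wb: prefix is just shorthand for the beta ontology
namespace above):

import json
import urllib
import urllib2

# List the properties declared with the new ExternalId datatype.
query = """
PREFIX wb: <http://wikiba.se/ontology-beta#>
SELECT ?prop WHERE {
  ?prop a wb:Property ;
        wb:propertyType wb:ExternalId .
}
"""
url = 'https://query.wikidata.org/sparql?' + urllib.urlencode(
    {'query': query, 'format': 'json'})
for row in json.load(urllib2.urlopen(url))['results']['bindings']:
    print(row['prop']['value'])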
These are, for now, the only changes to the representation of Snaks. We do
however consider some additional changes for the future. To avoid confusion,
I'll put them below a big separator:
ANNOUNCEMENT ABOVE!
--------------------------------------------------------------------------------
ROUGH PLANS BELOW!
Here are some changes concerning the math and external-id data types that we are
considering or planning for the future.
* For the Math datatype, we may want to provide a type URI for the RDF string
literal that indicates that the format is indeed TeX.
Perhaps we could use <http://purl.org/xtypes/Fragment-LaTeX>.
* For the ExternalId data type, we would like to use resource URIs for external
IDs (in "direct claims"), if possible. This would only work if we know the base
URI for the property (provided by a statement on the property definition). For
properties with no base URI set, we would still use plain string literals.
In our example above, the base URI for P708 might be
<https://tardis.net/allonzy/>. The Turtle snippet would read:
wd:Q2209 a wikibase:Item ;
    wdt:P717 "\\sin x^2 + \\cos_b x ^ 2 = e^{2 \\tfrac\\pi{i}}"^^purl:Fragment-LaTeX ;
    wdt:P708 <https://tardis.net/allonzy/BADWOLF> .
However, the full representation of the statement would still use the original
string literal:
wds:Q2209-24942a17-4791-a49d-6469-54e581eade55 a wikibase:Statement,
        wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P708 "BADWOLF" .
We would also like to provide the full URI of the external resource in JSON,
making us a good citizen of the web of linked data. We plan to do this using a
mechanism we call "derived values", which we also plan to use for other kinds of
normalization in the JSON output. The idea is to include additional data values
in the JSON representation of a Snak:
{
  "snaktype": "value",
  "property": "P708",
  "datavalue": {
    "value": "BADWOLF",
    "type": "string"
  },
  "datavalue-uri": {
    "value": "https://tardis.net/allonzy/BADWOLF",
    "type": "string"
  },
  "datatype": "external-id"
}
In some cases, such as ISBNs, we would want a URL as well as a URI:
{
  "snaktype": "value",
  "property": "P708",
  "datavalue": {
    "value": "3827370191",
    "type": "string"
  },
  "datavalue-uri": {
    "value": "urn:isbn:3827370191",
    "type": "string"
  },
  "datavalue-url": {
    "value": "https://www.wikidata.org/wiki/Special:BookSources/3827370191",
    "type": "string"
  },
  "datatype": "external-id"
}
The base URL would be given as a statement on the property, just like the base URI.
We plan to use the same mechanism for giving Quantities in a standard unit,
providing thumbnail URLs for CommonsMedia values, etc.
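To sketch how a consumer might use this (purely illustrative, since the
"datavalue-uri" and "datavalue-url" keys are only proposed above, not
part of the deployed JSON format):

def best_link_for_external_id(snak):
    # Prefer the proposed derived values when present; fall back to the
    # plain string value. The derived keys are proposals, not yet deployed.
    for key in ('datavalue-url', 'datavalue-uri'):
        if key in snak:
            return snak[key]['value']
    return snak['datavalue']['value']

For the ISBN example above, this would return the Special:BookSources URL.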
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Hi all!
In the context of introducing the new "math" and "external-id" data types, the
question came up whether this introduction constitutes a breaking change to the
data model. The answer to this depends on whether you take the "English" or the
"German" approach to interpreting the format: According to
<https://en.wikipedia.org/wiki/Everything_which_is_not_forbidden_is_allowed>, in
England, "everything which is not forbidden is allowed", while, in Germany, the
opposite applies, so "everything which is not allowed is forbidden".
In my mind, the advantage of formats like JSON, XML and RDF is that they provide
good discovery by eyeballing, and that they use a mix-and-match approach. In
this context, I favour the English approach: anything not explicitly forbidden
in the JSON or RDF is allowed.
So I think clients should be written in a forward-compatible way: they should
handle unknown constructs or values gracefully.
In this vein, I would like to propose a few guiding principles for the design of
client libraries that consume Wikibase RDF and particularly JSON output:
* When encountering an unknown structure, such as an unexpected key in a JSON
encoded object, the consumer SHOULD skip that structure. Depending on context
and use case, a warning MAY be issued to alert the user that some part of the
data was not processed.
* When encountering a malformed structure, such as a missing required key
in a JSON encoded object, the consumer MAY skip that structure, but then a
warning MUST be issued to alert the user that some part of the data was
not processed. If the structure is not skipped, the consumer MUST fail
with a fatal error.
* Clients MUST make a clear distinction between data types and value
types: a Snak's data type determines the interpretation of the value,
while the type of the Snak's data value specifies the structure of the
value representation.
* Clients SHOULD be able to process a Snak about a Property of unknown data
type, as long as the value type is known. In such a case, the client SHOULD fall
back to the behaviour defined for the value type. If this is not possible, the
Snak MUST be skipped and a warning SHOULD be issued to alert the user that some
part of the data could not be interpreted.
* When encountering an unknown type of data value (value type), the client MUST
either ignore the respective Snak, or fail with a fatal error. A warning SHOULD
be issued to alert the user that some part of the data could not be processed.
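To make this concrete, here is a minimal sketch of a consumer that
follows these rules (Python; the handler tables and names are purely
illustrative, not a prescribed API, and only "value" Snaks are
considered):

import warnings

# Illustrative handlers, keyed by value type (the structure of the
# value representation in the JSON).
VALUE_TYPE_HANDLERS = {
    'string': lambda value: value,
    # 'wikibase-entityid': ..., 'time': ..., 'globecoordinate': ..., etc.
}

# Illustrative handlers, keyed by data type (how the value is interpreted).
DATA_TYPE_HANDLERS = {
    'string': lambda value: value,
    'external-id': lambda value: value,
}

def process_value_snak(snak):
    try:
        data_type = snak['datatype']
        value_type = snak['datavalue']['type']
        value = snak['datavalue']['value']
    except KeyError as missing:
        # Malformed structure: we choose to skip it, so we MUST warn.
        warnings.warn('skipping malformed snak (missing %s)' % missing)
        return None
    if value_type not in VALUE_TYPE_HANDLERS:
        # Unknown value type: MUST skip (or fail); SHOULD warn.
        warnings.warn('skipping snak with unknown value type %r' % value_type)
        return None
    # Unknown data type: fall back to the behaviour defined for the value type.
    handler = DATA_TYPE_HANDLERS.get(data_type, VALUE_TYPE_HANDLERS[value_type])
    return handler(value)

With this structure, a client written before the "math" rollout would
still yield the TeX string for a math Snak, via the "string" value type
fallback.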
Do you think these guidelines are reasonable? It seems to me that adopting them
should save everyone some trouble.
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.