Send Wikidata-tech mailing list submissions to
wikidata-tech@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
or, via email, send a message with subject or body 'help' to
wikidata-tech-request@lists.wikimedia.org
You can reach the person managing the list at
wikidata-tech-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Wikidata-tech digest..."
Today's Topics:
1. RDF Item, Statement and Reference IRI Resolution?
(Christopher Johnson)
2. Re: RDF Item, Statement and Reference IRI Resolution?
(Markus Krötzsch)
----------------------------------------------------------------------
Message: 1
Date: Fri, 27 Nov 2015 07:21:10 +0100
From: Christopher Johnson <christopher.johnson@wikimedia.de>
To: wikidata-tech@lists.wikimedia.org, wikimedia-de-tech
<wikimedia-de-tech@wikimedia.de>
Subject: [Wikidata-tech] RDF Item, Statement and Reference IRI
Resolution?
Message-ID:
<CACzuuKvGK1dM1+dn4ypocjhO=psuk4LLtWngZp1yFVP6wmVqFA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
After looking at the RDF format closely, I am asking if the item, statement
and reference IRIs could/should be directly resolvable to XML/JSON
formatted resources.
It seems that currently http://www.wikidata.org/entity/.... redirects to
the UI at https://www.wikidata.org/wiki/ which is not what a machine reader
would expect.
Without a simple method to resolve the IRIs (perhaps a RESTful API?), these
RDF data objects are opaque for parsers.
Of course, with wbgetclaims, it is possible to get the statement like this:
https://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&claim=Q20913766%24CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
but the GUID format expected by the API does not match the RDF UUID
representation (there is a $ or "%24" after the item ID instead of a -), and
the call returns both the statement and its references.
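The mismatch between the two identifier forms is mechanical, so a small helper can bridge it. The following is only a minimal sketch: the function names are hypothetical (they are not part of the Wikibase API), and it assumes that the only difference between the RDF form and the API form is the first separator after the item ID.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def rdf_statement_id_to_guid(rdf_id: str) -> str:
    """Turn 'Q123-<uuid>' (RDF form) into 'Q123$<uuid>' (API form).

    Assumption: only the first '-' differs between the two forms.
    """
    entity_id, sep, uuid_part = rdf_id.partition("-")
    if not sep or not uuid_part:
        raise ValueError(f"not an RDF statement ID: {rdf_id!r}")
    return f"{entity_id}${uuid_part}"


def wbgetclaims_url(rdf_id: str, fmt: str = "json") -> str:
    """Build a wbgetclaims URL for a statement given in its RDF form."""
    params = {
        "action": "wbgetclaims",
        "format": fmt,
        "claim": rdf_statement_id_to_guid(rdf_id),
    }
    # urlencode percent-encodes the '$' as '%24'.
    return f"{API_ENDPOINT}?{urlencode(params)}"
```

Calling `wbgetclaims_url("Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1", fmt="xml")` reproduces the URL quoted above.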
Since the reference is its own node in the RDF, it can be queried
independently. For example, to ask "return all of the statements where
reference R is bound." But then, the return value is a list of statement
IDs and a subquery or separate query is then required to return the
associated statement node.
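To make the query described above concrete, here is a rough sketch that only builds the SPARQL text. The reference namespace `http://www.wikidata.org/reference/` and the use of `prov:wasDerivedFrom` to link a statement to its reference are assumptions taken from my reading of the RDF dump format page; the function name and the `with_triples` flag are hypothetical.

```python
import re

# Assumed vocabulary, see the RDF dump format documentation.
PROV_WAS_DERIVED_FROM = "http://www.w3.org/ns/prov#wasDerivedFrom"
REFERENCE_PREFIX = "http://www.wikidata.org/reference/"

# Reference IDs in the RDF look like 40 lowercase hex characters.
_HASH_RE = re.compile(r"[0-9a-f]{40}")


def statements_for_reference_query(ref_hash: str, with_triples: bool = False) -> str:
    """Build a SPARQL query for all statements that cite the given reference.

    With with_triples=True the query also asks for each statement's own
    triples, which may avoid the second query.
    """
    if not _HASH_RE.fullmatch(ref_hash):
        raise ValueError(f"not a reference hash: {ref_hash!r}")
    select = "?statement ?p ?o" if with_triples else "?statement"
    lines = [
        f"SELECT {select} WHERE {{",
        f"  ?statement <{PROV_WAS_DERIVED_FROM}> <{REFERENCE_PREFIX}{ref_hash}> .",
    ]
    if with_triples:
        lines.append("  ?statement ?p ?o .")
    lines.append("}")
    return "\n".join(lines)
```

With the default arguments this returns only the statement IRIs, which is the two-step situation described above; `with_triples=True` is one possible way to fetch the statement nodes in the same round trip.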
I am also wondering why item, statement and reference "UUIDs" are not in
canonical format in the RDF. This is a question of compliance with IETF
guidelines, which may or may not be relevant.
Item: Q20913766
Statement: Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
Reference: 39f3ce979f9d84a0ebf09abe1702bf22326695e9
See: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
See: http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml
and http://tools.ietf.org/html/rfc4122 for information on urn:uuid
guidelines.
Thanks for your feedback,
Christopher
------------------------------
Message: 2
Date: Fri, 27 Nov 2015 10:21:22 +0100
From: Markus Krötzsch <markus@semantic-mediawiki.org>
To: Wikidata technical discussion <wikidata-tech@lists.wikimedia.org>,
wikimedia-de-tech <wikimedia-de-tech@wikimedia.de>
Subject: Re: [Wikidata-tech] RDF Item, Statement and Reference IRI
Resolution?
Message-ID: <56582092.3050708@semantic-mediawiki.org>
Content-Type: text/plain; charset=utf-8; format=flowed
On 27.11.2015 07:21, Christopher Johnson wrote:
> Hi,
>
> After looking at the RDF format closely, I am asking if the item,
> statement and reference IRIs could/should be directly resolvable to
> XML/JSON formatted resources.
>
> It seems that currently http://www.wikidata.org/entity/.... redirects to
> the UI at https://www.wikidata.org/wiki/ which is not what a machine
> reader would expect.
This interface actually supports content negotiation. If you open it in
a browser, it redirects to HTML, but an RDF client can request RDF and
will get this. There is no RDF/JSON export AFAIK (maybe it was a typo
above?).
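For illustration, content negotiation from a client might look roughly like the sketch below. It uses only the Python standard library; the helper names are hypothetical, and `text/turtle` is an assumption about which RDF media type the server is willing to return.

```python
from urllib.request import Request, urlopen

ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def build_rdf_request(entity_id: str, media_type: str = "text/turtle") -> Request:
    """Prepare a request for an entity IRI that asks for RDF instead of HTML."""
    return Request(ENTITY_PREFIX + entity_id, headers={"Accept": media_type})


def fetch_rdf(entity_id: str, media_type: str = "text/turtle",
              timeout: float = 30.0) -> str:
    """Send the request and return the response body as text.

    urlopen follows redirects by default, so the redirect issued by the
    entity IRI is handled transparently.
    """
    request = build_rdf_request(entity_id, media_type)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A call such as `fetch_rdf("Q20913766")` would then return the RDF for the item from the original mail, provided the server honours the `Accept` header as described.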
It may also be that auxiliary nodes (such as statements and references)
do not resolve, but resolving the items will always return enough RDF
context to get all data. Resolving statements would be easy by mapping
them to the item data (returning more data is always ok in RDF). This is
possible since the statement IDs are prefixed by the item id. For
references, it might be harder to implement this, since you cannot
reverse the hash to find the item. This might remain open for a while,
since it is more implementation effort.
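The mapping from a statement ID to its item can be sketched as follows. This is an illustration rather than how Wikibase does it: the function name is hypothetical, and the pattern assumes that IDs start with Q or P followed by digits (the P case for properties is my assumption).

```python
import re

ENTITY_PREFIX = "http://www.wikidata.org/entity/"

# '<entity id>-<uuid-like suffix>', e.g. 'Q20913766-CD281698-...'.
_STATEMENT_ID_RE = re.compile(r"(?P<entity>[QP][1-9][0-9]*)-(?P<suffix>[0-9A-Fa-f-]+)")


def item_iri_for_statement(statement_id: str) -> str:
    """Return the IRI of the entity whose data contains this statement."""
    match = _STATEMENT_ID_RE.fullmatch(statement_id)
    if match is None:
        # Reference hashes end up here: they carry no entity prefix,
        # so the owning item cannot be derived from the ID alone.
        raise ValueError(f"no entity prefix in {statement_id!r}")
    return ENTITY_PREFIX + match.group("entity")
```

Passing a reference hash raises `ValueError`, which mirrors the point above that references cannot be mapped back to an item this way.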
> Without a simple method to resolve the IRIs (perhaps a RESTful API?),
> these RDF data objects are opaque for parsers.
>
> Of course, with wbgetclaims, it is possible to get the statement like this:
> https://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&claim=Q20913766%24CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
>
> but the GUID format expected by the API does not match the RDF UUID
> representation (there is a $ or "%24" after the item ID instead of a -), and
> the call returns both the statement and its references.
Yes, using the MediaWiki API will not be a suitable alternative to
getting linked RDF. Let's not go into this.
>
> Since the reference is its own node in the RDF, it can be queried
> independently. For example, to ask "return all of the statements where
> reference R is bound." But then, the return value is a list of
> statement IDs and a subquery or separate query is then required to
> return the associated statement node.
Yes, resolving statement ids has some utility. I hope it works already.
Otherwise it can be made to work without too much effort.
As a temporary workaround for all of this, note that the SPARQL endpoint
can be (ab)used as a linked data source to fetch data for any IRI
present in the data.
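A sketch of that workaround, building a DESCRIBE request URL for an arbitrary IRI. The endpoint address `https://query.wikidata.org/sparql` is an assumption (it is not named in this thread), and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed address of the public SPARQL endpoint.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Characters that may not appear inside a SPARQL IRI reference.
_FORBIDDEN = set('<>"{}|^`\\ ')


def describe_url(iri: str) -> str:
    """Build a GET URL that asks the endpoint to DESCRIBE the given IRI."""
    if not iri or any(ch in _FORBIDDEN for ch in iri):
        raise ValueError(f"not usable as a SPARQL IRI: {iri!r}")
    query = f"DESCRIBE <{iri}>"
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"
```

This works the same way for item, statement and reference IRIs, since the endpoint only needs the IRI to occur somewhere in the data.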
>
> I am also wondering why item, statement and reference "UUIDs" are not in
> canonical format in the RDF. This is a question of compliance with IETF
> guidelines, which may or may not be relevant.
>
> Item: Q20913766
> Statement: Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
> Reference: 39f3ce979f9d84a0ebf09abe1702bf22326695e9
>
> See: https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
> See: http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml
> and http://tools.ietf.org/html/rfc4122 for information on urn:uuid
> guidelines.
The IDs used in RDF are simply the IDs used in the database. The RDF
export has no notion of UUIDs; they were an inspiration (but apparently
not an exact model) for the way in which the database generates its IDs.
If Wikibase internally switches to canonical UUIDs, this will show up
directly in the RDF.
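To illustrate how the three example IDs relate to RFC 4122, here is a small check using Python's standard `uuid` module. The helper name is hypothetical. Note that `uuid.UUID` parses leniently (it accepts upper case and missing hyphens), so the sketch shows which strings can be read as a UUID and what their canonical form would be.

```python
import uuid
from typing import Optional


def to_canonical_uuid(text: str) -> Optional[str]:
    """Return the canonical lowercase form if text parses as a UUID, else None."""
    try:
        return str(uuid.UUID(text))
    except ValueError:
        return None


item_id = "Q20913766"
statement_id = "Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1"
reference_id = "39f3ce979f9d84a0ebf09abe1702bf22326695e9"

# Only the part of the statement ID after the item prefix is UUID-shaped.
_, _, statement_suffix = statement_id.partition("-")

results = {
    "item": to_canonical_uuid(item_id),
    "statement": to_canonical_uuid(statement_id),
    "statement suffix": to_canonical_uuid(statement_suffix),
    "reference": to_canonical_uuid(reference_id),
}
```

Here only the statement suffix parses, giving `cd281698-e1d0-43a1-beea-e2a60e5a88f1`. The reference ID has 40 hexadecimal characters, more than the 32 hex digits of a UUID, and the item ID is not UUID-shaped at all.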
Best regards,
Markus
>
> Thanks for your feedback,
> Christopher
>
------------------------------
End of Wikidata-tech Digest, Vol 31, Issue 5
********************************************