Thank you for the explanation. The content negotiation for an Item IRI is
clear. Any request for
http://www.wikidata.org/entity/Q... requires an
Accept: application/rdf+xml header in order to get the RDF. The default
response is JSON, and Accept: text/html returns a 200 response delivering
the UI page.
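To make the negotiation concrete, here is a minimal Python sketch that only builds such a request without sending it. The helper name build_entity_request and the constant ENTITY_BASE are mine, purely for illustration; the media type is the one mentioned above.

```python
from urllib.request import Request

# Base of the Item IRIs discussed in this thread.
ENTITY_BASE = "http://www.wikidata.org/entity/"


def build_entity_request(item_id, media_type="application/rdf+xml"):
    """Build (but do not send) a GET request for an Item IRI.

    The Accept header tells the server which representation is wanted.
    """
    return Request(ENTITY_BASE + item_id, headers={"Accept": media_type})


req = build_entity_request("Q16521")
print(req.full_url, req.get_header("Accept"))
# Passing req to urllib.request.urlopen() would actually perform the request.
```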
For statement resolution in the Item RDF, isn't the statement a fragment
of the Item document? In the Item context, the IRI for a statement
resource would then be
http://www.wikidata.org/entity/Q16521#Statement_UUID. Otherwise, the
statement IRI
http://www.wikidata.org/entity/statement/Statement_UUID could
just return the statement as a separate entity.
On the topic of references, a use case is to measure data quality by
counting the number of "unreferenced statements". At
https://phabricator.wikimedia.org/T117234#1834728, I propose the
possibility of using blank reference nodes to identify these "bad"
statements. Having an object to count greatly expedites the query process
because of the estimated cardinality feature of Blazegraph. The only
alternative to this is to count distinct statements with the
prov:wasDerivedFrom predicate, and this is extremely slow (in fact, it may
not be possible without a huge amount of memory).
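For reference, the slow alternative would look roughly like the query built below. This is only a sketch: it counts the statements that do have a reference, so the "unreferenced" figure would still have to be derived by subtracting this from a total statement count obtained separately. The function name is hypothetical; the prov: prefix is the standard W3C PROV namespace.

```python
# Standard W3C PROV namespace; prov:wasDerivedFrom links a statement to a reference node.
PROV_PREFIX = "PREFIX prov: <http://www.w3.org/ns/prov#>"


def count_referenced_statements_query():
    """Return a SPARQL query counting distinct statements with at least one reference."""
    return "\n".join([
        PROV_PREFIX,
        "SELECT (COUNT(DISTINCT ?statement) AS ?referenced) WHERE {",
        "  ?statement prov:wasDerivedFrom ?reference .",
        "}",
    ])


query = count_referenced_statements_query()
print(query)
```

The DISTINCT is what makes this expensive: a statement with several references matches several times, so the engine cannot simply use a cardinality estimate for the predicate.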
I do not know what would be involved in implementing blank reference nodes
and what performance consequences may also occur. It seems to me that the
pairing of statements and references is a core feature of the data model,
and it is odd that there can exist statements that have no associated
reference node in the RDF.
Cheers,
Christopher
On 27 November 2015 at 13:00, <wikidata-tech-request(a)lists.wikimedia.org>
wrote:
Today's Topics:
1. RDF Item, Statement and Reference IRI Resolution?
(Christopher Johnson)
2. Re: RDF Item, Statement and Reference IRI Resolution?
(Markus Krötzsch)
----------------------------------------------------------------------
Message: 1
Date: Fri, 27 Nov 2015 07:21:10 +0100
From: Christopher Johnson <christopher.johnson(a)wikimedia.de>
To: wikidata-tech(a)lists.wikimedia.org, wikimedia-de-tech
<wikimedia-de-tech(a)wikimedia.de>
Subject: [Wikidata-tech] RDF Item, Statement and Reference IRI
Resolution?
Message-ID:
<CACzuuKvGK1dM1+dn4ypocjhO=
psuk4LLtWngZp1yFVP6wmVqFA(a)mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
After looking at the RDF format closely, I am asking if the item, statement
and reference IRIs could/should be directly resolvable to XML/JSON
formatted resources.
It seems that http://www.wikidata.org/entity/.... currently redirects to
the UI at https://www.wikidata.org/wiki/, which is not what a machine
reader would expect.
Without a simple method to resolve the IRIs (perhaps a RESTful API?), these
RDF data objects are opaque for parsers.
Of course, with wbgetclaims, it is possible to get the statement like this:
https://www.wikidata.org/w/api.php?action=wbgetclaims&format=xml&cl…
but the GUID format that the API expects does not match the RDF UUID
representation (there is a $ or "%24" after the item ID instead of a -),
and the call returns both the statement and its references.
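The mismatch can be bridged mechanically. The sketch below assumes the only difference is the separator after the item ID, as described above; the function name is made up for illustration.

```python
from urllib.parse import quote


def rdf_statement_id_to_api_guid(rdf_id):
    """Convert an RDF statement ID (Qnnn-UUID) to the API GUID form (Qnnn$UUID).

    Only the first '-' (the one right after the item ID) is replaced.
    """
    item_id, sep, rest = rdf_id.partition("-")
    if not sep or not rest:
        raise ValueError("expected '<item>-<uuid>', got %r" % rdf_id)
    return item_id + "$" + rest


api_guid = rdf_statement_id_to_api_guid(
    "Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1"
)
# Percent-encode for use in a URL query string: '$' becomes '%24'.
encoded_guid = quote(api_guid, safe="")
print(api_guid)
print(encoded_guid)
```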
Since the reference is its own node in the RDF, it can be queried
independently, for example to ask "return all of the statements where
reference R is bound." But the return value is then a list of statement
IDs, and a subquery or separate query is required to return the
associated statement nodes.
I am also wondering why item, statement and reference "UUIDs" are not in
canonical format in the RDF. This is a question of compliance with IETF
guidelines, which may or may not be relevant.
Item: Q20913766
Statement: Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1
Reference: 39f3ce979f9d84a0ebf09abe1702bf22326695e9
See:
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
See:
http://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml
and
http://tools.ietf.org/html/rfc4122 for information on urn:uuid
guidelines.
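As a quick check of what "canonical" would mean for the three IDs above, here is a small sketch using Python's standard uuid module. The helper canonical_urn is hypothetical; it simply reports the urn:uuid form when the input parses as a UUID and None otherwise.

```python
import uuid


def canonical_urn(candidate):
    """Return the urn:uuid form if candidate parses as a UUID, else None."""
    try:
        return uuid.UUID(candidate).urn
    except ValueError:
        return None


statement_id = "Q20913766-CD281698-E1D0-43A1-BEEA-E2A60E5A88F1"
# Drop the item prefix; what remains is the UUID part of the statement ID.
statement_suffix = statement_id.partition("-")[2]

item_urn = canonical_urn("Q20913766")
statement_urn = canonical_urn(statement_suffix)
reference_urn = canonical_urn("39f3ce979f9d84a0ebf09abe1702bf22326695e9")

print(item_urn, statement_urn, reference_urn)
```

Only the statement suffix parses as a UUID. The item ID is not one at all, and the reference ID is 40 hex digits, which is too long for the 128-bit layout described in RFC 4122.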
Thanks for your feedback,
Christopher