Hi Stas, Markus, Denny!
For a long time now, we have been wanting to generate proper resource references (URIs) for external identifier values, see https://phabricator.wikimedia.org/T121274.
Implementing this is complicated by the fact that "expanded" identifiers may occur in four different places in the data model (direct, statement, qualifier, reference), and that we can't simply replace the old string value; we need to provide an additional value.
I have attached three files with snippets of three different RDF mappings:
- Q111.ttl - the status quo, with normalized predicates declared but not used.
- Q111.rc.ttl - modeling resource predicates separately from normalized values.
- Q111.norm.ttl - modeling resource predicates as normalized values.
The "rc" variant means more overhead, the "norm" variant may have semantic difficulties. Please look at the two options for the new mapping and let me know which you like best. You can use a plain old diff between the files for a first impression.
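As a rough illustration of the constraint described above (the same identifier can show up in four positions, and the URI has to be emitted in addition to the string rather than instead of it), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the property ID P999, the example.org URI pattern, the prefixes wdt/ps/pq/pr, and the "x" suffix on the extra predicate are placeholders and are not taken from the attached files.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # "..." for a string literal, <...> for a resource


# Hypothetical URI pattern per external-identifier property; "$1" is the value.
URI_PATTERNS = {"P999": "http://example.org/id/$1"}

# Assumed predicate prefixes for the four positions in the data model.
POSITIONS = {
    "direct": "wdt",
    "statement": "ps",
    "qualifier": "pq",
    "reference": "pr",
}


def expand(subject: str, prop: str, value: str, position: str) -> List[Triple]:
    """Emit the plain string triple, plus an additional URI triple if a pattern exists."""
    prefix = POSITIONS[position]
    triples = [Triple(subject, f"{prefix}:{prop}", f'"{value}"')]
    pattern = URI_PATTERNS.get(prop)
    if pattern is not None:
        uri = pattern.replace("$1", value)
        # The "x" suffix is a made-up name for the additional predicate.
        triples.append(Triple(subject, f"{prefix}x:{prop}", f"<{uri}>"))
    return triples


if __name__ == "__main__":
    for position in POSITIONS:
        for triple in expand("wd:Q111", "P999", "abc", position):
            print(triple.subject, triple.predicate, triple.obj)
```

Whichever of the two mappings is chosen, the number of predicates per property roughly doubles in each of the four positions, which is the overhead the question is about.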
By the way, I'm also re-considering my original approach:
Simply replace the plain value with the resolved URI when we can. This would *not* cause the same property to be used with literals and non-literals, since the predicate name is derived from the property ID, and a property either provides a URI mapping, or it doesn't.
Problems would arise during transition, making this a breaking change:
1) when introducing this feature, existing queries that compare a newly URI-ified property to a string literal will fail.
2) when a URI mapping is added, we'd either need to immediately update all statements that use that property, or the triple store would have some old triples where the relevant predicates point to a literal, and some new triples where they point to a resource.
This would avoid duplicating more predicates, and keep the model straightforward. But it would cause a bumpy transition.
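The two transition problems listed above can be made concrete with a small Python sketch over an in-memory list of triples. This is only an illustration under assumptions: the property ID P999, the entity IDs other than Q111, the example.org URIs, and the helper names are all hypothetical, and a real triple store would be queried with SPARQL rather than list comprehensions.

```python
def is_uri(term):
    """Treat <...> as a resource and everything else as a literal."""
    return term.startswith("<") and term.endswith(">")


def subjects_matching(triples, predicate, obj):
    """Stand-in for a query that compares a predicate's value to a fixed term."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def mixed_predicates(triples):
    """Return predicates that point to literals in some triples and URIs in others."""
    kinds = {}
    for _subject, predicate, obj in triples:
        kinds.setdefault(predicate, set()).add("uri" if is_uri(obj) else "literal")
    return sorted(p for p, seen in kinds.items() if len(seen) > 1)


# Before the change: the identifier is a plain string literal.
before = [("wd:Q111", "wdt:P999", '"abc"')]
# After the change: the same statement now points to a resource.
after = [("wd:Q111", "wdt:P999", "<http://example.org/id/abc>")]

# Problem 1: an existing query that compares against the string stops matching.
print(subjects_matching(before, "wdt:P999", '"abc"'))
print(subjects_matching(after, "wdt:P999", '"abc"'))

# Problem 2: a store that is only partially updated mixes both kinds of object.
partially_updated = before + [("wd:Q222", "wdt:P999", "<http://example.org/id/def>")]
print(mixed_predicates(partially_updated))
```

A check along the lines of mixed_predicates could be run during a migration to find predicates that are still in the mixed state.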
Please let me know which approach you prefer. Have a look at the files attached to my original message.
Thanks, Daniel
On 09.11.2016 at 17:46, Daniel Kinzler wrote:
Hi Stas, Markus, Denny!
For a long time now, we have been wanting to generate proper resource references (URIs) for external identifier values, see https://phabricator.wikimedia.org/T121274.
Implementing this is complicated by the fact that "expanded" identifiers may occur in four different places in the data model (direct, statement, qualifier, reference), and that we can't simply replace the old string value; we need to provide an additional value.
I have attached three files with snippets of three different RDF mappings:
- Q111.ttl - the status quo, with normalized predicates declared but not used.
- Q111.rc.ttl - modeling resource predicates separately from normalized values.
- Q111.norm.ttl - modeling resource predicates as normalized values.
The "rc" variant means more overhead, the "norm" variant may have semantic difficulties. Please look at the two options for the new mapping and let me know which you like best. You can use a plain old diff between the files for a first impression.
Daniel,
How would your original approach affect properties like "exact match" and similar? For example, https://www.wikidata.org/wiki/Q516521
-Thad +ThadGuidry https://www.google.com/+ThadGuidry
On 14.11.2016 at 18:51, Thad Guidry wrote:
How would your original approach affect properties like "exact match" and similar? For example, https://www.wikidata.org/wiki/Q516521
This will not affect "exact match", since "exact match" is defined to be a URL, not an "external identifier". No expansion is needed to represent it as a resource in RDF; it is already represented as a resource.
I'm not sure which properties you consider similar to "exact match", but my mail only related to properties that have the type "external identifier".
wikidata-tech@lists.wikimedia.org