On 01/09/2015 09:26, Markus Krötzsch wrote:
On 01.09.2015 05:17, Stas Malyshev wrote:
Hi!
I would have thought that the correct approach would be to encode
these values as gYear, and just record the four-digit year.
While we do have a ticket for that
(https://phabricator.wikimedia.org/T92009), it's not that simple,
since many triple stores consider dateTime and gYear to be completely
different types, and as such some queries between them would not
work.
I agree. Our original RDF exports in Wikidata Toolkit are still
using gYear, but I am not sure that this is a practical approach.
In particular, this does not solve the encoding of time precisions
in RDF. It only introduces some special cases for year (and also
for month and day), but it cannot be used to encode decades,
centuries, etc.
My current view is that it would be better to encode the actual
time point with maximal precision, and to keep the Wikidata
precision information independently. This applies to the full
encoding of time values (where you have a way to give the
precision as a separate value).
For the simple encoding, where the task is to encode a Wikidata
time in a single RDF literal, things like gYear would make sense.
At least full precision times (with time of day!) would be rather
misleading there.
In any case, when using full precision times for cases with
limited precision, it would be good to create a time point for RDF
based on a uniform rule. The easiest option, which requires no
calendar support, is to use the earliest second that falls within the given
interval. So "20th century" would always lead to the time point
"1900-01-01T00:00:00". If this is not done, it will be very hard
to query for all uses of "20th century" in the data.
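
The "earliest second within the interval" rule could be sketched
roughly as follows. This is only an illustration: the function name
and the precision labels are made up for this example and are not
taken from Wikidata or Wikidata Toolkit. It assumes positive (CE)
years and follows the convention of the example above, where the
"20th century" starts at 1900.

```python
def earliest_time_point(year, month=1, day=1, precision="day"):
    """Return the earliest second of the interval implied by the precision.

    Hypothetical helper for illustration only. Assumes positive (CE) years;
    BCE dates and the first century are not handled.
    """
    if precision == "century":
        # Truncate to the start of the hundred-year block, e.g. 1950 -> 1900.
        year, month, day = (year // 100) * 100, 1, 1
    elif precision == "decade":
        # Truncate to the start of the ten-year block, e.g. 1987 -> 1980.
        year, month, day = (year // 10) * 10, 1, 1
    elif precision == "year":
        month, day = 1, 1
    elif precision == "month":
        day = 1
    elif precision != "day":
        raise ValueError(f"unsupported precision: {precision}")
    # No calendar arithmetic is needed: every interval starts on day 1
    # (or the given day) at midnight.
    return f"{year:04d}-{month:02d}-{day:02d}T00:00:00"


print(earliest_time_point(1950, 6, 15, "century"))  # 1900-01-01T00:00:00
print(earliest_time_point(1987, 3, 9, "decade"))    # 1980-01-01T00:00:00
```

Because every value with the same precision and interval maps to the
same time point, a query for "20th century" only has to match one
literal plus the separately stored precision.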
This is an issue which the cultural heritage community has been
dealing with for decades ( :-) ).
In short, a single date is never going to do an adequate job of
representing (a) a period over which an event happened and (b)
uncertainty over the start and/or end point in this period. These
periods will almost never neatly fit into years, decades, centuries,
etc.: these are just a convenience for grouping approximations
together. Representing e.g. '3.1783 - 12.1820' as either decades or
centuries is going to give a very misleading version of what you
actually know about the period (and you still can't reduce it to a
single 'date thing').
I think that you need at least two dates to represent historical
event dating with any sort of honesty and flexibility. What those
dates should be is a matter for discussion: the CIDOC CRM for
example has the concept of "ongoing throughout" and "at some time
within", which are respectively the minimal and maximal periods
associated with an event. Common museum practice in the U.K. is to
record 'start date' and 'end date', each with a possible
qualification as regards its precision.
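
To make the two-date idea concrete, here is a minimal sketch. The
class and helper names are invented for illustration, and it assumes
that '3.1783 - 12.1820' means March 1783 to December 1820. It is not
an implementation of the CIDOC CRM, just a start/end pair that can be
compared against other periods.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """A period recorded as an explicit start date and end date."""
    start: date
    end: date

    def overlaps(self, other: "DateRange") -> bool:
        # Two closed intervals overlap if each starts before the other ends.
        return self.start <= other.end and other.start <= self.end


def month_range(start_year, start_month, end_year, end_month):
    """Build a range from the first day of one month to the last day of another."""
    last_day = calendar.monthrange(end_year, end_month)[1]
    return DateRange(date(start_year, start_month, 1),
                     date(end_year, end_month, last_day))


# Assumed reading of '3.1783 - 12.1820': March 1783 to December 1820.
period = month_range(1783, 3, 1820, 12)

# Century blocks, using the 1700-1799 / 1800-1899 convention for simplicity.
century_1700s = DateRange(date(1700, 1, 1), date(1799, 12, 31))
century_1800s = DateRange(date(1800, 1, 1), date(1899, 12, 31))

print(period.overlaps(century_1700s))  # True
print(period.overlaps(century_1800s))  # True
```

The period overlaps both century blocks, so recording it as either
century alone would misstate what is known, while the start/end pair
keeps the full extent.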
Richard
Markus
--
Richard Light