Hi Daniel,
if I understand you correctly, you are in favour of equating datavalue
types and property types. This would indeed solve the problems at hand.
The reason why both kinds of types are distinct in SMW and also in
Wikidata is that property types are naturally more extensible than
datavalue types. CommonsMedia is a good example of this: all you need is
a custom UI and you can handle "new" data without changing the
underlying data model. This makes it easy for contributors to add new
types without far-reaching ramifications in the backend (think of
numbers, which could be decimal, natural, positive, range-restricted,
etc. but would still be treated as a "number" in the backend).
Using fewer datavalue types also improves interoperability. For
example, you may want to compare two numbers even if one is a natural
number and the other is a decimal.
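To make the split concrete, here is a minimal sketch. The property type names, the mapping and both helper functions are hypothetical; this is not the actual Wikibase code. Several property types share one numeric datavalue type, and each adds only its own input validation. The backend then compares all of them as plain numbers.

```python
from decimal import Decimal

# Hypothetical property types: each adds its own validation (an input/UI
# concern) but maps onto the same backend datavalue type "number".
PROPERTY_TYPES = {
    "natural-number": ("number", lambda d: d == d.to_integral_value() and d >= 0),
    "positive-number": ("number", lambda d: d > 0),
    "decimal": ("number", lambda d: True),
}

def make_value(property_type: str, text: str) -> tuple[str, Decimal]:
    """Validate input according to the property type; return (datavalue type, value)."""
    dv_type, is_valid = PROPERTY_TYPES[property_type]
    value = Decimal(text)
    if not is_valid(value):
        raise ValueError(f"{text!r} is not a valid {property_type}")
    return dv_type, value

def less_than(a: tuple[str, Decimal], b: tuple[str, Decimal]) -> bool:
    """Backend comparison: only the datavalue type matters, not the property type."""
    if a[0] != b[0]:
        raise TypeError(f"cannot compare {a[0]} with {b[0]}")
    return a[1] < b[1]

n = make_value("natural-number", "3")
x = make_value("decimal", "3.5")
print(less_than(n, x))  # True
```

Adding another numeric property type in this sketch only means adding one entry to the table. Comparison and storage stay untouched.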
There is no simple rule for deciding how many datavalue types there
should be. The general guideline is to decide on datavalue types based
on use cases. I am arguing for distinguishing IRIs from strings, since
there are many contexts and applications where this difference is
crucial.
Conversely, I don't know of any application where it makes sense to treat
the two alike (this would have to be something where we compare
strings and IRIs on a data level, e.g., if you were looking for all
websites with URLs that are alphabetically greater than the postcode of
a city in England :-p).
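RDF export is one context where the difference matters. There, an IRI becomes a resource that others can link to, while a string becomes a literal. The sketch below assumes a hypothetical (datavalue type, value) pair and renders it as an N-Triples object term. Its escaping is simplified, so it is not a complete serializer.

```python
def to_ntriples_term(dv_type: str, value: str) -> str:
    """Render a datavalue as an N-Triples object term (simplified escaping)."""
    if dv_type == "iri":
        # IRIs are written in angle brackets and denote resources.
        return f"<{value}>"
    if dv_type == "string":
        # Strings become literals; escape backslashes, quotes and newlines.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    raise ValueError(f"unsupported datavalue type: {dv_type}")

# The same characters yield two different kinds of RDF term.
print(to_ntriples_term("iri", "http://example.org/"))     # <http://example.org/>
print(to_ntriples_term("string", "http://example.org/"))  # "http://example.org/"
```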
In general, however, it is good to keep the set of basic datavalue
types small, while allowing the set of property types to grow. The set
of base datavalue types that we use is based on the experience in SMW as
well as on existing formats like XSD (which also has many derived types
but only a few base types).
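XSD shows the pattern well. For example, xsd:positiveInteger is derived from xsd:nonNegativeInteger, which is derived from xsd:integer, which in turn is derived from the primitive type xsd:decimal. The sketch below lists only a handful of XSD types and follows such a chain back to its primitive type. A system could use this kind of lookup to treat many derived types as one base type.

```python
# Partial XSD derivation hierarchy: derived type -> type it restricts.
# Primitive types map to None.
XSD_BASE = {
    "xsd:decimal": None,
    "xsd:integer": "xsd:decimal",
    "xsd:nonNegativeInteger": "xsd:integer",
    "xsd:positiveInteger": "xsd:nonNegativeInteger",
    "xsd:string": None,
    "xsd:normalizedString": "xsd:string",
    "xsd:token": "xsd:normalizedString",
}

def primitive_type(xsd_type: str) -> str:
    """Follow the derivation chain up to the primitive type."""
    while XSD_BASE[xsd_type] is not None:
        xsd_type = XSD_BASE[xsd_type]
    return xsd_type

print(primitive_type("xsd:positiveInteger"))  # xsd:decimal
print(primitive_type("xsd:token"))            # xsd:string
```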
As for the possible confusion, I think some naming discipline would
clarify this. In SMW, there is a clearer distinction between the two kinds
of types, and a fixed schema for property type ids that makes it easy to
recognise them.
In any case, using strings for IRIs does not seem to solve any problem.
It does not simplify the type system in general and it does not help
with the use cases that I mentioned. What I do not agree with is your
argument that all of this is "internal". We would not be having this
discussion if it were. The data model of Wikidata is the primary
conceptual model that specifies what Wikidata stores. You might still be
right that some of the implementation is internal, but the arguments we
are exchanging are not really on the implementation level ;-).
Best wishes
Markus, offline soon for travelling
On 26/08/13 10:35, Daniel Kinzler wrote:
On 25.08.2013 19:19, Markus Krötzsch wrote:
If we have an IRI DV, considering that URLs are special IRIs, it seems
clear that IRI would be the best way of storing them.
The best way of storing them really depends on the storage platform. It
may be a string or something else.
I think the real issue here is that we are exposing something that is
really an internal detail (the data value type) instead of the high
level information we actually should be exposing, namely property type.
I think splitting the two was a mistake, and I think exposing the DV
type while making the property type all but inaccessible makes things a
lot worse.
In my opinion, data should be self-descriptive, so the *semantic* type
of the property should be included along with the value. People expect
this, and assume that this is what the DV type is. But it's not, and
should not be used or abused for this purpose.
Ideally, it should not matter at all to any third party whether we use a
string or an IRI DV internally. The (semantic) property type would be URL,
and that's all that matters.
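The sketch below shows what such self-descriptive output could look like. The JSON shape is hypothetical (not the actual Wikidata serialization), and the type ids and the render helper are illustrative. Each value carries its semantic property type next to the datavalue. A consumer dispatches on the property type, so it never needs to know whether the value is stored as a string or an IRI DV.

```python
import json

def render(record: dict) -> str:
    """Format a value by its semantic property type, ignoring the DV type."""
    value = record["datavalue"]["value"]
    if record["property-type"] == "url":
        return f"link: {value}"
    if record["property-type"] == "commonsMedia":
        return f"file: {value}"
    return value

# The same URL, once stored as a string DV and once as an IRI DV.
as_string = json.loads('{"property-type": "url", "datavalue": {"type": "string", "value": "http://example.org/"}}')
as_iri = json.loads('{"property-type": "url", "datavalue": {"type": "iri", "value": "http://example.org/"}}')

print(render(as_string))  # link: http://example.org/
print(render(as_iri))     # link: http://example.org/
```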
I'm quite unhappy about the current situation; we are beginning to see
the backlash of the decision not to include the property type inline. If
we don't do anything about this now, I fear the confusion is going to
get worse.
-- daniel