Re: [Wikidata-l] claims Datatypes inconsistency suspicion

26 Aug 2013

Hi Daniel,

if I understand you correctly, you are in favour of equating datavalue 
types and property types. This would indeed solve the problems at hand.

The reason why both kinds of types are distinct in SMW and also in 
Wikidata is that property types are naturally more extensible than 
datavalue types. CommonsMedia is a good example of this: all you need is 
a custom UI and you can handle "new" data without changing the 
underlying data model. This makes it easy for contributors to add new 
types without far-reaching ramifications in the backend (think of 
numbers, which could be decimal, natural, positive, range-restricted, 
etc. but would still be treated as a "number" in the backend).

Using fewer datavalue types also improves interoperability. E.g., you 
want to compare two numbers, even if one is a natural number and another 
one is a decimal.
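
A minimal sketch of this split, in Python. Everything here is illustrative: the registry, the property type ids ("natural-number", "decimal-number") and the `Value` class are hypothetical names, not the actual Wikidata or SMW data model. The point is only that several property types can share one datavalue type, and that values are comparable whenever their datavalue types agree.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical registry: many property types map onto few datavalue types.
# Adding a new property type means adding one entry here (plus a UI),
# without touching the set of datavalue types.
PROPERTY_TYPE_TO_DATAVALUE_TYPE = {
    "natural-number": "number",
    "decimal-number": "number",
    "commonsMedia": "string",
    "string": "string",
}


@dataclass(frozen=True)
class Value:
    property_type: str  # extensible, user-facing type
    value: object       # payload stored in the backend

    @property
    def datavalue_type(self) -> str:
        # The backend only ever sees one of the few base types.
        return PROPERTY_TYPE_TO_DATAVALUE_TYPE[self.property_type]


def less_than(a: Value, b: Value) -> bool:
    # Comparison is decided on the datavalue level, so a natural number
    # and a decimal can be compared although their property types differ.
    if a.datavalue_type != b.datavalue_type:
        raise TypeError(
            f"cannot compare {a.datavalue_type} with {b.datavalue_type}"
        )
    return a.value < b.value


population = Value("natural-number", 3)
ratio = Value("decimal-number", Decimal("3.5"))
image = Value("commonsMedia", "Example.jpg")

print(less_than(population, ratio))  # both are "number" datavalues
print(image.datavalue_type)          # handled as a plain string in the backend
```

In this sketch, comparing `population` with `image` raises a `TypeError`, because their datavalue types differ.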

There is no simple rule for deciding how many datavalue types there 
should be. The general guideline is to decide on datavalue types based 
on use cases. I am arguing for distinguishing IRIs from strings since there 
are many contexts and applications where this is a crucial difference. 
Conversely, I don't know of any application where it makes sense to treat 
the two alike (this would have to be something where we compare 
strings and IRIs on a data level, e.g., if you were looking for all 
websites with URLs that are alphabetically greater than the postcode of 
a city in England :-p).
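
To illustrate what the distinction buys on the data level, here is a small Python sketch. The class names `StringValue` and `IriValue` and the Turtle-like rendering are assumptions made for this example only; they are not taken from the Wikidata code base, and the export use case is just one possible context where the difference matters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringValue:
    text: str


@dataclass(frozen=True)
class IriValue:
    iri: str


def render(value) -> str:
    # Turtle-like rendering (simplified): IRIs become references,
    # strings become quoted literals with minimal escaping.
    if isinstance(value, IriValue):
        return "<" + value.iri + ">"
    if isinstance(value, StringValue):
        escaped = value.text.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    raise TypeError(f"unsupported datavalue: {value!r}")


homepage = IriValue("http://example.org/")
label = StringValue("http://example.org/")

# Same characters, different datavalue types: the two are never equal.
print(homepage == label)
print(render(homepage))
print(render(label))
```

With a single string type for both, a consumer would have to guess from the property which of the two renderings is intended.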

In general, however, it will be good to keep the set of basic datavalue 
types small, while allowing the set of property types to grow. The set 
of base datavalue types that we use is based on the experience in SMW as 
well as on existing formats like XSD (which also has many derived types 
but only a few base types).

As for the possible confusion, I think some naming discipline would 
clarify this. In SMW, there is a stronger difference between both kinds 
of types, and a fixed schema for property type ids that makes it easy to 
recognise them.

In any case, using string for IRIs does not seem to solve any problem. 
It does not simplify the type system in general and it does not help 
with the use cases that I mentioned. What I do not agree with are your 
arguments about all of this being "internal". We would not have this 
discussion if it were. The data model of Wikidata is the primary 
conceptual model that specifies what Wikidata stores. You might still be 
right that some of the implementation is internal, but the arguments we 
both exchange are not really on the implementation level ;-).

Best wishes

Markus, offline soon for travelling

On 26/08/13 10:35, Daniel Kinzler wrote:
...
  On 25.08.2013 19:19, Markus Krötzsch wrote:
  If we have an IRI DV, considering that URLs are special IRIs, it seems
  clear that IRI would be the best way of storing them.

 The best way of storing them really depends on the storage platform. It
 may be a string or something else.

 I think the real issue here is that we are exposing something that is
 really an internal detail (the data value type) instead of the high
 level information we actually should be exposing, namely property type.

 I think splitting the two was a mistake, and I think exposing the DV
 type while making the property type all but inaccessible makes things a
 lot worse.

 In my opinion, data should be self-descriptive, so the *semantic* type
 of the property should be included along with the value. People expect
 this, and assume that this is what the DV type is. But it's not, and
 should not be used or abused for this purpose.

 Ideally, it should not matter at all to any 3rd party if we use a
 string or IRI DV internally. The (semantic) property type would be URL,
 and that's all that matters.

 I'm quite unhappy about the current situation; we are beginning to see
 the backlash of the decision not to include the property type inline. If
 we don't do anything about this now, I fear the confusion is going to
 get worse.

 -- daniel
