Hi Daniel,
If I understand you correctly, you are in favour of equating datavalue types and property types. This would indeed solve the problems at hand.
The reason why both kinds of types are distinct in SMW and also in Wikidata is that property types are naturally more extensible than datavalue types. CommonsMedia is a good example of this: all you need is a custom UI and you can handle "new" data without changing the underlying data model. This makes it easy for contributors to add new types without far-reaching ramifications in the backend (think of numbers, which could be decimal, natural, positive, range-restricted, etc. but would still be treated as a "number" in the backend).
Using fewer datavalue types also improves interoperability. For example, you want to be able to compare two numbers even if one is a natural number and the other is a decimal.
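To make the layering concrete, here is a minimal sketch of several property types sharing one backend datavalue type. All names (NumberValue, PROPERTY_TYPES, make_value) are hypothetical and chosen for illustration only; this is not the SMW or Wikibase API.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class NumberValue:
    """The single backend datavalue type shared by all numeric property types."""
    amount: Decimal


# Hypothetical registry: each property type is a (datavalue type, validator)
# pair. Adding a new property type only adds an entry here; the datavalue
# type itself stays unchanged.
PROPERTY_TYPES = {
    "decimal": (NumberValue, lambda v: True),
    "natural": (
        NumberValue,
        lambda v: v.amount >= 0 and v.amount == v.amount.to_integral_value(),
    ),
    "positive": (NumberValue, lambda v: v.amount > 0),
}


def make_value(property_type: str, raw: str) -> NumberValue:
    """Parse raw input under the rules of a property type."""
    dv_type, is_valid = PROPERTY_TYPES[property_type]
    value = dv_type(Decimal(raw))
    if not is_valid(value):
        raise ValueError(f"{raw!r} is not a valid {property_type}")
    return value


# Values entered under different property types remain comparable,
# because both are stored as the same datavalue type.
population = make_value("natural", "8000000")
density = make_value("decimal", "5598.5")
print(population.amount > density.amount)
```

The restrictions live entirely in the property type layer, so "natural" and "decimal" values can be compared without any conversion step.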
There is no simple rule for deciding how many datavalue types there should be. The general guideline is to decide on datavalue types based on use cases. I am arguing for distinguishing IRIs from strings since there are many contexts and applications where this is a crucial difference. Conversely, I don't know of any application where it makes sense to treat the two alike (this would have to be something where we compare strings and IRIs on a data level, e.g., if you were looking for all websites with URLs that are alphabetically greater than the postcode of a city in England :-p).
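As one illustration of where the difference matters, consider an RDF-style export, where an IRI becomes a resource reference and a string becomes a literal. This is only a sketch under my own assumptions: the class names and to_rdf_term are hypothetical, the scheme check is a crude stand-in for real IRI validation, and the literal escaping is deliberately incomplete.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class StringValue:
    text: str


@dataclass(frozen=True)
class IriValue:
    iri: str

    def __post_init__(self):
        # Crude check (assumption): require a scheme such as "http".
        if not urlsplit(self.iri).scheme:
            raise ValueError(f"not an absolute IRI: {self.iri!r}")


def to_rdf_term(value) -> str:
    """Serialise a value as an N-Triples-like term (simplified escaping)."""
    if isinstance(value, IriValue):
        return f"<{value.iri}>"
    if isinstance(value, StringValue):
        escaped = value.text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported value type: {type(value).__name__}")


website = IriValue("http://example.org/")
postcode = StringValue("SW1A 1AA")
print(to_rdf_term(website))
print(to_rdf_term(postcode))
```

With a single string type for both, the exporter would have to guess which strings are meant as links; with two types, the decision is already made in the data.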
In general, however, it will be good to keep the set of basic datavalue types small, while allowing the set of property types to grow. The set of base datavalue types that we use is based on the experience in SMW as well as on existing formats like XSD (which also has many derived types but only a few base types).
As for the possible confusion, I think some naming discipline would clarify this. In SMW, there is a clearer distinction between the two kinds of types, and a fixed schema for property type ids that makes it easy to recognise them.
In any case, using string for IRIs does not seem to solve any problem. It does not simplify the type system in general and it does not help with the use cases that I mentioned. What I do not agree with are your arguments about all of this being "internal". We would not have this discussion if it were. The data model of Wikidata is the primary conceptual model that specifies what Wikidata stores. You might still be right that some of the implementation is internal, but the arguments we both exchange are not really on the implementation level ;-).
Best wishes
Markus, offline soon for travelling
On 26/08/13 10:35, Daniel Kinzler wrote:
On 25.08.2013 19:19, Markus Krötzsch wrote:
If we have an IRI DV, considering that URLs are special IRIs, it seems clear that IRI would be the best way of storing them.
The best way of storing them really depends on the storage platform. It may be a string or something else.
I think the real issue here is that we are exposing something that is really an internal detail (the data value type) instead of the high level information we actually should be exposing, namely property type.
I think splitting the two was a mistake, and I think exposing the DV type while making the property type all but inaccessible makes things a lot worse.
In my opinion, data should be self-descriptive, so the *semantic* type of the property should be included along with the value. People expect this, and assume that this is what the DV type is. But it's not, and should not be used or abused for this purpose.
Ideally, it should not matter at all to any 3rd party if we use a string or IRI DV internally. The (semantic) property type would be URL, and that's all that matters.
I'm quite unhappy about the current situation; we are beginning to see the backlash of the decision not to include the property type inline. If we don't do anything about this now, I fear the confusion is going to get worse.
-- daniel