My two cents:
Please let us collect the arguments for and against using the IRI data value *structure* here (as opposed to merely being able to *identify* whether a value is an IRI or a plain string).
I see no advantage in using the IriValue internally, as we have no use case for accessing parts of the URL individually.
I see some slight disadvantages (a little extra code to be written for plain text display and validation of IriValues).
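To illustrate the trade-off, here is a rough sketch. `StringValue` and `IriValue` below are simplified stand-ins I made up for this mail, not the actual classes; the point is only that the structured form needs extra code to get back the plain text that the string form already is.

```python
from dataclasses import dataclass
from urllib.parse import SplitResult, urlsplit, urlunsplit


@dataclass(frozen=True)
class StringValue:
    """Hypothetical stand-in: the URL is kept as an opaque string."""
    text: str

    def display(self) -> str:
        # Nothing to do: the stored text is already the display form.
        return self.text


@dataclass(frozen=True)
class IriValue:
    """Hypothetical stand-in: the URL is kept as its individual parts."""
    parts: SplitResult

    @classmethod
    def parse(cls, text: str) -> "IriValue":
        # Split into scheme, netloc, path, query and fragment.
        return cls(urlsplit(text))

    def display(self) -> str:
        # Extra step: reassemble the parts for plain text display.
        return urlunsplit(self.parts)


url = "http://example.org/path?x=1#frag"
as_string = StringValue(url)
as_iri = IriValue.parse(url)
print(as_string.display())
print(as_iri.display())
print(as_iri.parts.netloc)
```

Access to something like `parts.netloc` is the only thing the structured form buys us, and so far nobody has named a use case for it.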
- Should the external JSON structure list the data value type for every snak (as it currently does)? I.e. should it state "string" instead of "Commons media filename"?
I think it's useless and even misleading to have that info in the external format. It should never have been there.
However, now that it is there, it may be prudent to keep it. The biggest problem with it is that people mistake/misuse it for identifying the type of the snak value - which is *not* what this represents.
- Should the external JSON structure also list, for every snak, the data type of the property used? This would then say URL, and it would solve all the use cases mentioned by Markus, which rely on *identifying* this distinction, not on the actual IRI data structure.
Yes, I think that would be very helpful for various reasons.
I would even go so far as to say that Snaks should also have this information internally (in PHP and in the internal JSON format).
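To make the distinction concrete, here is a minimal sketch of a snak that carries both pieces of information. The field names (`datatype`, `datavalue`, `snaktype`, `property`) and the property ID are assumptions for illustration, not a statement of what the external format looks like.

```python
import json

# Hypothetical snak; field names and the property ID are illustrative assumptions.
snak = {
    "snaktype": "value",
    "property": "P999",       # made-up property ID
    "datatype": "url",        # data type of the property used
    "datavalue": {
        "type": "string",     # data value type: describes only the value's structure
        "value": "http://example.org/",
    },
}


def is_url_snak(snak):
    """Identify URL snaks by the property's data type, not by the value type."""
    return snak.get("datatype") == "url"


print(json.dumps(snak, indent=2))
print(is_url_snak(snak))
```

A consumer that wants to know whether a value is a URL looks at the property's data type; the value type stays "string" either way, which is exactly why it should not be used for that purpose.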
We need to have an export of the whole Wikidata knowledge base in the external JSON format, sooner rather than later, and hopefully also in RDF. The lack of these dumps should not influence our decision right now, imho :)
Oh yes. It's really bad that people are relying on the internal format found in the XML dumps.
-- daniel