Re: [Wikidata-tech] IRI-value or string-value for URLs?

30 Aug 2013

Just following up on some discussion I had with DanielK and Jeroen today on
this, and summarizing it for the mailing list.

I still fail to see what the advantage of using the IRI datavalue would be -
especially when it is basically stripped down to a string datavalue, as
Jeroen suggests in the last mail here.

I do see an advantage of stating the property datatype in a snak in the
external JSON representation, and am trying to understand what prevents us
from doing so. If we did so, we would enable all the use cases that were
mentioned, or am I missing something?
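
To make the idea concrete, here is a minimal sketch of a snak whose external
JSON carries the property datatype next to a plain string value. The field
names ("datatype", "datavalue"), the datatype label "url" and the property
ID "P123" are assumptions made for illustration only, not the actual
Wikibase format.

```python
import json


def make_snak(property_id, datatype, value):
    """Build a hypothetical external-JSON snak that states the property datatype."""
    return {
        "snaktype": "value",
        "property": property_id,
        "datatype": datatype,  # assumed field: tells consumers how to read the value
        "datavalue": {"type": "string", "value": value},
    }


def is_url_snak(snak):
    """A consumer can recognise URL values without a dedicated IRI datavalue."""
    return snak.get("datatype") == "url"


snak = make_snak("P123", "url", "http://example.org/page")
encoded = json.dumps(snak, sort_keys=True)
decoded = json.loads(encoded)
```

With the datatype stated in the snak, the value itself can stay a simple
string, and a consumer that cares about URLs can still single them out.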

I also recognize the importance of having an official JSON dump soon, the
lack of which currently forces people to rely on the internal
representation in the available dumps. As said, the format of the internal
dumps is expected to keep changing. I will bump this up on the priority
list accordingly.

(Re Commons media file being different: a Commons media file is, in the end,
also just a URI represented by a string; I do not see why it is so
different from other URIs.)

2013/8/30 Jeroen De Dauw <jeroendedauw(a)gmail.com>

...
  Hey Markus,

 Thanks for the writeup. This clarified some things, at least for me.

 However, this does not mean that you have to store the value as a compound
 object that contains many strings. In fact, this strikes me as a rather
 cumbersome approach that would make it harder to use the data. In SMW we
 store URIs as one string. Splitting this string into parts (under the
 assumption that it was a well-formed URL to start with) is quite easy, if
 this is needed (SMW does this). Conclusion: the use of a datatype for IRIs
 is in no way tied to the use of an impractical serialisation; reference
 implementations exist.

 Agreed. The IriValue implementation is based on the SMW one, and retains
 this capability. Using serialize and unserialize will cause concatenation
 of the parts into one string, which is then split back into its parts
 before they are passed to the constructor.
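
The round trip described above can be sketched as follows. This is a
hypothetical Python illustration, not the PHP IriValue code: the class name
IriParts and its field names are assumptions, and the splitting relies on
Python's urllib.parse under the assumption that the input is a well-formed
URL.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class IriParts:
    """Hypothetical value object holding the parts of an IRI."""
    scheme: str
    authority: str
    path: str
    query: str
    fragment: str

    def serialize(self) -> str:
        # Concatenate the parts into a single string.
        return urlunsplit(
            (self.scheme, self.authority, self.path, self.query, self.fragment)
        )

    @classmethod
    def unserialize(cls, text: str) -> "IriParts":
        # Split the string back into parts and pass them to the constructor.
        parts = urlsplit(text)
        return cls(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)


value = IriParts.unserialize("http://example.org/path?x=1#frag")
round_tripped = IriParts.unserialize(value.serialize())
```

The stored form is one string, yet the individual parts remain available
whenever they are needed.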

 We are currently not using this though. Instead we are using the two last
 methods here:

https://github.com/wikimedia/mediawiki-extensions-DataValues/blob/eea0d0e19…

 So the solution to the problem at hand seems to be either to change these
 two methods to do the same as serialize and unserialize, or to simply not
 use these methods in our serialization process. The former approach is the
 most local and easy to implement, and given the urgency of this, the one I
 suggest going with.

 (Somewhat different topic, mainly directed at the WD team itself:) It is
 however an indication that having these two methods in the DataValue
 implementations is not the best idea to begin with. This has been clear for
 some time, though in order to fix this, we effectively need to go with the
 second approach and implement proper serialization infrastructure for
 DataValues. That'd also fix a number of other problems and awkwardness the
 current approach is causing.

 Cheers

 --
 Jeroen De Dauw
 http://www.bn2vs.com
 Don't panic. Don't be evil. ~=[,,_,,]:3
 --


-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.
