Hi all.
Currently, a PropertyValueSnak knows the ID of the property it refers to, and the DataValue object representing the actual value. The DataValue has a "low level type" used (in the future) by the storage and indexing layer. But whenever we want to render or validate a snak, we need the property's high level data type.
Currently, we do this via the PropertyDataTypeLookup service.
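To make the current arrangement concrete, here is a minimal Python sketch. It is not Wikibase code, which is PHP. The snak stores only a property ID and a DataValue, so rendering has to ask a lookup for the high level data type. The class shapes, the in-memory lookup and the property ID "P1" are hypothetical stand-ins for the real PropertyDataTypeLookup service.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataValue:
    value_type: str  # "low level" type used by storage/indexing, e.g. "string"
    value: object


@dataclass(frozen=True)
class PropertyValueSnak:
    # Current model: only the property ID and the value, no data type.
    property_id: str
    data_value: DataValue


class InMemoryPropertyDataTypeLookup:
    """Hypothetical stand-in for the PropertyDataTypeLookup service."""

    def __init__(self, data_types: Dict[str, str]) -> None:
        self._data_types = data_types

    def get_data_type_id_for_property(self, property_id: str) -> str:
        # In a real system this would typically mean loading the property.
        return self._data_types[property_id]


def render(snak: PropertyValueSnak, lookup: InMemoryPropertyDataTypeLookup) -> str:
    # The high level data type is not in the snak, so each render needs a lookup.
    data_type_id = lookup.get_data_type_id_for_property(snak.property_id)
    return f"{data_type_id}: {snak.data_value.value}"


lookup = InMemoryPropertyDataTypeLookup({"P1": "commonsMedia"})
snak = PropertyValueSnak("P1", DataValue("string", "Example.jpg"))
print(render(snak, lookup))  # commonsMedia: Example.jpg
```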
I think that's quite inconvenient and inefficient. I think the Snak itself should know the data type's ID. Was there any particular reason not to do this? I remember a lengthy discussion about this, but I don't recall the outcome (yes, we really need to write this stuff down).
Having the data type in the Snak would mean that Snaks are self-contained in the JSON structure, and it would also make us robust against a property changing its type.
Of course, changing this now means we have to provide some kind of fallback for the case that the type id is missing - and that is going to require a PropertyDataTypeLookup again. Annoying :/
Alternatively, Snak objects could require the type in the constructor - but that would require database lookups during deserialization; not fun.
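Here is a sketch of the proposed variant. It assumes a JSON-like dict serialization. The key names ("property", "datatype", "datavalue") are illustrative, not the actual Wikibase format. The snak carries its data type ID, and the lookup is needed only as a fallback for legacy structures that lack the field.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class PropertyValueSnak:
    property_id: str
    data_type_id: str  # proposed: the high level data type travels with the snak
    value_type: str    # low level DataValue type, e.g. "string"
    value: Any


def serialize(snak: PropertyValueSnak) -> Dict[str, Any]:
    # Self-contained structure: interpreting it needs no property lookup.
    return {
        "property": snak.property_id,
        "datatype": snak.data_type_id,
        "datavalue": {"type": snak.value_type, "value": snak.value},
    }


def deserialize(data: Dict[str, Any],
                lookup_data_type: Callable[[str], str]) -> PropertyValueSnak:
    data_type_id = data.get("datatype")
    if data_type_id is None:
        # Fallback for structures written before the field existed:
        # only these still need the (expensive) property lookup.
        data_type_id = lookup_data_type(data["property"])
    datavalue = data["datavalue"]
    return PropertyValueSnak(data["property"], data_type_id,
                             datavalue["type"], datavalue["value"])
```

Once the field has been written, a round trip never touches the lookup. Only data serialized before the change pays that cost.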
Any thoughts on this? Why did we go this route? And is it really too late to change this now?
-- daniel
-1 Had to deal with this in the frontend as well and don't think this is inconvenient. It seems like the cleanest approach. Polluting the Snaks with information like this for performance or convenience reasons will probably cause more trouble in the end than keeping it as simple and "pure" as possible.
On 2013/6/12, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
> [snip]
Wikidata-tech mailing list Wikidata-tech@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
On 13.06.2013 03:22, Daniel Werner wrote:
> -1 Had to deal with this in the frontend as well and don't think this is inconvenient. It seems like the cleanest approach. Polluting the Snaks with information like this for performance or convenience reasons will probably cause more trouble in the end than keeping it as simple and "pure" as possible.
You think that giving a data structure information about its type is polluting it? Why so? This seems pretty basic and straightforward to me.
-- daniel
How would you use that additional type information?
A PropertyValueSnak (PVS) referencing a certain property can still have ANY value of ANY value type, i.e. a value not necessarily matching the property's data type's data value type. This is something all code should consider and handle well.
If you want to put the PVS's property's data type ID into the PVS instance, then what would that data type ID be? What would be the precise definition of that additional field?
* Would you take the ID of the PVS's property's current data type? --> You would still have to load the properties when building the Snaks, which just moves that inconvenience somewhere else.
* Or would you save that ID as part of the data model within the Snak already? In that case the ID would refer to the PVS's property's data type at the time of the Snak's construction.
  - The data type of the property could have changed (we don't support changing that as far as I know, but I'd always consider it). So you would still have to fetch the property to compare its current data type with the one in the Snak.
  - The data type's structure could have changed (e.g. it could use a different data value type). So even though your PVS still refers to the same data type as the PVS's property, the PVS's value might no longer be valid against that data type. That is something you would still have to consider and check for.
  --> All your type ID field would really stand for is "the data type the PVS's value was valid against at the time of its construction". At what point is this information really useful? In how many cases can it really spare you from loading the PVS's property to look up the actual data type?
Also, we might add attributes other than "datatype" to properties at some point, and some of them might influence validation (e.g. the unit of a number). Would you then go ahead and add this information to the PVS's constructor as well?
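Werner's objection can be expressed as a small check, sketched here with hypothetical names. A data type ID stored at construction time records only what the value was valid against back then. Detecting a snak whose property has since changed type still requires the property's current type.

```python
from typing import Dict, List, NamedTuple


class StoredSnak(NamedTuple):
    property_id: str
    data_type_id: str  # type recorded when the snak was constructed


def find_stale_snaks(snaks: List[StoredSnak],
                     current_types: Dict[str, str]) -> List[StoredSnak]:
    # The stored ID only says what the value was valid against at creation;
    # spotting a later change of the property's type needs the current type.
    return [s for s in snaks if current_types[s.property_id] != s.data_type_id]
```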
On 2013/6/13, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
> [snip]
On 15.06.2013 01:17, Daniel Werner wrote:
> How would you use that additional type information?
For rendering the value, mainly. It seems awkward that I would have to look up the property's data type in order to render a value associated with that property. Who says that values cannot exist without being associated with a property?
To me it seems natural that the data structure representing the value should be self-contained, and that includes knowing its type. That way, the data structure contains all the information necessary for interpreting it.
> A PropertyValueSnak (PVS) referencing a certain property can still have ANY value of ANY value type, i.e. a value not necessarily matching the property's data type's data value type. This is something all code should consider and handle well.
Code that assigns a new value to a given property, that is, code that creates or updates a snak value, needs to make sure that the value has the correct type for the given property. We currently already do this for the DataValue type.
All other code handling the value would just look at the value's data type, and not care about the property or its type.
> If you want to put the PVS's property's data type ID into the PVS instance, then what would that data type ID be? What would be the precise definition of that additional field?
It would be a data type identifier given as a string in the PVS's data structure.
> - Would you take the ID of the PVS's property's current data type?
> --> You would still have to load the properties when building the Snaks, which just moves that inconvenience somewhere else.
Yes, that wouldn't give us much, except for cleaner code for handling snak values.
> - Or would you save that ID as part of the data model within the Snak already?
> In that case the ID would refer to the PVS's property's data type at the time of the Snak's construction.
Indeed. It would refer to the value's *actual* data type, allowing us to interpret it correctly (especially for output: in the UI, in dumps, etc.) even if the property's type changes.
Currently, a property's type can never change - if it changed, we would lose the ability to interpret values that are already in the database.
> - The data type of the property could have changed (we don't support changing that as far as I know, but I'd always consider it). So you would still have to fetch the property to compare its current data type with the one in the Snak.
Why would I have to check that? I'm trying to interpret a value for output or such, why should I (need to) care about the property's type?
Only when *setting* a value we need to do this check, and we already need to check the datavalue's type in that case anyway.
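The split Kinzler describes could look like the minimal sketch below. The snak is validated against the property when it is written, and interpreted from the snak alone when it is read. The data type to DataValue type mapping, the function names and the property ID are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Illustrative mapping from high level data types to DataValue types.
DV_TYPE_FOR_DATA_TYPE: Dict[str, str] = {
    "string": "string",
    "commonsMedia": "string",
}


@dataclass(frozen=True)
class Snak:
    property_id: str
    data_type_id: str
    value_type: str
    value: Any


def make_snak(property_id: str, value_type: str, value: Any,
              property_types: Dict[str, str]) -> Snak:
    # Writing: the only place where the property's *current* type matters.
    data_type_id = property_types[property_id]
    expected = DV_TYPE_FOR_DATA_TYPE[data_type_id]
    if value_type != expected:
        raise ValueError(f"{property_id} expects {expected!r}, got {value_type!r}")
    return Snak(property_id, data_type_id, value_type, value)


def describe(snak: Snak) -> str:
    # Reading: interpret the value using only what the snak itself carries.
    return f"{snak.data_type_id}({snak.value!r})"
```

Once make_snak has succeeded, changing the property's entry in property_types does not affect how describe renders the existing snak.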
> - The data type's structure could have changed (e.g. using a different data value type). So even though your PVS still refers to the same data type as the PVS's property, the PVS's value might no longer be valid against that data type. That is something you would still have to consider and check for.
That is no different from the current situation. The code for creating value objects (Snaks, DataValues) from array structures must be backwards compatible in any case, otherwise we lose access to existing values in the database.
> --> All your type ID field would really stand for is "the data type the PVS's value was valid against at the time of its construction". At what point is this information really useful? In how many cases can it really spare you from loading the PVS's property to look up the actual data type?
In all cases except when setting the value.
> Also, we might add attributes other than "datatype" to properties at some point, and some of them might influence validation (e.g. the unit of a number). Would you then go ahead and add this information to the PVS's constructor as well?
No, because it is only relevant for validation, that is, when the value *changes*.
The point of having the value be self-contained and know its type is to be able to *interpret* it safely without contextual knowledge. That would be far more robust than the current situation.
-- daniel
On 2013/6/16, Daniel Kinzler <daniel.kinzler@wikimedia.de> wrote:
>> - The data type of the property could have changed (we don't support changing that as far as I know, but I'd always consider it). So you would still have to fetch the property to compare its current data type with the one in the Snak.
> Why would I have to check that? I'm trying to interpret a value for output or such, why should I (need to) care about the property's type?
> Only when *setting* a value we need to do this check, and we already need to check the datavalue's type in that case anyway.
I am in favor of always displaying existing values if possible, even if they are no longer valid against the current definitions (property/data type). BUT, if a value is no longer valid against the definition, in many situations its display should come with additional information about that. Otherwise this can be very misleading. In the JS UI we currently do not display those values. We only display a red message telling the user that something is wrong with the value, so the user can change it into something valid or remove it.
For displaying the value in a useful way though, you don't necessarily need the PVS's Property's data type. Just having the data value, taking its data type and using a standard formatter for that data value type would be sufficient I believe. We could already do that in the JS frontend for broken values of most of our data value types, just nobody had the time to take care of it.
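Werner's suggestion of choosing a standard formatter by the value's DataValue type could be sketched as follows. The formatter table and the flag text are hypothetical. The property's expected DataValue type is optional, and it is used only to mark values that no longer match, as proposed for the UI.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical "standard" formatters keyed by DataValue type.
DV_FORMATTERS: Dict[str, Callable[[Any], str]] = {
    "string": str,
}


def format_value(value_type: str, value: Any,
                 expected_value_type: Optional[str] = None) -> str:
    # The formatter is chosen from the value's own DataValue type;
    # unknown types fall back to repr().
    text = DV_FORMATTERS.get(value_type, repr)(value)
    if expected_value_type is not None and value_type != expected_value_type:
        # Still show the value, but flag that it no longer matches the
        # property's current definition.
        text += " [does not match property]"
    return text
```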
>> Also, we might add attributes other than "datatype" to properties at some point, and some of them might influence validation (e.g. the unit of a number). Would you then go ahead and add this information to the PVS's constructor as well?
> No, because it is only relevant for validation, that is, when the value *changes*.
Not sure you thought this through. The example I gave (unit of a number) would already be something not only relevant for validation but also for how the value should be displayed. So my question still stands. I guess there are many more possibilities of Property attributes that would not only affect validation. With your approach, we would take this flexibility. That's what I've been afraid of in the first place. An optimization that will cost us time and nerves later on.
Cheers, Daniel
Hey,
I agree with Daniel Werner's last post, though it looks like it has two typos in it. I will highlight them to avoid confusion:
> Just having the data value, taking its *data type* and using a standard formatter for that data value type would be sufficient
"data type" should just be "type".
> With your approach, we would take this *flexibility*. That's what I've been afraid of in the first place.
"flexibility" probably should be something else.
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. ~=[,,_,,]:3 --
On 2013/6/17, Jeroen De Dauw <jeroendedauw@gmail.com> wrote:
> Hey,
> I agree with Daniel Werner's last post, though it looks like it has two typos in it. I will highlight them to avoid confusion:
>> Just having the data value, taking its *data type* and using a standard formatter for that data value type would be sufficient
> "data type" should just be "type".
You are right: take the value's data value type and use a standard formatter for it (no data type involved).
>> With your approach, we would take this *flexibility*. That's what I've been afraid of in the first place.
> "flexibility" probably should be something else.
Not really. I was talking about the flexibility of additional attributes on properties that do change the validation and/or formatting of values of PVSs using that property.
On 17.06.2013 14:02, Daniel Werner wrote:
> I am in favor of always displaying existing values if possible, even if they are no longer valid against the current definitions (property/data type). BUT, if a value is no longer valid against the definition, in many situations its display should come with additional information about that.
I'd say in *some* situations, not most. Definitely when editing.
For general display in the UI, one can argue one way or the other. For all machine readable output (API, linked data, dumps) this is not relevant.
> Otherwise this can be very misleading.
How exactly?
> In the JS UI we currently do not display those values. We only display a red message telling the user that something is wrong with the value, so the user can change it into something valid or remove it.
I think it would be sufficient to do these things when editing.
The point is: the value *is* valid. The idea is that the property just tells us which type new values should have. Existing values always have their own type, and they are valid against that.
> For displaying the value in a useful way though, you don't necessarily need the PVS's Property's data type. Just having the data value, taking its data type and using a standard formatter for that data value type would be sufficient I believe.
This is only the case if we have a 1:1 mapping of data types to DV types. For example, commonsMedia (and soon URLs) use "string" DVs, but being restricted to a plain string formatter would be somewhat annoying, right?
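Kinzler's counterexample can be put in sketch form. Several data types share the "string" DataValue type, so only the data type can pick a specialized formatter. The mapping and the formatters below are illustrative assumptions, not the actual Wikibase registry.

```python
from typing import Any, Callable, Dict

# Several data types share one DataValue type (illustrative mapping).
DV_TYPE_FOR_DATA_TYPE: Dict[str, str] = {
    "string": "string",
    "commonsMedia": "string",
    "url": "string",
}

# Generic formatters per DataValue type.
DV_FORMATTERS: Dict[str, Callable[[Any], str]] = {"string": str}

# Hypothetical specialized formatters per high level data type.
DATA_TYPE_FORMATTERS: Dict[str, Callable[[Any], str]] = {
    "commonsMedia": lambda v: f"[[File:{v}]]",
    "url": lambda v: f"<{v}>",
}


def format_by_data_type(data_type_id: str, value: Any) -> str:
    # Prefer the data type's formatter; otherwise fall back to the
    # generic formatter of the underlying DataValue type.
    formatter = DATA_TYPE_FORMATTERS.get(data_type_id)
    if formatter is None:
        formatter = DV_FORMATTERS[DV_TYPE_FOR_DATA_TYPE[data_type_id]]
    return formatter(value)
```

A formatter chosen by DataValue type alone would print all three of these values identically.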
> We could already do that in the JS frontend for broken values of most of our data value types, just nobody had the time to take care of it.
I'm trying to introduce a PropertyBadValueSnak for handling this in PHP. Jeroen suggested we should ask Markus for input on that.
>>> Also, we might add attributes other than "datatype" to properties at some point, and some of them might influence validation (e.g. the unit of a number). Would you then go ahead and add this information to the PVS's constructor as well?
>> No, because it is only relevant for validation, that is, when the value *changes*.
> Not sure you thought this through. The example I gave (unit of a number) would already be something not only relevant for validation but also for how the value should be displayed.
Unit and precision are part of the "Quantity" DV, so they would be available without looking at the property. As they should be.
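The following minimal sketch shows a quantity value that carries its own unit and precision, so rendering it needs no property lookup. The field names and the "±" rendering of precision are assumptions for illustration. This is not the actual DataValues Quantity class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantityValue:
    # Hypothetical shape: unit and precision live in the value itself.
    amount: float
    unit: str
    precision: float

    def format(self) -> str:
        # No property lookup needed: everything required is on the value.
        return f"{self.amount:g} ±{self.precision:g} {self.unit}"
```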
> So my question still stands. I guess there are many more possibilities of Property attributes that would not only affect validation. With your approach, we would take this flexibility. That's what I've been afraid of in the first place. An optimization that will cost us time and nerves later on.
I would argue that any such information needs to be part of either the DV or the Snak, precisely for the reason that it is vital for interpreting the value.
Again: the idea is that values should be self-contained.
-- daniel
Hey,
Putting the DataType ID in PropertyValueSnaks at this point seems like a bad idea for several reasons. Doing so would cost us quite some work and leave us with a more complicated system as a foundation. If you have a use case for which the current code is not well suited, I suggest writing new code for that specific use case. I strongly suspect this will be both simpler and less work.
> I remember a lengthy discussion about this, but I don't recall the outcome (yes, we really need to write this stuff down).
There was no decision at any point to change this, though it indeed has been brought up before.
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. ~=[,,_,,]:3 --
On 13.06.2013 06:38, Jeroen De Dauw wrote:
> Hey,
> Putting the DataType ID in PropertyValueSnaks at this point seems like a bad idea for several reasons. Doing so would cost us quite some work and leave us with a more complicated system as a foundation.
Changing it now would be hard.
But I think it would have been simpler and cleaner if we had gone that route from the start.
Why would it be a bad idea? To me, it's just a self-contained data structure that knows its own type, as it should.
> If you have a use case for which the current code is not well suited, I suggest writing new code for that specific use case. I strongly suspect this will be both simpler and less work.
Any code I can write for this now will involve injecting knowledge about properties into the snaks post-hoc. That's going to suck.
>> I remember a lengthy discussion about this, but I don't recall the outcome (yes, we really need to write this stuff down).
> There was no decision at any point to change this, though it indeed has been brought up before.
Well, at some point the decision was made, right? Was it discussed? Were the implications of each approach compared? Is this documented somewhere?
I recall a lengthy Skype call with Markus and Denny about this, and I *seem* to recall that we decided to store the type in the snaks. But, as is too often the case, I don't think this is documented anywhere.
So, what's the point of whining now, since it's too late anyway? I'd like to understand the rationale for going with the current system. And I would like to make the case for more communication and documentation about design decisions like this, especially since anything concerning the internal data structure that goes into the DB is very hard to change later.
-- daniel