Re: [Wikidata-l] Data values

19 Dec 2012


On 19.12.2012 08:53, Gregor Hagedorn wrote:
...
...
Displaying the numbers is another question. There I have to agree that it
always makes sense to also store a typically used unit for that type of data.
I agree. What I propose is that the user interface supports entering
and proofreading "10.6 nm" as "10.6" plus "n" (= nano) plus "meter".
Yes, absolutely.
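
To make the input idea concrete, here is a rough sketch of splitting "10.6 nm" into number, prefix and unit. The tables SI_PREFIXES and UNITS and the function parse_quantity are made up for illustration and are not part of any Wikidata code; a real parser would need far larger tables and locale-aware number formats.

```python
import re
from decimal import Decimal

# Hypothetical lookup tables, deliberately tiny.
SI_PREFIXES = {"k": 3, "m": -3, "µ": -6, "n": -9}
UNITS = {"m": "meter", "g": "gram", "l": "liter"}

def parse_quantity(text):
    """Split e.g. '10.6 nm' into (Decimal('10.6'), 'n', 'meter')."""
    match = re.fullmatch(r"\s*([-+]?\d+(?:\.\d+)?)\s*(\S+)\s*", text)
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    number, symbol = match.groups()
    # Try the bare unit first so that 'm' means meter, not the prefix milli.
    if symbol in UNITS:
        return Decimal(number), "", UNITS[symbol]
    prefix, unit = symbol[0], symbol[1:]
    if prefix in SI_PREFIXES and unit in UNITS:
        return Decimal(number), prefix, UNITS[unit]
    raise ValueError(f"unknown unit {symbol!r}")
```

The number is kept as a Decimal built from the string, so the digits the user typed are not altered by the parse step.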
...
How the value is stored in the data property, whether as 10.6 floating
point or as 1.06e-8, is a second issue -- the latter is probably
preferable.
I think neither is sufficient: we need a representation that allows for
arbitrary (or at least very great) precision, and can still be indexed and
compared natively by (different!) database systems. Fixed length strings can
easily do that, if they are long enough. That's pretty inefficient, though.
IEEE floats work natively, but don't guarantee enough precision (well, maybe
128-bit floats come close?). The SQL "decimal" type might be sufficient: in
MySQL, it allows up to 65 decimal digits in total, of which up to 30 can follow
the decimal point. But with a fixed scale, that's still not enough to cover
everything from the Planck length to the extent of the universe.
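
A minimal sketch of the fixed-length string idea, assuming non-negative values only. The widths INT_DIGITS and FRAC_DIGITS and the function to_sortable are arbitrary choices for illustration; negative numbers would need an extra encoding step that is left out here.

```python
from decimal import Decimal

INT_DIGITS = 30   # assumed width before the decimal point
FRAC_DIGITS = 40  # assumed width after the decimal point

def to_sortable(value):
    """Encode a non-negative Decimal as a fixed-length, zero-padded string."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    # format(..., 'f') writes all digits in plain positional notation.
    int_part, _, frac_part = format(value, "f").partition(".")
    if len(int_part) > INT_DIGITS or len(frac_part) > FRAC_DIGITS:
        raise ValueError("value does not fit the assumed column width")
    return int_part.zfill(INT_DIGITS) + "." + frac_part.ljust(FRAC_DIGITS, "0")
```

Because every encoded string has the same length and layout, plain byte-wise string comparison gives the same order as numeric comparison, which is what any database index can do. The cost is 71 characters per value with these widths.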
...
In addition to a storage option for the desired unit prefix (this may
be considered an original prefix, since naturally re-users may wish to
reformat this).
I see no point in storing the unit used for input.
...
it is probably necessary to store the number of
significant decimals.
That's how Denny proposed to calculate the default accuracy. If the accuracy is
given by a complex model (e.g. a gamma distribution), then it might be handy to
have a simple value that tells us the significant digits.
Hm... perhaps it's best to always express accuracy as "+/-n", and allow for more
detailed information (standard deviation, whatever) as *additional* information
about the accuracy (could be modelled as a qualifier internally).
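
As a sketch of how a default "+/-n" could be derived from the digits that were entered, one option is half a unit in the last place. The function default_accuracy is hypothetical, and the half-unit rule is an assumption, not something that has been decided on this list.

```python
from decimal import Decimal

def default_accuracy(text):
    """Return half a unit in the last digit entered, e.g. '10.20' -> 0.005."""
    exponent = Decimal(text).as_tuple().exponent
    return Decimal(5).scaleb(exponent - 1)
```

With this rule "10.6" gets +/-0.05 and "10.20" gets +/-0.005, so the trailing zero in the input does change the stored accuracy.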
...
I believe in the user interface this need not
be a visible setting; the number of digits can simply be preserved.
Without this it is impossible to store and reproduce information like
"10.20 nm"; it would be returned as 1.02 x 10^-8 m.
No, it would return using whatever system of measurement the user has selected
in their preferences.
...
Complex heuristics
may "guess" when to use the scientific SI prefixes instead. The
trailing zero cannot be reproduced, however, when relying completely on
IEEE floating-point.
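
The trailing-zero point is easy to check. The snippet below uses Python's float (an IEEE double) and its decimal module purely as an illustration; it says nothing about what the storage backend would use.

```python
from decimal import Decimal

as_float = float("10.20")      # binary floating point
as_decimal = Decimal("10.20")  # decimal type that keeps the exponent

print(as_float)    # the trailing zero is gone
print(as_decimal)  # the trailing zero survives
```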
We'll need heuristics to pick the correct secondary unit (e.g. nm or km). The
general rule could be to pick a unit so that the actual value is between 1 and
10, with some additional rules for dealing with cultural specialities (the
decimeter is rarely used, the hectoliter however is pretty common, the decagram
is commonly used in Austria only, etc.).
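
A rough sketch of the general rule, without any of the cultural special cases. The table PREFIXES and the function pick_prefix are made up for illustration. Since the prefixes in this table are a factor of 1000 apart, the displayed value ends up between 1 and 1000 rather than between 1 and 10.

```python
from decimal import Decimal

# Assumed subset of SI prefixes as (power of ten, symbol), sorted ascending.
PREFIXES = [(-9, "n"), (-6, "µ"), (-3, "m"), (0, ""), (3, "k")]

def pick_prefix(value, unit_symbol="m"):
    """Return (scaled value, prefixed unit) for a Decimal in the base unit."""
    magnitude = value.adjusted()  # power of ten of the leading digit
    # Take the largest prefix not above the magnitude, else the smallest one.
    power, symbol = PREFIXES[0]
    for candidate_power, candidate_symbol in PREFIXES:
        if candidate_power <= magnitude:
            power, symbol = candidate_power, candidate_symbol
    return value.scaleb(-power), symbol + unit_symbol
```

For example, pick_prefix(Decimal("1.06E-8")) yields 10.6 with the unit "nm". Rules such as "never decimeter, but hectoliter is fine" would have to be layered on top as per-unit and per-locale exceptions.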
Note that for rendering of values in infoboxes, the desired unit and precision
can always be given explicitly.
Note "precision" vs "accuracy" here: the precision controls how many digits are
shown, while the accuracy indicates how exact our knowledge is. The precision
can be derived from the accuracy and vice versa, using appropriate heuristics.
But they are not the same. IMHO, the accuracy should always be stored with the
value, the precision never.
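
One possible heuristic for deriving the display precision from a stored accuracy, as a sketch only: round the value to the decimal place of the leading digit of the accuracy. The function display is hypothetical, and other rules (e.g. showing one more digit) would be just as defensible.

```python
from decimal import Decimal

def display(value, accuracy):
    """Round value to the decimal place of the leading digit of accuracy."""
    quantum = Decimal(1).scaleb(accuracy.adjusted())
    return f"{value.quantize(quantum)} +/- {accuracy}"
```

Here only the accuracy is stored with the value; the number of digits shown is computed at render time and can be overridden, as with the explicit settings for infoboxes.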
-- daniel
-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
