Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
So, please suggest terms to use for at least these two things:
- value certainty (ideally not using "digits", but something that is independent of unit and rendering)
Here we want to express that the true value lies, with a certain probability, within a given interval, e.g. "2.3 +/- 0.2 µm".
I am not too sure here myself. Different terms exist depending on whether you talk about the inherent measurement error of a single individual with a single true value, or about statistical measures or estimates.
Marco gives yet another example: "We want to specify the 'limits of (possible) variation' of a value, which would be Engineering tolerance. E.g. the values of electrical resistances, capacitors, etc. are given in Ω ± % or F ± %. We could also either use/allow/display absolute or relative values." -- In this case it is actually not an uncertainty of the actual sample of resistors, but a design specification, i.e. the specification that resistors must be (all, or only 95%?) within _at least_ these limits.
So what to do here?
List the different use cases of a value plus-minus other values?
- measurement-method limited precision range of single measurements (e.g. small structures in a light microscope, limited by the resolution capability of blue light, approx. 0.2 µm)
- measurement-method limited accuracy range (or accuracy plus precision)
- confidence interval for the mean (or other statistical parameters: mode, variance, etc.) of the population, as estimated based on a sample
- one of potentially several percentiles (incl. +/- s.d.) measuring spread, but giving no information about the probability that the true mean is between these values
- engineering design specifications that a given (unknown) fraction of individuals must be within these limits
I believe for the moment you don't want to go into certainty in the sense that a number is an estimate of a
All these different concepts rightly have different names. There can be:
- precision +/- 0.2
- accuracy +/- 0.2
- tolerance +/- 0.2
- error margin +/- 0.2
- +/- 1 or 2 s.d. +/- 0.2
- 95% confidence interval (CI) +/- 0.2
- 10th to 90th percentile +/- 0.2
- uncertainty (of what?) +/- 0.2
(ASIDE: +/- 2 s.d. defines roughly a 95% probability that the next value from a random sample is in the interval; the 95% CI, that the true value of the mean is in that interval. These are completely different things -- for the same measurements you can validly report 100 +/- 50 for the first and 100 +/- 0.001 for the second. That is, with probability 95% the next randomly sampled measurement will be between 50 and 150, and with probability 95% it is known that the true mean is between 99.999 and 100.001. Semantics matter, not only the "pattern" of plus-minus a value.)
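To make the aside concrete, here is a small Python sketch (the sample numbers are made up purely for illustration) computing both half-widths from the same data:

    # Same data, two very different plus-minus values: a spread interval
    # (+/- 2 s.d., ~95% of individual values for normal data) versus a
    # 95% confidence interval for the mean (which shrinks with sample size).
    import math
    import statistics

    measurements = [99.8, 100.1, 100.4, 99.6, 100.2, 99.9, 100.3, 99.7]
    n = len(measurements)
    mean = statistics.mean(measurements)
    sd = statistics.stdev(measurements)  # sample standard deviation

    # ~95% of individual values (normal assumption): mean +/- 2 s.d.
    spread_halfwidth = 2 * sd

    # ~95% CI for the true mean (normal approximation, ignoring the
    # small-sample t-correction): mean +/- 1.96 * s.d. / sqrt(n)
    ci_halfwidth = 1.96 * sd / math.sqrt(n)

    print(f"mean = {mean:.3f}")
    print(f"spread interval:    +/- {spread_halfwidth:.3f}")
    print(f"95% CI of the mean: +/- {ci_halfwidth:.3f}")

Note that with more measurements the CI half-width keeps shrinking while the spread interval does not -- which is exactly why the two must not share one label.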
Because of the widely varying use cases listed above, I believe we need very neutral labels for the plus-minus values if the data type shall simply provide two "variables" in a generic sense, the true semantics of which are then provided by qualifier information.
I could think of something like:
- lower range (lowerRange) and upper range (upperRange)
- lower/upper interval value/endpoint
but I don't much like this, because it would force people to abandon the plus/minus notation and calculate actual values.
Better may be something like:
- upwardsAbsolute
- downwardsAbsolute
- upwardsPercent
- downwardsPercent
or
- plusValueAbsolute
- minusValueAbsolute
- plusValuePercent
- minusValuePercent
as neutral terms - but I would be glad if someone comes up with other neutral terms.
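Just to illustrate how I imagine the neutral fields plus a semantics qualifier could fit together -- all names below (QuantityValue, IntervalSemantics, plusValueAbsolute, ...) are hypothetical proposals, not any existing schema -- a sketch in Python:

    # Hypothetical sketch: the value carries two neutral plus-minus fields;
    # the actual meaning (s.d., CI, tolerance, ...) comes from a qualifier.
    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional

    class IntervalSemantics(Enum):
        PRECISION = "precision"
        ACCURACY = "accuracy"
        TOLERANCE = "tolerance"
        ERROR_MARGIN = "error margin"
        ONE_OR_TWO_SD = "+/- 1 or 2 s.d."
        CI_95 = "95% confidence interval"
        PERCENTILE_RANGE = "percentile range"

    @dataclass
    class QuantityValue:
        amount: float
        unit: str
        plusValueAbsolute: Optional[float] = None   # neutral upward offset
        minusValueAbsolute: Optional[float] = None  # neutral downward offset
        semantics: Optional[IntervalSemantics] = None  # qualifier, not pattern

    # "2.3 +/- 0.2 µm", here qualified as a 95% CI of the mean:
    v = QuantityValue(2.3, "µm", 0.2, 0.2, IntervalSemantics.CI_95)

The point of the design is that the plus/minus notation is preserved for input and display, while the qualifier keeps the semantics explicit and machine-readable.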
However, I hope we start realizing that all of us seem to look at this primarily from only one of the use cases listed above (myself included; I usually have cases with variance spread or CI of the mean). We should stop using terms that are specific to one but not the other of the cases. The assumption that "these things are all more or less the same" is not true. A confidence interval is neither a manufacturing tolerance nor a measurement precision. And precision is not accuracy, etc.
- output exactness (here, the number of digits is actually what we want to talk about)
xsd:totalDigits or Wikipedia: significantDigits or significantFigures
That is one way to express value exactness, albeit a coarse one.
Marco writes: "Everywhere in the realm of software development precision is used for this. Therefore also here the suggestion of precision was not that bad."
-> In software development, the term is about the precision of the numeric data type, i.e. the precision of the storage mechanism. The term precision is correctly applied there. However, we are talking about the actually significant digits of a measurement, which are part of the potential information on the precision and accuracy of the value. A value measured with e.g. 6 significant digits may be stored in a data type which has a precision of 16 digits. I think applying "precision" to significant digits produces a fundamental misunderstanding of what precision is; see the Wikipedia article on accuracy and precision.
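A small Python illustration of the difference (the variable names are mine):

    # A measurement significant to 6 digits, stored in an IEEE double
    # that carries roughly 16 decimal digits of *storage* precision:
    measured = 2.30000           # 6 significant digits as measured
    print(f"{measured:.20f}")    # 2.29999999999999982236 -- digits beyond
                                 # the 6th are storage artifacts, not data
    # The number of significant digits is metadata the float itself
    # cannot carry; it must be recorded separately:
    significant_digits = 6

The storage precision of the data type says nothing about how many digits of the measurement are actually significant.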
Hm, the second one is only relevant for output. Why not use the term outputformat as a pattern, just like Excel, OpenOffice, and LibreOffice do? This could include the number of digits after the decimal separator, the optional accuracy/whatever, and the unit. This would be fine for the API and the MW syntax.
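Just as an invented sketch of the idea (the renderer below is not any existing Excel or MediaWiki syntax, only an illustration; Excel-style "0.00" digit codes are the model, the plus-minus and unit parts are my assumptions):

    # Hypothetical output-format renderer: digits after the decimal
    # separator, optional plus-minus part, and the unit, in one place.
    def render(value: float, plus: float, minus: float, unit: str,
               decimals: int = 2) -> str:
        # A symmetric interval collapses to the familiar plus-minus notation.
        if plus == minus:
            return f"{value:.{decimals}f} +/- {plus:.{decimals}f} {unit}"
        return (f"{value:.{decimals}f} "
                f"+{plus:.{decimals}f}/-{minus:.{decimals}f} {unit}")

    print(render(2.3, 0.2, 0.2, "µm", decimals=1))  # -> 2.3 +/- 0.2 µm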
Cheers
Marco