On 20 December 2012 02:20, <jmcclure(a)hypergrove.com> wrote:
> For me the question is how to name the precision information. Do not the XSD
> facets "totalDigits" and "fractionDigits" work well enough? I mean
Yes, that would be one way of modeling it. And I agree with you that,
although the XSD facets were originally devised for datatypes, there is
nothing wrong with re-using them for quantities and measurements.
So one way of expressing a measurement with significant digits is:
(Proposal 1)
* normalizedValue
* totalDigits
* fractionDigits
* originalUnit
* normalizedUnit
To recover the original information (e.g. that the original value was
in feet with a given number of significant digits), the software must
convert from normalizedUnit to originalUnit, scale the value to
totalDigits with fractionDigits, calculate the remaining powers of ten,
and consult information stored with each unit about whether those
powers of ten should be expressed using an SI unit prefix (exa, tera,
giga, mega, kilo, hecto, deka, centi, etc.). Some units use prefixes,
others do not, and some use only a subset: hectoliter is common,
hectometer would be very odd. This is further complicated by the fact
that, for some units, prefix usage in lay contexts differs from
scientific usage.
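The recovery steps for Proposal 1 can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the conversion table, the per-unit prefix policy `PREFIXABLE`, the function name `recover`, and the reading of "ton" as the metric tonne normalized to kilograms are all hypothetical and not part of any agreed data model.

```python
from decimal import Decimal

# Hypothetical conversion factors from an original unit to the normalized
# unit (kilogram). "ton" is assumed to mean the metric tonne.
TO_NORMALIZED = {"ton": Decimal(1000)}

# Hypothetical per-unit prefix policy: powers of ten that may be written as
# an SI prefix for that unit (e.g. hectoliter is fine, hectometer is not).
PREFIX_NAMES = {2: "hecto", 3: "kilo", 6: "mega"}
PREFIXABLE = {"ton": {3, 6}, "liter": {2}, "meter": {3}}


def recover(normalized_value, total_digits, fraction_digits, original_unit):
    """Rebuild a display string from Proposal 1 fields (sketch only)."""
    # 1. Convert the normalized value back to the original unit.
    value = Decimal(normalized_value) / TO_NORMALIZED[original_unit]
    # 2. The mantissa has totalDigits - fractionDigits integer digits;
    #    the remaining order of magnitude is a power of ten.
    integer_digits = total_digits - fraction_digits
    power = value.adjusted() + 1 - integer_digits
    mantissa = value.scaleb(-power).quantize(Decimal(1).scaleb(-fraction_digits))
    # 3. Per-unit information decides: SI prefix or explicit exponent.
    #    (Plurals are formed naively by appending "s".)
    if power == 0:
        return f"{mantissa!s} {original_unit}s"
    if power in PREFIXABLE.get(original_unit, set()):
        return f"{mantissa!s} {PREFIX_NAMES[power]}{original_unit}s"
    return f"{mantissa!s} * 10^{power} {original_unit}s"


print(recover("1.0023E+8", 6, 3, "ton"))  # 100.230 kilotons
print(recover("1.0023E+7", 6, 3, "ton"))  # 100.230 * 10^2 tons
```

Note that step 3 is where the stored per-unit prefix information enters: the same fields would render differently if the policy for "ton" changed.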
If all numbers were expressed ONLY as total digits with fraction
digits and a unit prefix, i.e. without a power-of-ten exponent, the
above would be sufficiently complete. However, without additional
information it does not allow recovering the entry:
100.230 * 10^3 tons
(value 1.0023e8, 6 total and 3 fractional digits, original unit tons,
normalized unit kilogram); from these fields alone it cannot be told
apart from 100.230 kilotons.
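A small Python sketch makes the ambiguity concrete. It assumes, hypothetically, that tons are metric tonnes normalized to kilograms, and uses an illustrative helper `encode` (not an existing API) that stores an entry as Proposal 1 fields. The entries 100.230 * 10^3 tons and 100.230 kilotons then produce identical records:

```python
from decimal import Decimal
from typing import NamedTuple


class Proposal1(NamedTuple):
    normalized_value: Decimal  # assumed to be in kilograms
    total_digits: int
    fraction_digits: int
    original_unit: str
    normalized_unit: str


TON_IN_KG = Decimal(1000)          # assumption: metric tonne
PREFIX_POWER = {"": 0, "kilo": 3}  # hypothetical prefix table


def encode(mantissa: str, exponent: int, prefix: str) -> Proposal1:
    """Store 'mantissa * 10^exponent prefix-tons' as Proposal 1 fields."""
    m = Decimal(mantissa)
    t = m.as_tuple()
    value_kg = m.scaleb(exponent + PREFIX_POWER[prefix]) * TON_IN_KG
    return Proposal1(value_kg, len(t.digits), -t.exponent, "ton", "kilogram")


a = encode("100.230", 3, "")      # 100.230 * 10^3 tons
b = encode("100.230", 0, "kilo")  # 100.230 kilotons
print(a == b)  # True: the two entries can no longer be told apart
```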
I had therefore proposed (on the wiki) to express it as:
(Proposal 2)
* normalizedValue
* significantDigits (I am equally happy with totalDigits instead)
* originalUnit
* originalUnitPrefix
* normalizedUnit
However, I now see that this analysis was wrong: fractionDigits is
needed in addition to totalDigits, otherwise a similar problem can
occur. The distribution of the number's overall order of magnitude
among integer digits, fractional digits, powers of 10, and powers of
10 expressed through SI prefixes is still ambiguous.
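To illustrate with a sketch (same hypothetical assumptions: metric tonnes normalized to kilograms, illustrative helper names that are not an existing API), 100.230 kilotons and 1.00230 * 10^2 kilotons both have 6 significant digits and the same prefix, so Proposal 2 stores them identically; adding fractionDigits separates them:

```python
from decimal import Decimal

TON_IN_KG = Decimal(1000)          # assumption: metric tonne, normalized to kg
PREFIX_POWER = {"": 0, "kilo": 3}  # hypothetical prefix table


def proposal2(mantissa: str, exponent: int, prefix: str) -> tuple:
    """(normalizedValue, totalDigits, originalUnit, originalUnitPrefix,
    normalizedUnit) for 'mantissa * 10^exponent prefix-tons'."""
    m = Decimal(mantissa)
    value_kg = m.scaleb(exponent + PREFIX_POWER[prefix]) * TON_IN_KG
    return (value_kg, len(m.as_tuple().digits), "ton", prefix, "kilogram")


def proposal3(mantissa: str, exponent: int, prefix: str) -> tuple:
    """The same fields plus fractionDigits."""
    value_kg, total, unit, pfx, norm = proposal2(mantissa, exponent, prefix)
    fraction_digits = -Decimal(mantissa).as_tuple().exponent
    return (value_kg, total, fraction_digits, unit, pfx, norm)


x = ("100.230", 0, "kilo")  # 100.230 kilotons
y = ("1.00230", 2, "kilo")  # 1.00230 * 10^2 kilotons
print(proposal2(*x) == proposal2(*y))  # True: same record, different entries
print(proposal3(*x) == proposal3(*y))  # False: fractionDigits tells them apart
```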
So the minimal representation seems to be:
(Proposal 3)
* normalizedValue (xsd:double or xsd:decimal)
* totalDigits (xsd:short)
* fractionDigits (xsd:short)
* originalUnit (a Wikidata item)
* originalUnitPrefix (a Wikidata item)
* normalizedUnit (a Wikidata item)
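As a sketch only, Proposal 3 could be modeled as a Python dataclass whose `original_entry` method reverses the normalization. The lookup tables stand in for Wikidata items and, like the class and method names, are hypothetical; the value is kept as `Decimal` (corresponding to xsd:decimal) to avoid binary floating-point rounding, and "ton" is assumed to mean the metric tonne normalized to kilograms.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical lookup tables standing in for Wikidata items.
UNIT_TO_KG = {"ton": Decimal(1000)}           # assumption: metric tonne
PREFIX_POWER = {"": 0, "kilo": 3, "mega": 6}  # "" means no prefix


@dataclass(frozen=True)
class Measurement:
    normalized_value: Decimal  # xsd:decimal
    total_digits: int
    fraction_digits: int
    original_unit: str
    original_unit_prefix: str
    normalized_unit: str

    def original_entry(self) -> str:
        """Reconstruct the entry as originally written (sketch only)."""
        # Undo unit normalization, then the SI prefix.
        value = self.normalized_value / UNIT_TO_KG[self.original_unit]
        value = value.scaleb(-PREFIX_POWER[self.original_unit_prefix])
        # Any magnitude not covered by the mantissa's integer digits
        # must have been an explicit power of ten.
        integer_digits = self.total_digits - self.fraction_digits
        power = value.adjusted() + 1 - integer_digits
        step = Decimal(1).scaleb(-self.fraction_digits)
        mantissa = value.scaleb(-power).quantize(step)
        unit = f"{self.original_unit_prefix}{self.original_unit}s"  # naive plural
        if power == 0:
            return f"{mantissa!s} {unit}"
        return f"{mantissa!s} * 10^{power} {unit}"


v = Decimal("1.0023E+8")
print(Measurement(v, 6, 3, "ton", "", "kilogram").original_entry())
# 100.230 * 10^3 tons
print(Measurement(v, 6, 3, "ton", "kilo", "kilogram").original_entry())
# 100.230 kilotons
print(Measurement(v, 6, 5, "ton", "kilo", "kilogram").original_entry())
# 1.00230 * 10^2 kilotons
```

The three records share the same normalized value, yet each reproduces a distinct original entry, which is the round trip Proposal 3 is meant to guarantee.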
Adding the originalUnitPrefix has the advantage that it captures
knowledge from users, data creators, and source resources about which
unit prefix is appropriate in a given context.
I am very critical of the current Wikidata plan to solve this problem
by heuristics; I do not yet see a data set that sufficiently tests
those heuristics. Gathering information from the data entered and
building formatting-heuristics modules over the coming years (rather
than weeks) will be valuable for reformatting. Proposal 3 makes it
possible to gather this information.
Gregor
Note 1: The question of other means to express accuracy or precision,
e.g. by error margins, statistical measures of spread such as
variance, confidence intervals, percentiles, min/max etc. is not yet
covered.
Given the present discussion, this should probably be agreed upon separately.
Note 2: Wikipedia infoboxes may wish to override this formatting; it is
intended for data entry, review, curation, and as a default display
where no other is defined.