(Proposal 3, modified)
* value (xsd:double or xsd:decimal)
* unit (a wikidata item)
* totalDigits (xsd:smallint)
* fractionDigits (xsd:smallint)
* originalUnit (a wikidata item)
* originalUnitPrefix (a wikidata item)
JMc: I rearranged the list a bit and suggested simpler naming.
JMc: Is originalUnitPrefix not directly derived from originalUnit?
JMc: It may be more efficient to store the original value than to
reconstruct it. It may even be better to store the original value
somewhere else entirely, earlier in the process, e.g. within the
context that you indicate would be worthwhile to capture, because I
wouldn't expect a lot of retrievals. But you certainly anticipate
usage patterns better than I do.
How about just:

Datatype: .number (Proposal 4)
-----------------------------------------
:value (xsd:double or xsd:decimal)
:unit (a wikidata item)
:totalDigits (xsd:smallint)
:fractionDigits (xsd:smallint)
:original (a wikidata item that is a number object)
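
As a rough illustration of Proposal 4, the record could be sketched as a
small Python data class. The class name NumberValue, the snake_case
field names, and the "item:..." unit identifiers are assumptions made
for this sketch only; they are not Wikidata identifiers or an agreed
schema.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class NumberValue:
    """Hypothetical record mirroring the fields of Proposal 4."""
    value: Decimal                             # :value (xsd:decimal here)
    unit: str                                  # :unit (placeholder item id)
    total_digits: int                          # :totalDigits
    fraction_digits: int                       # :fractionDigits
    original: Optional["NumberValue"] = None   # :original, a number object


# The value as entered, kept as its own number object.
entered = NumberValue(Decimal("12.5"), "item:foot",
                      total_digits=3, fraction_digits=1)

# The normalized value points back at the entry instead of carrying
# originalUnit and originalUnitPrefix itself.
# 1 foot = 0.3048 m exactly, so 12.5 ft = 3.81 m.
normalized = NumberValue(Decimal("3.81"), "item:metre",
                         total_digits=3, fraction_digits=2,
                         original=entered)
```

One consequence of this shape is that the original entry needs no
reconstruction at all: it is simply read from normalized.original.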
On 20.12.2012 03:08, Gregor Hagedorn wrote:
On 20 December 2012 02:20, <jmcclure(a)hypergrove.com> wrote:
> For me the question is how to name the precision information. Do not
> the XSD facets "totalDigits" and "fractionDigits" work well enough?
> I mean
Yes, that would be one way of modeling it. And I agree with you that,
although the XSD attributes were originally devised for datatypes,
there is nothing wrong with reusing them for quantities and
measurements.
So one way of expressing a measurement with significant digits is:

(Proposal 1)
* normalizedValue
* totalDigits
* fractionDigits
* originalUnit
* normalizedUnit

To recover the original information (e.g. that the original value was
in feet with a given number of significant digits), the software must
convert normalizedUnit to originalUnit, scale to totalDigits with
fractionDigits, calculate the remaining powers of ten, and use some
information, which must be stored with each unit, to decide whether
the result should then be expressed using an SI unit prefix (exa,
tera, giga, mega, kilo, hecto, deca, centi, etc.). Some units use
prefixes, others do not, and some use only a few of them: hectolitre
is common, whereas hectometre would be very odd. This is slightly
complicated by the fact that, for some units, prefix usage in lay
topics differs from scientific use.
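
The per-unit prefix information described above could be kept in a
simple lookup table. The sketch below is only an illustration: the
table CUSTOMARY_PREFIXES, its contents, and the function choose_prefix
are assumptions, not part of any Wikidata design, and a real table
would also have to distinguish lay from scientific usage.

```python
from typing import Optional

# SI prefixes and their powers of ten (a subset, for illustration).
SI_PREFIX_EXPONENT = {"giga": 9, "mega": 6, "kilo": 3, "hecto": 2,
                      "deca": 1, "centi": -2, "milli": -3}

# Hypothetical per-unit policy: which prefixes are customary for a unit.
CUSTOMARY_PREFIXES = {
    "litre": {"hecto", "centi", "milli"},   # hectolitre is common
    "metre": {"kilo", "centi", "milli"},    # hectometre would be odd
    "foot": set(),                          # no SI prefixes in use
}


def choose_prefix(unit: str, leftover_exponent: int) -> Optional[str]:
    """Return a customary prefix that absorbs exactly the leftover
    power of ten, or None if the unit has no such prefix."""
    for prefix in CUSTOMARY_PREFIXES.get(unit, set()):
        if SI_PREFIX_EXPONENT[prefix] == leftover_exponent:
            return prefix
    return None


print(choose_prefix("litre", 2))   # hecto
print(choose_prefix("metre", 2))   # None
```

When choose_prefix returns None, the remaining power of ten would have
to be shown as an explicit exponent instead.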

If all numbers were expressed ONLY as total digits with fraction
digits and a unit prefix, i.e. with no power-of-ten exponent, the
above would be sufficiently complete. However, without additional
information it does not allow recovering an entry such as:

100.230 * 10^3 tons
(value 1.0023e8, 6 total digits, 3 fractional digits, original unit
tons, normalized unit gram)
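
To make the ambiguity concrete, here is a small sketch using Python's
decimal module. The helper split_mantissa is hypothetical, and the
sketch works in the original unit (tons) only, so it sidesteps unit
conversion. It suggests that the two digit counts fix the mantissa but
leave a power of ten that could be written either as an exponent or as
a unit prefix.

```python
from decimal import Decimal


def split_mantissa(value: Decimal, total_digits: int, fraction_digits: int):
    """Split value into (mantissa, leftover_exponent) so that
    mantissa * 10**leftover_exponent == value and the mantissa has
    fraction_digits digits after the point out of total_digits."""
    integer_digits = total_digits - fraction_digits
    # adjusted() is the exponent of the leading digit (5 for 100230).
    leftover = value.adjusted() - (integer_digits - 1)
    step = Decimal(1).scaleb(-fraction_digits)        # e.g. 0.001
    mantissa = value.scaleb(-leftover).quantize(step)
    return mantissa, leftover


# The entry 100.230 * 10^3 tons, i.e. 100230 tons in the original unit.
mantissa, leftover = split_mantissa(Decimal("100230"), 6, 3)

# Both renderings have 6 total and 3 fractional digits, so Proposal 1
# cannot tell them apart.
candidates = [
    f"{mantissa} * 10^{leftover} tons",
    f"{mantissa} kilotons",
]
print(candidates)
```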
I had therefore proposed (on the wiki) to express it as:

(Proposal 2)
* normalizedValue
* significantDigits (I am happy with totalDigits instead)
* originalUnit
* originalUnitPrefix
* normalizedUnit

However, I now see that this analysis was wrong: it needs
fractionDigits in addition to totalDigits, or else a similar problem
may occur, i.e. the distribution of the number's total order of
magnitude among non-fractional digits, fractional digits, powers of
10, and powers of 10 expressed through SI prefixes is still
ambiguous.
So the minimal representation seems to be:

(Proposal 3)
* normalizedValue (xsd:double or xsd:decimal)
* totalDigits (xsd:smallint)
* fractionDigits (xsd:smallint)
* originalUnit (a wikidata item)
* originalUnitPrefix (a wikidata item)
* normalizedUnit (a wikidata item)
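
With the prefix stored explicitly, the original entry could be rebuilt
without heuristics. The following is a minimal sketch under several
assumptions: the class Measurement, the TO_NORMALIZED conversion table,
and the plain-string unit names are invented for illustration; the
"ton" is taken to be the metric tonne (1000 kg); and the value is
assumed to be normalized to kilograms, so that 1.0023e8 corresponds to
100.230 * 10^3 tonnes.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

PREFIX_EXPONENT = {"mega": 6, "kilo": 3, "hecto": 2, "centi": -2, "milli": -3}

# Hypothetical conversion factors: (original unit, normalized unit).
TO_NORMALIZED = {("tonne", "kilogram"): Decimal(1000)}


@dataclass(frozen=True)
class Measurement:
    """Hypothetical record with the six fields of Proposal 3."""
    normalized_value: Decimal
    total_digits: int
    fraction_digits: int
    original_unit: str
    original_unit_prefix: Optional[str]
    normalized_unit: str

    def original_text(self) -> str:
        # 1. Convert back from the normalized to the original unit.
        factor = TO_NORMALIZED[(self.original_unit, self.normalized_unit)]
        value = self.normalized_value / factor
        # 2. Take out the power of ten carried by the stored prefix.
        value = value.scaleb(-PREFIX_EXPONENT.get(self.original_unit_prefix, 0))
        # 3. Fix the mantissa from totalDigits and fractionDigits.
        integer_digits = self.total_digits - self.fraction_digits
        exponent = value.adjusted() - (integer_digits - 1)
        step = Decimal(1).scaleb(-self.fraction_digits)
        mantissa = value.scaleb(-exponent).quantize(step)
        # 4. Whatever remains is an explicit power of ten.
        power = f" * 10^{exponent}" if exponent else ""
        unit = (self.original_unit_prefix or "") + self.original_unit
        return f"{mantissa}{power} {unit}"


plain = Measurement(Decimal("1.0023e8"), 6, 3, "tonne", None, "kilogram")
kilo = Measurement(Decimal("1.0023e8"), 6, 3, "tonne", "kilo", "kilogram")
print(plain.original_text())   # 100.230 * 10^3 tonne
print(kilo.original_text())    # 100.230 kilotonne
```

The two records differ only in original_unit_prefix, which is exactly
the piece of information that Proposals 1 lacks.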

Adding originalUnitPrefix has the advantage that it gathers knowledge
from users, data creators, or resources about which unit prefix is
appropriate in a given context. I am very critical of the current
Wikidata plan to solve this problem by heuristics; I do not yet see a
data set that sufficiently tests the heuristics. Gathering information
from the data entered and building a formatting-heuristics module over
the coming years (instead of weeks) will be valuable for reformatting.
Proposal 3 makes it possible to gather this information.

Gregor

Note 1: The question of other means of expressing accuracy or
precision, e.g. error margins, statistical measures of spread such as
variance, confidence intervals, percentiles, min/max, etc., is not yet
covered. Given the present discussion, this should probably be agreed
upon separately.

Note 2: Wikipedia infoboxes may wish to override this; it is intended
for data entry, review, curation, and as a default display where no
other is defined.

_______________________________________________
Wikidata-l mailing list
Wikidata-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l