The latest date for population isn't necessarily the preferred one; it can
be a prediction for a short timespan. For example, Statistics Norway
provides a 3-month expectation in addition to the one-year stats. The
one-year stats should be the preferred ones; the 3-month stats are more of
an expected change on last year's stats.
The main problem with the 3-month stats is that they usually can't be used
together with one-year stats, i.e. they can't be normalized against the
same base. The absolute values look comparable, but a growth rate computed
against a one-year base would be wrong. It is quite a common mistake.
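
To make the base mismatch concrete, here is a minimal sketch in Python. All
figures are made up for illustration and are not Statistics Norway numbers,
and the helper name growth_rate is hypothetical.

```python
def growth_rate(base: float, value: float) -> float:
    """Relative change from base to value, over whatever period separates them."""
    return (value - base) / base

# Hypothetical figures, not real statistics.
pop_2014 = 1_000_000      # one-year figure
pop_2015 = 1_010_000      # one-year figure, twelve months later
pop_2016_q1 = 1_012_500   # 3-month expectation on top of the 2015 figure

annual = growth_rate(pop_2014, pop_2015)        # covers 12 months
quarterly = growth_rate(pop_2015, pop_2016_q1)  # covers 3 months only

print(f"12-month growth: {annual:.4%}")
print(f" 3-month growth: {quarterly:.4%}")
```

Both absolute values look like ordinary population figures, but reading the
second rate as if it were a yearly rate would suggest that growth collapsed,
when the two rates simply cover periods of different length.
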
A lot of stats "sound similar" but aren't. It is a bit awkward. Sometimes
stats refer to international standards for how they should be produced; in
those cases they can be compared. This is often described on a metadata
page for the stats. An example is population in rural areas, which many
assume is defined the same way in all countries. It is not.
And while I'm at it: stats often describe a (possibly temporal) connection
or relation between two or more (types of) subjects, and that is not
something you should assign to just one of the subjects. If one part is a
concrete instance, then it makes sense to add stats about the other types
to that item, like population for a municipality, but otherwise it could be
wrong.
In general, setting the last added or most recent value to preferred is
wrong.
Also, that something is not preferred does not imply that it is
deprecated. And note the difference between deprecated and deferred.
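
As a rough illustration of the first point: Wikidata statements carry one of
three ranks (preferred, normal, deprecated), so a statement that is not
preferred is usually just normal. The sketch below uses hypothetical names
(Rank, best_rank) and plain tuples with made-up values; it is not a real
Wikidata client API.

```python
from enum import Enum

class Rank(Enum):
    PREFERRED = "preferred"
    NORMAL = "normal"
    DEPRECATED = "deprecated"

def best_rank(statements):
    """Return preferred statements if any exist, otherwise the normal ones.

    Deprecated statements are never returned.
    """
    preferred = [s for s in statements if s[1] is Rank.PREFERRED]
    if preferred:
        return preferred
    return [s for s in statements if s[1] is Rank.NORMAL]

# Hypothetical (value, rank) pairs for one property.
statements = [
    (5_000, Rank.NORMAL),
    (5_100, Rank.PREFERRED),
    (9_999, Rank.DEPRECATED),
]
best = best_rank(statements)
```

The normal statement is hidden here only because a preferred one exists; it
is still considered valid, unlike the deprecated one.
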
On Thu, Aug 11, 2016 at 10:56 PM, Stas Malyshev <smalyshev(a)wikimedia.org>
wrote:
Hi!
I would argue that this is better done by using qualifiers (e.g. start
date, end date). If a statement on the population size is set to
preferred but isn't monitored for quite some time, it can be difficult
to see whether the "preferred" statement is still accurate, whereas a
qualifier would give a better indication that the statement might need an
update.
Right now this bot:
https://www.wikidata.org/wiki/User:PreferentialBot
watches statements like "population" that have multiple values with
different time qualifiers but no current preference.
What it doesn't currently do is verify that the preferred one refers
to the latest date. It probably shouldn't fix these cases (because there
may be a valid reason why the latest is not the best, e.g. some population
estimates are more precise than others), but it can alert about them. This
can be added if needed.
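
The alert described above could look roughly like the following sketch. It
is not PreferentialBot's code; the Statement type, its fields, and
needs_review are hypothetical, the values are made up, and the time
qualifier is modelled as a plain date.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Statement:
    value: int
    point_in_time: date
    preferred: bool = False

def needs_review(statements):
    """Return True when a preferred statement exists but none is the latest.

    This only flags the item for a human; it does not change any rank.
    """
    preferred = [s for s in statements if s.preferred]
    if not preferred:
        return False  # a separate case: no preference set at all
    latest = max(s.point_in_time for s in statements)
    return all(s.point_in_time < latest for s in preferred)

population = [
    Statement(5_000, date(2014, 1, 1)),
    Statement(5_100, date(2015, 1, 1), preferred=True),
    Statement(5_120, date(2015, 4, 1)),
]
flagged = needs_review(population)
```

Flagging rather than re-ranking keeps the decision with an editor, who can
tell whether the newer value is a better estimate or only a short-term
prediction.
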
--
Stas Malyshev
smalyshev(a)wikimedia.org