A last note; listen to Markus, he is usually right. Darn! 😤
On Fri, Aug 12, 2016 at 12:02 PM, John Erling Blad jeblad@gmail.com wrote:
The latest date for population isn't necessarily the preferred one; it can be a prediction for a short timespan. For example, Statistics Norway provides a 3-month expectation in addition to the one-year stats. The one-year stats should be the preferred ones; the 3-month stats are a kind of expected change on last year's stats.
The main problem with the 3-month stats is that they usually can't be used together with the one-year stats, i.e. they can't be normalized against the same base. The absolute value would seem the same, but a growth rate against a one-year base would be wrong. It is quite a common mistake.
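A minimal sketch of that mistake, with made-up numbers (these are not real Statistics Norway figures, and `growth_rate` is just a helper named for this illustration). It only shows that the three rates cover different timespans and so cannot be read against the same base:

```python
def growth_rate(base: float, value: float) -> float:
    """Relative change of `value` against `base`."""
    return (value - base) / base


# Hypothetical figures, invented for illustration only.
pop_prev_year = 100_000    # one-year stat, end of year N-1
pop_this_year = 101_200    # one-year stat, end of year N
pop_3m_forecast = 101_500  # 3-month expectation following year N

# Proper one-year growth: both values come from the one-year series.
annual = growth_rate(pop_prev_year, pop_this_year)

# The 3-month figure against its own base: a change over 3 months.
quarterly = growth_rate(pop_this_year, pop_3m_forecast)

# The mistake: treating the 3-month figure as the newest one-year value.
# This rate spans 15 months but would be read as a one-year rate.
mixed = growth_rate(pop_prev_year, pop_3m_forecast)

print(f"annual={annual:.4f} quarterly={quarterly:.4f} mixed={mixed:.4f}")
```

The absolute values 101,200 and 101,500 look like two points in the same series, but `mixed` overstates the one-year growth simply because it covers a longer period.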
A lot of stats "sound similar" but aren't. It is a bit awkward. Sometimes stats refer to international standards for how they should be made; in those cases they can be compared. This is often described on a metadata page for the stats. An example is population in rural areas, which many assume is defined the same way in all countries. It is not.
And while I'm at it: stats often describe a (possibly temporal) connection or relation between two or more (types of) subjects, and that is not something you should assign to just one of the subjects. If one part is a concrete instance, then it makes sense to add stats about the other types to that item, like population for a municipality, but otherwise it could be wrong.
In general, setting the last added or most recent value to preferred is wrong.
Also, that something is not preferred does not imply that it is deprecated. And note the difference between deprecated and deferred.
On Thu, Aug 11, 2016 at 10:56 PM, Stas Malyshev smalyshev@wikimedia.org wrote:
Hi!
I would argue that this is better done by using qualifiers (e.g. start date, end date). If a statement on the population size is set to preferred but isn't monitored for quite some time, it can be difficult to see whether the "preferred" statement is still accurate, whereas a qualifier would give a better indication that the statement might need an update.
Right now this bot: https://www.wikidata.org/wiki/User:PreferentialBot watches statements like "population" that have multiple values with different time qualifiers but no current preference.
What it doesn't currently do is verify that the preferred one refers to the latest date. It probably shouldn't fix these cases (because there may be a valid reason why the latest is not the best, e.g. some population estimates are more precise than others), but it can alert about them. This can be added if needed.
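A rough sketch of what such a check could look like. This is not PreferentialBot's actual code; the `Statement` class, its fields and `stale_preference_alert` are hypothetical names for this illustration, and real statements would carry their date in a time qualifier. It only reports a mismatch and never changes ranks:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Statement:
    """Hypothetical, simplified stand-in for a population statement."""
    value: int
    point_in_time: date
    rank: str  # "preferred", "normal" or "deprecated"


def stale_preference_alert(statements: list[Statement]) -> Optional[str]:
    """Return an alert message if a newer value exists than the preferred one.

    Nothing is modified: the latest value may be a less precise estimate,
    so a human should decide whether the preference needs to change.
    """
    candidates = [s for s in statements if s.rank != "deprecated"]
    preferred = [s for s in candidates if s.rank == "preferred"]
    if not preferred:
        return None  # no preference at all is a different case

    latest = max(candidates, key=lambda s: s.point_in_time)
    newest_preferred = max(preferred, key=lambda s: s.point_in_time)
    if newest_preferred.point_in_time < latest.point_in_time:
        return (
            f"preferred value is dated {newest_preferred.point_in_time.isoformat()}, "
            f"but a value dated {latest.point_in_time.isoformat()} exists"
        )
    return None


# Made-up example data.
example = [
    Statement(5000, date(2014, 1, 1), "preferred"),
    Statement(5100, date(2016, 1, 1), "normal"),
]
print(stale_preference_alert(example))
```

Deprecated statements are skipped on purpose, so a newer but deprecated value does not trigger an alert.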
-- Stas Malyshev smalyshev@wikimedia.org
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata