Dear all,
As you may know, statements in Wikidata can be marked as "preferred" or "deprecated" to distinguish them from the "normal" ones.
I found that many items have perfectly valid historical statements marked as "deprecated". For example, our showcase item "Kleinmachnow"
https://www.wikidata.org/wiki/Q104192
has a statement "population: 20,086 (point in time: 2011)" that is confirmed by a reference. Nevertheless, the statement is marked as "deprecated". This would mean that the statement "the population was 20,086 in 2011" is wrong. As far as I can tell, this is not the case.
It seems that somebody wanted to indicate that this old population is no longer current. This is achieved not by deprecating the old value, but by setting another (newer) value as "preferred".
Similar problems occur for the mayor of this town.
I hope there is no deeper confusion in the community regarding the intended use of "preferred" and "deprecated". Most other items seem to use the ranks correctly. Still, the fact that this occurs in a showcase item makes me a bit concerned.
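The intended reading of the ranks can be sketched in a few lines of Python. This is only an illustration, not Wikidata code: the `Statement` class and `best_statements` function are made-up names, and the 2015 figure is a hypothetical newer value invented for the example. The 2011 figure stays "normal" because it is still true, just no longer current.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Statement:
    value: int
    year: int
    rank: str  # "preferred", "normal", or "deprecated"


def best_statements(statements: List[Statement]) -> List[Statement]:
    """Return the preferred statements if any exist, otherwise the normal ones.

    Deprecated statements are never returned: that rank marks a claim as
    erroneous, not as outdated.
    """
    preferred = [s for s in statements if s.rank == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s.rank == "normal"]


population = [
    Statement(value=20086, year=2011, rank="normal"),     # historical, still valid
    Statement(value=21000, year=2015, rank="preferred"),  # hypothetical newer figure
]

for s in best_statements(population):
    print(s.year, s.value)
```

Marking the newer value as "preferred" is enough to make it the one that simple consumers see, while the older value remains available as a valid historical statement.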
Cheers,
Markus
On Thu, Aug 11, 2016 at 4:15 PM, Markus Kroetzsch markus.kroetzsch@tu-dresden.de wrote:
At least on this item they were set by a now inactive user. Do you have more items to see if it is the same user there? Can someone fix the ranks in that item please?
https://www.wikidata.org/wiki/Help:Ranking seems fine wrt deprecated rank and I've not seen much confusion around it.
Cheers Lydia
On Thu, Aug 11, 2016 at 4:15 PM, Markus Kroetzsch <markus.kroetzsch@tu-dresden.de> wrote:
has a statement "population: 20,086 (point in time: 2011)" that is confirmed by a reference. Nevertheless, the statement is marked as "deprecated". This would mean that the statement "the population was 20,086 in 2011" is wrong. As far as I can tell, this is not the case.
I wouldn't say that with a deprecated rank, that statement is "wrong". I consider the term deprecated to indicate that a given statement is no longer valid in the context of a given resource (reference). I agree, in this specific case the use of the deprecated rank is wrong, since no references are given to that specific statement. Nevertheless, I think it is possible to have disagreeing resources on an identical statement, where two identical statements exist, one with rank "deprecated" and one with rank "normal". It is up to the user to decide which source s/he trusts.
It seems that somebody wanted to indicate that this old population is no longer current. This is achieved not by deprecating the old value, but by setting another (newer) value as "preferred".
I would argue that this is better done by using qualifiers (e.g. start date, end date). If a statement on the population size were set to preferred but not monitored for quite some time, it could be difficult to see whether the "preferred" statement is still accurate, whereas a qualifier would give a better indication that the statement might need an update.
Cheers,
Andra
On 11.08.2016 18:45, Andra Waagmeester wrote:
On Thu, Aug 11, 2016 at 4:15 PM, Markus Kroetzsch <markus.kroetzsch@tu-dresden.de> wrote:
has a statement "population: 20,086 (point in time: 2011)" that is confirmed by a reference. Nevertheless, the statement is marked as "deprecated". This would mean that the statement "the population was 20,086 in 2011" is wrong. As far as I can tell, this is not the case.
I wouldn't say that with a deprecated rank, that statement is "wrong". I consider the term deprecated to indicate that a given statement is no longer valid in the context of a given resource (reference). I agree, in this specific case the use of the deprecated rank is wrong, since no references are given to that specific statement. Nevertheless, I think it is possible to have disagreeing resources on an identical statement, where two identical statements exist, one with rank "deprecated" and one with rank "normal". It is up to the user to decide which source s/he trusts.
The status "deprecated" is part of the claim of the statement. The reference is supposed to support this claim, which in this case is also the claim that it is deprecated. The status is not meant to deprecate a reference (not saying that this is never useful, potentially, but you can only use it in one way, and it seems much more practical if deprecated statements get references that explain why they are deprecated).
It seems that somebody wanted to indicate that this old population is no longer current. This is achieved not by deprecating the old value, but by setting another (newer) value as "preferred".
I would argue that this is better done by using qualifiers (e.g. start date, end date). If a statement on the population size were set to preferred but not monitored for quite some time, it could be difficult to see whether the "preferred" statement is still accurate, whereas a qualifier would give a better indication that the statement might need an update.
Sure, there should always be qualifiers as needed, and we already have qualifiers like start and end date in most cases. However, one should still set the "best" statement(s) to be preferred as a help for users of the data. When you use the data in queries or in Lua, it would be very hard to analyse all statements' qualifiers to find out which one is currently the best. The "preferred" rank gives a simple shortcut there. In SPARQL, for example, the best-ranked statements are used in the simplified "direct" properties in the namespace wdt:. Users who want to get all the details can still use the qualifiers, but this leads to more complicated queries.
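To illustrate the difference in query complexity, here is a small Python sketch that only builds the two SPARQL query strings (it does not send them anywhere). The helper names `best_value_query` and `all_statements_query` are made up for this example. It assumes the prefixes wd:, wdt:, p:, ps:, pq: and wikibase: that the Wikidata query service predefines, the population property P1082 and the "point in time" qualifier P585.

```python
ITEM = "wd:Q104192"      # Kleinmachnow, the item discussed in this thread
POPULATION = "P1082"     # population
POINT_IN_TIME = "P585"   # "point in time" qualifier


def best_value_query(item: str, prop: str) -> str:
    """Simple form: wdt: only yields the best-ranked, non-deprecated values."""
    return f"SELECT ?value WHERE {{ {item} wdt:{prop} ?value . }}"


def all_statements_query(item: str, prop: str, qualifier: str) -> str:
    """Detailed form: every statement with its rank and, if present, its date."""
    return (
        "SELECT ?value ?date ?rank WHERE {\n"
        f"  {item} p:{prop} ?statement .\n"
        f"  ?statement ps:{prop} ?value ;\n"
        "             wikibase:rank ?rank .\n"
        f"  OPTIONAL {{ ?statement pq:{qualifier} ?date . }}\n"
        "}\n"
        "ORDER BY DESC(?date)"
    )


print(best_value_query(ITEM, POPULATION))
print(all_statements_query(ITEM, POPULATION, POINT_IN_TIME))
```

With the second query, the caller has to inspect ?rank and ?date to decide which value is current; with the first, the ranks have already made that decision.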
Best regards,
Markus
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hi!
I would argue that this is better done by using qualifiers (e.g. start date, end date). If a statement on the population size were set to preferred but not monitored for quite some time, it could be difficult to see whether the "preferred" statement is still accurate, whereas a qualifier would give a better indication that the statement might need an update.
Right now this bot: https://www.wikidata.org/wiki/User:PreferentialBot watches statements like "population" that have multiple values with different time qualifiers but no current preference.
What it doesn't currently do is verify that the preferred one refers to the latest date. It probably shouldn't fix these cases (because there may be a valid reason why the latest is not the best, e.g. some population estimates are more precise than others), but it can alert about them. This can be added if needed.
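The alert-only check described above could look roughly like the following Python sketch. This is not PreferentialBot's actual code: `DatedStatement` and `preferred_is_not_latest` are hypothetical names, and the sample figures in the usage lines are invented. The function only reports a mismatch and never changes a rank, since the latest value is not always the best one.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class DatedStatement:
    value: int
    point_in_time: date
    rank: str  # "preferred", "normal", or "deprecated"


def preferred_is_not_latest(statements: List[DatedStatement]) -> bool:
    """Return True if a preferred statement exists but every preferred
    statement is older than the latest non-deprecated statement."""
    live = [s for s in statements if s.rank != "deprecated"]
    preferred = [s for s in live if s.rank == "preferred"]
    if not preferred:
        return False  # no preference set at all; that is a separate check
    latest = max(s.point_in_time for s in live)
    return all(s.point_in_time < latest for s in preferred)


# Hypothetical data: the older value is still marked as preferred.
stale = [
    DatedStatement(20086, date(2011, 12, 31), "preferred"),
    DatedStatement(21000, date(2015, 12, 31), "normal"),
]
if preferred_is_not_latest(stale):
    print("alert: preferred population is not the most recent one")
```

A human can then decide whether the preference is deliberate (for example, a census figure preferred over a later estimate) or simply stale.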
The latest date for population isn't necessarily the preferred one; it can be a predicted one for a short timespan. For example, Statistics Norway provides a 3-month expectation in addition to the one-year stats. The one-year stats should be the preferred ones; the 3-month stats are a kind of expected change on last year's stats.
The main problem with the 3-month stats is that they usually can't be used together with one-year stats, i.e. they can't be normalized against the same base. The absolute value would seem the same, but the growth rate against a one-year base would be wrong. It is quite a common error.
A lot of stats "sound similar" but aren't. It is a bit awkward. Sometimes stats refer to international standards for how they should be made; in those cases they can be compared. This is often described on a metadata page about the stats. An example is population in rural areas, which many assume is defined the same way in all countries. It is not.
And while I'm at it: stats often describe a (possibly temporal) connection or relation between two or more (types of) subjects, and that is not something you should assign to one of the subjects. If one part is a concrete instance, then it makes sense to add stats about the other types to that item, like population for a municipality, but otherwise it could be wrong.
In general, setting the last added or most recent value to preferred is wrong.
Also, that something is not preferred does not imply that it is deprecated. And note the difference between deprecated and deferred.
On Thu, Aug 11, 2016 at 10:56 PM, Stas Malyshev smalyshev@wikimedia.org wrote:
-- Stas Malyshev smalyshev@wikimedia.org
A last note; listen to Markus, he is usually right. Darn! 😤
On Fri, Aug 12, 2016 at 12:02 PM, John Erling Blad jeblad@gmail.com wrote: