On 12.08.2016 17:24, Jean-Luc Léger wrote:
On 2016-08-11 22:29, Markus Kroetzsch wrote:
On 11.08.2016 18:45, Andra Waagmeester wrote:
On Thu, Aug 11, 2016 at 4:15 PM, Markus Kroetzsch
<markus.kroetzsch(a)tu-dresden.de> wrote:
has a statement "population: 20,086 (point in time: 2011)" that is
confirmed by a reference. Nevertheless, the statement is marked as
"deprecated". This would mean that the statement "the population was
20,086 in 2011" is wrong. As far as I can tell, this is not the case.
I wouldn't say that with a deprecated rank, that statement is "wrong".
I consider the term deprecated to indicate that a given statement is no
longer valid in the context of a given resource (reference). I agree
that in this specific case the use of the deprecated rank is wrong,
since no references are given for that specific statement.
Nevertheless, I think it is possible to have disagreeing resources on
an identical statement, where two identical statements exist, one with
rank "deprecated" and one with rank "normal". It is up to the user to
decide which source s/he trusts.
The status "deprecated" is part of the claim of the statement. The
reference is supposed to support this claim, which in this case is
also the claim that it is deprecated. The status is not meant to
deprecate a reference (I am not saying that this could never be
useful, but you can only use it in one way, and it seems much more
practical if deprecated statements get references that explain why
they are deprecated).
Yes. I think a complete deprecated statement should look like this:
Rank: Deprecated
Value: <some value>
Qualifier: P2241:reason for deprecation + <some reason>
References
  * P248:Stated in (or any other property for a reference) --> a
    reference where the value is true (explaining why we added it)
    Value: <name of the reference>
    + any additional qualifiers
  * P1310:statement disputed by --> a reference explaining why the
    claim is deprecated
    Value: <name of the reference>
    + any additional qualifiers
I am afraid that this is not a good approach, and it will lead to
problems in the future. The status "deprecated" refers to the *complete
claim, including all qualifiers*. So if you add a qualifier P2241, it
would also be part of what is "deprecated", which is clearly not
intended here. This is part of the general data structure in Wikidata,
and tools using the data would expect this to hold true. Ranks are a
built-in feature of the software, so this aspect is not really open to
interpretation.
What you are doing here is giving up part of the pre-defined structure
and replacing it by some local (site-specific) consensus. I know that
this might be a bit subtle and not so easy to see at first, but it is a
big step away from structured data that is easy to share across
applications.
For example, imagine an application wants to compare "normal" statements
with "deprecated" statements to see if there is any apparent
contradiction (the same statement being given with both ranks). This
would no longer work if you add meta-information to deprecated
statements in the form of qualifiers. For a software tool, an additional
qualifier simply changes the meaning. Imagine that one statement has an
additional "end date" qualifier that the other one is lacking --
clearly, it would be perfectly reasonable that the statement with the
end date is deprecated while the one that has only a start but no end is
not. Technically, there is no difference between this situation and the
situation where you add a new qualifier "P2241".
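To make this concrete, here is a minimal Python sketch of such a
contradiction check. It assumes a hypothetical, heavily simplified
statement model (plain dicts with "property", "value", "qualifiers" and
"rank" keys), not the actual Wikibase JSON format or any existing tool;
P1082 (population), P585 (point in time) and the reason string are used
only for illustration.

```python
from collections import defaultdict


def claim_key(stmt):
    # The rank applies to the whole claim: property, value AND qualifiers.
    # (Hypothetical simplified model; qualifiers is a dict property -> value.)
    return (stmt["property"], stmt["value"],
            frozenset(stmt["qualifiers"].items()))


def find_contradictions(statements):
    """Return claims that occur with both "normal" and "deprecated" rank."""
    ranks_by_claim = defaultdict(set)
    for stmt in statements:
        ranks_by_claim[claim_key(stmt)].add(stmt["rank"])
    return [claim for claim, ranks in ranks_by_claim.items()
            if {"normal", "deprecated"} <= ranks]


normal = {"property": "P1082", "value": "20086",
          "qualifiers": {"P585": "2011"}, "rank": "normal"}
deprecated = dict(normal, rank="deprecated")
print(len(find_contradictions([normal, deprecated])))   # 1

# Same deprecated claim, but with the reason stored as a qualifier:
annotated = dict(deprecated, qualifiers={"P585": "2011", "P2241": "some reason"})
print(len(find_contradictions([normal, annotated])))    # 0
```

The extra P2241 pair makes the two claims distinct, so the check
silently misses the contradiction, just as it would with an extra
"end date" qualifier.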
Now you could say: "Software should know the special meaning of P2241
and treat it accordingly." But this only works for one site (Wikidata
in this case). A future Wikibase-enabled Commons or Wiktionary would use
different properties. You end up having to change software for each
site, severely reducing interoperability across sites (imagine you want
to combine data from two sites before processing it).
Even if you are only interested in a single site (Wikidata), you are
changing the way in which statements should be interpreted over time. If
the community uses qualifiers to change the data model like this, then
the current definition of these qualifiers dictates how statements
should be interpreted. If you then want to analyse the history of the
data, things can become very difficult.
What to do? It is quite simple: P2241 clearly belongs in the reference
of a deprecated statement, not in its qualifiers. This will retain the
same information while keeping the distinction between the claim that is
deprecated (and which may have qualifiers) and the meta-data that
explains why this is the case. Indeed, giving justification and
explanation for a statement is precisely what the references are for, so
P2241 fits there.
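A minimal sketch of the difference, again under a hypothetical
simplified model (a frozen Python dataclass, not the real Wikibase data
structures); "some reason" and "some source" are placeholders. The rank
applies to claim() only, so moving P2241 into a reference leaves the
claim unchanged, while adding it as a qualifier creates a different
claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    prop: str
    value: str
    qualifiers: frozenset = frozenset()   # part of the claim
    rank: str = "normal"
    references: tuple = ()                # meta-data about the claim

    def claim(self):
        """The part the rank refers to: property, value and qualifiers."""
        return (self.prop, self.value, self.qualifiers)


point_in_time = frozenset({("P585", "2011")})
base = Statement("P1082", "20086", point_in_time)

# Reason stored as a qualifier: the deprecated claim is now a different claim.
as_qualifier = Statement("P1082", "20086",
                         point_in_time | {("P2241", "some reason")},
                         rank="deprecated")

# Reason stored in the reference: the claim itself is untouched.
in_reference = Statement("P1082", "20086", point_in_time, rank="deprecated",
                         references=(frozenset({("P248", "some source"),
                                                ("P2241", "some reason")}),))

print(base.claim() == as_qualifier.claim())   # False
print(base.claim() == in_reference.claim())   # True
```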
I am not so sure if the rest of your modelling can work either, since it
seems to me that you cannot in general capture two references (the
original "P248" one and the correcting "P1310" one) in a single
reference. Giving them both as two individual references would be a bad
idea: it would again change the meaning of the data, since you would
give two mutually contradicting references for the same claim, and
site-specific extra information would be needed to understand what is
going on.
In fact, this is another expectation that is implicit in the Wikidata
data model: if you have a claim C with two references A and B, then you
could as well have claim C twice, once with reference A and once with
reference B. References therefore should never have cross-dependencies
or play different roles.
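This equivalence can be written down as a pair of functions. The sketch
below is a hypothetical illustration (a statement is modelled as a
(claim, references) tuple), not part of any Wikibase API: splitting a
statement into single-reference copies and merging them back must give
the original statement, which only makes sense if the references are
independent of each other.

```python
from collections import defaultdict


def split(statement):
    """One claim with n references -> n copies of the claim, one reference each."""
    claim, refs = statement
    return [(claim, (ref,)) for ref in refs]


def merge(statements):
    """Group statements by claim and collect all their references."""
    refs_by_claim = defaultdict(list)
    for claim, refs in statements:
        refs_by_claim[claim].extend(refs)
    return [(claim, tuple(refs)) for claim, refs in refs_by_claim.items()]


statement = ("C", ("A", "B"))
print(split(statement))                        # [('C', ('A',)), ('C', ('B',))]
print(merge(split(statement)) == [statement])  # True
```

If A were the mistaken "P248" source and B the "P1310" correction, each
split copy would present its reference as support for C on its own, and
the relationship between A and B would be lost.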
Maybe I misunderstood and you meant something else: you could of course
make a single reference and use a specific form (with only two
properties, P248 and P1310). But then you need to use single items for
each of the references. Many references on Wikidata are not expressed by
single items but by many property-value pairs (think of "reference URL +
retrieved + ..."). Such compound references would then not work in this
encoding.
What to do? In general, I think it is most important to give the
reference that explains the deprecation, not the (mistaken) one that
claims a wrong thing. This also makes sense for other reasons: if we
create statistics such as "80% of all Wikidata statements have
references" then we don't want to count deprecated statements where the
only reference given claims that the wrong thing is actually true. A
"deprecated statement with reference" should always be one where we have
a reference that supports the claim that the statement is not true
(justifies why it is deprecated). Again, you can see here how important
it is to stick to certain boundaries of interpretation when you want to
process data with tools later on.
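A small sketch of such a statistic, under the same kind of hypothetical
dict-based model as before (references as dicts of property -> value;
"X" is a placeholder source). With the reference-based convention the
count needs no special cases; with mixed references, the tool needs a
site-specific rule (here the assumption "a reference counts for a
deprecated statement only if it contains P1310") to avoid counting the
mistaken source.

```python
def referenced_share(statements, justifies_deprecation=None):
    """Share of statements that have at least one supporting reference.

    If justifies_deprecation is given, references on deprecated statements
    only count when the predicate accepts them (a site-specific rule).
    """
    if not statements:
        return 0.0

    def is_supported(stmt):
        refs = stmt["references"]
        if stmt["rank"] == "deprecated" and justifies_deprecation is not None:
            refs = [ref for ref in refs if justifies_deprecation(ref)]
        return len(refs) > 0

    return sum(is_supported(s) for s in statements) / len(statements)


statements = [
    {"rank": "normal", "references": [{"P248": "X"}]},
    # Deprecated, but the only reference is the one that supports the wrong value:
    {"rank": "deprecated", "references": [{"P248": "X"}]},
]
print(referenced_share(statements))                               # 1.0
print(referenced_share(statements, lambda ref: "P1310" in ref))   # 0.5
```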
If my suggestions somehow don't work in practice, then the best way
would be to file a feature request for having additional meta-data for
deprecated statements. Since the ranks are built into the software, any
solution that really needs to change their meaning has to be
implemented in code. Then it would be the same approach on all future
Wikibase sites and software could work with it. However, I really
hope that the reference-based approach is acceptable to the Wikidata
community in practice.
Best regards,
Markus