Hi Gerard,
I don't think your answer is related to my email or this thread (not
saying that your points may not be important in some way, but it should
really go under a different subject line). I have merely proposed to use
P2241 in references instead of using it as a qualifier. Sorry for my
long email; I can see now that the main message got somewhat buried
among details there.
It seems I should also comment on your remark about funding. You must be
referring to the WMF funding for the Wikidata Toolkit IEG project. This
has (among other things) led to the first Wikidata RDF exports and
thereby made its contribution to creating today's SPARQL service (which
would of course not have happened without the great work of Stas et al.
to get it all running!). If you expected that this project would lead
to "improvements in or the tooling that makes use of the SPARQL engine"
then you were one or two steps ahead of all of us: nobody had even
talked about a SPARQL service when I applied, and the project had long
ended when the SPARQL service went live. Maybe you have been thinking of
some other project that is not related to my work at all?
Best regards,
Markus
On 14.08.2016 15:14, Gerard Meijssen wrote:
Hoi,
Markus, it is very much a matter of perspective and we do not all see
things in the same way. For me the re-usability of Wikidata is very much
secondary. Important but secondary. The primary goal of Wikidata is to
provide data storage for Wikimedia projects. The problem that I see is
that much effort has gone into secondary goals, largely at the cost of
the primary perspective.
For an editor of Wikidata, Wikidata is hardly usable. It is very much
because of tools like Reasonator that I can understand the data that is
in Wikidata. It is also for this reason that "deprecation" will evolve
away from you. It is wonderful that all these high level approaches
exist but the problem is that it does not consider the effects on people
editing Wikidata. SPARQL is now good enough to replace WDQ but the
problem is that the tools built upon WDQ are not converted and SPARQL
does not bring the ease of use that I and others are accustomed to.
There is no replacement for much of the functionality.
We do agree that the architecture of Wikidata has to be stable but so
does its tooling and this is where we fail and consequently see a
divergence. In the past I asked you for tools and I supported additional
funding on the promise of support for tooling. So far I have noticed
that the quality of the engine has improved but I have not seen
improvements in or the tooling that makes use of the SPARQL engine.
For me all the attention to top level concerns has been at the cost of
supporting people who actually enter the data. I do not see a strategy
to converge Wikidata and Wikipedia editing and I have made the argument
why this is vital for our quality repeatedly.
So as you want to preserve top level integrity do consider tooling and
do consider what it is we aim for.
Thanks,
GerardM
On 14 August 2016 at 14:26, Markus Kroetzsch
<markus.kroetzsch(a)tu-dresden.de> wrote:
On 12.08.2016 17:24, Jean-Luc Léger wrote:
On 2016-08-11 22:29, Markus Kroetzsch wrote:
On 11.08.2016 18:45, Andra Waagmeester wrote:
On Thu, Aug 11, 2016 at 4:15 PM, Markus Kroetzsch
<markus.kroetzsch(a)tu-dresden.de> wrote:
has a statement "population: 20,086 (point in time: 2011)" that is
confirmed by a reference. Nevertheless, the statement is marked as
"deprecated". This would mean that the statement "the population was
20,086 in 2011" is wrong. As far as I can tell, this is not the case.
I wouldn't say that a statement with a deprecated rank is "wrong". I
consider the term deprecated to indicate that a given statement is no
longer valid in the context of a given resource (reference). I agree, in
this specific case the use of the deprecated rank is wrong, since no
references are given to that specific statement.
Nevertheless, I think it is possible to have disagreeing resources on an
identical statement, where two identical statements exist, one with
rank "deprecated" and one with rank "normal". It is up to the user to
decide which source s/he trusts.
The status "deprecated" is part of the claim of the statement. The
reference is supposed to support this claim, which in this case is
also the claim that it is deprecated. The status is not meant to
deprecate a reference (not saying that this is never useful,
potentially, but you can only use it in one way, and it seems much
more practical if deprecated statements get references that explain
why they are deprecated).
Yes. I think a complete deprecated statement should look like this:
Rank: Deprecated
Value: <some value>
Qualifier: P2241:reason for deprecation + <some reason>
References
 * P248:Stated in (or any other property for a reference) --> a
   reference where the value is true (explaining why we added it)
   Value: <name of the reference>
   + any additional qualifiers
 * P1310:statement disputed by --> a
   reference explaining why the claim is deprecated
   Value: <name of the reference>
   + any additional qualifiers
I am afraid that this is not a good approach, and it will lead to
problems in the future. The status "deprecated" refers to the
*complete claim, including all qualifiers*. So if you add a
qualifier P2241, it would also be part of what is "deprecated",
which is clearly not intended here. This is part of the general data
structure in Wikidata, and tools using the data would expect this to
hold true. Ranks are a built-in feature of the software, so this
aspect is not really open to interpretation.
What you are doing here is giving up part of the pre-defined
structure and replacing it by some local (site-specific) consensus.
I know that this might be a bit subtle and not so easy to see at
first, but it is a big step away from structured data that is easy
to share across applications.
For example, imagine an application wants to compare "normal"
statements with "deprecated" statements to see if there is any
apparent contradiction (the same statement being given with both
ranks). This would no longer work if you add meta-information to
deprecated statements in the form of qualifiers. For a software
tool, an additional qualifier simply changes the meaning. Imagine
that one statement has an additional "end date" qualifier that the
other one is lacking -- clearly, it would be perfectly reasonable
that the statement with the end date is deprecated while the one
that has only a start but no end is not. Technically, there is no
difference between this situation and the situation where you add a
new qualifier "P2241".
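To make the comparison example concrete, here is a minimal sketch in
Python. It is only an illustration under stated assumptions: the
Statement type and the helper names are made up for this mail and are
not part of Wikibase or of any existing tool, and the reason value is a
placeholder. The property IDs are the Wikidata ones (P1082 population,
P585 point in time, P2241 reason for deprecation).

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Statement(NamedTuple):
    prop: str                               # property ID, e.g. "P1082"
    value: str
    qualifiers: FrozenSet[Tuple[str, str]]  # (property, value) pairs
    rank: str                               # "normal" or "deprecated"


def claim_key(s: Statement) -> tuple:
    # A generic tool treats every qualifier as part of the claim.
    return (s.prop, s.value, s.qualifiers)


def contradictions(statements: List[Statement]) -> List[Statement]:
    """Deprecated statements whose exact claim also occurs with rank 'normal'."""
    normal = {claim_key(s) for s in statements if s.rank == "normal"}
    return [s for s in statements
            if s.rank == "deprecated" and claim_key(s) in normal]


when = ("P585", "2011")  # point in time
normal_stmt = Statement("P1082", "20086", frozenset({when}), "normal")
plain = Statement("P1082", "20086", frozenset({when}), "deprecated")
# "error in source" is a made-up placeholder for a reason value.
annotated = Statement("P1082", "20086",
                      frozenset({when, ("P2241", "error in source")}),
                      "deprecated")

print(len(contradictions([normal_stmt, plain])))      # 1
print(len(contradictions([normal_stmt, annotated])))  # 0
```

The second call finds nothing because the P2241 qualifier makes the
deprecated claim look like a different claim, just as an extra "end
date" qualifier would.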
Now you could say: "Software should know the special meaning of
P2241 and treat it accordingly." But this works for only one
site (Wikidata in this case). A future Wikibase-enabled Commons or
Wiktionary would use different properties. You end up having to
change software for each site, severely reducing
interoperability across sites (imagine you want to combine data from
two sites before processing it).
Even if you are only interested in a single site (Wikidata), you are
changing the way in which statements should be interpreted over
time. If the community uses qualifiers to change the data model like
this, then the current definition of these qualifiers dictates how
statements should be interpreted. Then if you want to analyse
history, things can be very difficult.
What to do? It is quite simple: P2241 clearly belongs in the
reference of a deprecated statement, not in its qualifiers. This
will retain the same information while keeping the distinction
between the claim that is deprecated (and which may have qualifiers)
and the meta-data that explains why this is the case. Indeed, giving
justification and explanation for a statement is precisely what the
references are for, so P2241 fits there.
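As a rough sketch of the reference-based layout, using plain Python
dicts: this is a simplified structure invented for illustration, not
the actual Wikibase JSON format, and deprecation_reasons is a
hypothetical helper. The placeholders in angle brackets are the ones
from the proposal quoted above.

```python
from typing import Any, Dict, List

statement: Dict[str, Any] = {
    "rank": "deprecated",
    # The claim that is deprecated, including its qualifiers.
    "claim": {
        "property": "P1082",              # population
        "value": "20086",
        "qualifiers": {"P585": ["2011"]}, # point in time
    },
    # Meta-data explaining the deprecation lives in the reference.
    "references": [
        {
            "P2241": ["<some reason>"],          # reason for deprecation
            "P248": ["<name of the reference>"], # stated in
        }
    ],
}


def deprecation_reasons(stmt: Dict[str, Any]) -> List[str]:
    """Collect P2241 values from the references of a deprecated statement."""
    if stmt["rank"] != "deprecated":
        return []
    reasons: List[str] = []
    for ref in stmt["references"]:
        reasons.extend(ref.get("P2241", []))
    return reasons


print(deprecation_reasons(statement))
```

The claim itself stays exactly the same as that of the corresponding
"normal" statement, so generic tools can still compare the two.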
I am not so sure that the rest of your modelling can work either,
since it seems to me that you cannot in general capture two
references (the original "P248" one and the correcting "P1310" one)
in a single reference. Giving them both as two individual references
would be a bad idea, because it would again change the meaning of the
data: you would give two mutually contradicting references for
the same claim, and site-specific extra information would be needed
to understand what is going on.
In fact, this is another expectation that is implicit in the
Wikidata data model: if you have a claim C with two references A and
B, then you could as well have claim C twice, once with reference A
and once with reference B. References therefore should never have
cross-dependencies or play different roles.
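This expectation can be written down as a small sketch. The dict layout
and both function names are assumptions made up for this mail, not an
existing API: splitting a statement into one statement per reference and
merging those again should give back the same data.

```python
from typing import Any, Dict, List

Statement = Dict[str, Any]  # keys: "claim", "rank", "references"


def split_by_reference(stmt: Statement) -> List[Statement]:
    """One statement per reference, each repeating the same claim and rank."""
    refs = stmt["references"]
    if not refs:
        return [dict(stmt)]
    return [{"claim": stmt["claim"], "rank": stmt["rank"], "references": [ref]}
            for ref in refs]


def merge_same_claim(stmts: List[Statement]) -> List[Statement]:
    """Inverse direction: pool the references of statements with equal claim and rank."""
    merged: List[Statement] = []
    for s in stmts:
        for m in merged:
            if m["claim"] == s["claim"] and m["rank"] == s["rank"]:
                m["references"].extend(s["references"])
                break
        else:
            merged.append({"claim": s["claim"], "rank": s["rank"],
                           "references": list(s["references"])})
    return merged


original = {"claim": "C", "rank": "normal", "references": ["A", "B"]}
parts = split_by_reference(original)
print(parts)
print(merge_same_claim(parts) == [original])
```

This round trip only preserves the meaning if each reference supports
the claim on its own. A pair of references where one asserts the value
and the other disputes it does not survive the split.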
Maybe I misunderstood and you meant something else: you could of
course make a single reference and use a specific form (with only
two properties, P248 and P1310). But then you need to use single
items for each of the references. Many references on Wikidata are
not expressed by single items but by many property-value pairs
(think of "reference URL + retrieved + ..."). Such compound
references would then not work in this encoding.
What to do? In general, I think it is most important to give the
reference that explains the deprecation, not the (mistaken) one that
claims a wrong thing. This also makes sense for other reasons: if we
create statistics such as "80% of all Wikidata statements have
references" then we don't want to count deprecated statements where
the only reference given claims that the wrong thing is actually
true. A "deprecated statement with reference" should always be one
where we have a reference that supports the claim that the statement
is not true (justifies why it is deprecated). Again, you can see
here how important it is to stick to certain boundaries of
interpretation when you want to process data with tools later on.
If my suggestions somehow don't work in practice, then the best way
would be to file a feature request for having additional meta-data
for deprecated statements. Since the ranks are built into the
software, any solution that really needs to change the meaning of
the software needs to be implemented in code. Then it would be the
same approach on all future Wikibase sites and software could work
with it. However, I really hope that the reference-based approach is
acceptable to the Wikidata community in practice.
Best regards,
Markus
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata