Re: [Wikidata] (Ab)use of "deprecated"

14 Aug 2016

Hoi,
It is even more basic. It is about the understanding of the data. When
people have to understand the data and have no help, they will build
constructs in their minds and not consider any high-level conceptual
considerations at all.

I do not care too much about the "conceptual heritage of the Wikidata
creators"; they stand on the shoulders of giants as well. What I care about
is the purpose of Wikidata, what it is there for, and I repeat myself when I
state that interoperability is secondary. The primary purpose of Wikidata
is realised when said interoperability has an effect on the data in a
qualitative way. So far it is far removed from the content in Wikidata
itself and consequently the point has not been shown except abstractly.

So far we have been on a path where the work developed elsewhere was
considered to be of a lesser value. It is why the Freebase import is a
fiasco.
Thanks,
       GerardM

On 14 August 2016 at 17:01, Thomas Douillard <thomas.douillard(a)gmail.com>
wrote:

...
  More seriously maybe :

 I guess you mention tools because you argue that the correct interpretation
 of the data is given by the tools and humans. I think that is totally wrong in
 this case. First, because it is a (very small) minimal set of requirements
 that Wikibase (not Wikidata) was built on, and that set has been present since
 the very first conceptual data model of Wikidata. It has always been public
 and present in the conceptual description, and it determined the help pages, if
 you bother to read them. It is the framework that guided our decisions more or
 less explicitly, and this is relatively well understood by our core
 community. It just needs to be spread to the more distant Wikimedia user
 circles that are less into Wikidata. This should not be a problem. Things
 are very different with properties, which the community is able to create,
 delete and use as it wishes.

 Another POV on this: this is one of Wikidata's pillars. The conceptual
 heritage of Wikidata creators.

 2016-08-14 15:14 GMT+02:00 Gerard Meijssen <gerard.meijssen(a)gmail.com>:

  Hoi,
 Markus it is very much a matter of perspective and we do not all see
 things in the same way. For me the re-usability of Wikidata is very much
 secondary. Important but secondary. The primary goal of Wikidata is to
 provide a data storage for Wikimedia projects. The problem that I see is
 that much effort has gone into secondary goals, largely at the cost of the
 primary perspective.

 For an editor of Wikidata, Wikidata is hardly usable. It is very much
 because of tools like Reasonator that I can understand the data that is in
 Wikidata. It is also for this reason that "deprecation" will evolve away
 from you. It is wonderful that all these high level approaches exist but
 the problem is that they do not consider the effects on people editing
 Wikidata. SPARQL is now good enough to replace WDQ but the problem is that
 the tools built upon WDQ have not been converted and SPARQL does not bring the
 ease of use that I and others are accustomed to. There is no replacement for
 much of the functionality.

 We do agree that the architecture of Wikidata has to be stable but so
 does its tooling and this is where we fail and consequently see a
 divergence. In the past I asked you for tools and I supported additional
 funding on the promise of support for tooling. So far I have noticed that
 the quality of the engine has improved but I have not seen improvements in
 the tooling that makes use of the SPARQL engine.

 For me all the attention to top level concerns has been at the cost of
 supporting people who actually enter the data. I do not see a strategy to
 converge Wikidata and Wikipedia editing and I have repeatedly made the
 argument why this is vital for our quality.

 So as you want to preserve top level integrity do consider tooling and do
 consider what it is we aim for.
 Thanks,
        GerardM

 On 14 August 2016 at 14:26, Markus Kroetzsch
 <markus.kroetzsch(a)tu-dresden.de> wrote:

  On 12.08.2016 17:24, Jean-Luc Léger wrote:

  On 2016-08-11 22:29, Markus Kroetzsch wrote:

     On 11.08.2016 18:45, Andra Waagmeester wrote:

         On Thu, Aug 11, 2016 at 4:15 PM, Markus Kroetzsch
         <markus.kroetzsch(a)tu-dresden.de> wrote:

             has a statement "population: 20,086 (point in time: 2011)" that is
             confirmed by a reference. Nevertheless, the statement is marked as
             "deprecated". This would mean that the statement "the population was
             20,086 in 2011" is wrong. As far as I can tell, this is not the case.

         I wouldn't say that with a deprecated rank, that statement is "wrong". I
         consider the term deprecated to indicate that a given statement is no
         longer valid in the context of a given resource (reference). I agree, in
         this specific case the use of the deprecated rank is wrong, since no
         references are given to that specific statement.
         Nevertheless, I think it is possible to have disagreeing resources on an
         identical statement, where two identical statements exist, one with
         rank "deprecated" and one with rank "normal". It is up to the user to
         decide which source s/he trusts.

     The status "deprecated" is part of the claim of the statement. The
     reference is supposed to support this claim, which in this case is
     also the claim that it is deprecated. The status is not meant to
     deprecate a reference (not saying that this is never useful,
     potentially, but you can only use it in one way, and it seems much
     more practical if deprecated statements get references that explain
     why they are deprecated).

 Yes. I think a complete deprecated statement should look like this :

 Rank: Deprecated
 Value: <some value>
 Qualifier: P2241:reason for deprecation + <some reason>

 References
 * P248:Stated in (or any other property for a reference)
   --> a reference where the value is true (explaining why we added it)
   Value: <name of the reference>
   + any additional qualifiers
 * P1310:statement disputed by
   --> a reference explaining why the claim is deprecated
   Value: <name of the reference>
   + any additional qualifiers

 I am afraid that this is not a good approach, and it will lead to
 problems in the future. The status "deprecated" refers to the *complete
 claim, including all qualifiers*. So if you add a qualifier P2241, it would
 also be part of what is "deprecated", which is clearly not intended here.
 This is part of the general data structure in Wikidata, and tools using the
 data would expect this to hold true. Ranks are a built-in feature of the
 software, so this aspect is not really open to interpretation.

 What you are doing here is giving up part of the pre-defined structure
 and replacing it by some local (site-specific) consensus. I know that this
 might be a bit subtle and not so easy to see at first, but it is a big step
 away from structured data that is easy to share across applications.

 For example, imagine an application wants to compare "normal" statements
 with "deprecated" statements to see if there is any apparent contradiction
 (the same statement being given with both ranks). This would no longer work
 if you add meta-information to deprecated statements in the form of
 qualifiers. For a software tool, an additional qualifier simply changes
 the meaning. Imagine that one statement has an additional "end date"
 qualifier that the other one is lacking -- clearly, it would be perfectly
 reasonable that the statement with the end date is deprecated while the one
 that has only a start but no end is not. Technically, there is no
 difference between this situation and the situation where you add a new
 qualifier "P2241".
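
 The comparison described above can be sketched in a few lines of Python.
 This is only an illustration under stated assumptions, not how any existing
 tool works: the `Statement` type and the helper names are hypothetical, the
 claim is modelled as property + value + the full set of qualifiers, and
 "some reason" is a placeholder value. P1082 (population) and P585 (point in
 time) are used for the example from earlier in the thread.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set, Tuple

Qualifier = Tuple[str, str]  # (property ID, value)


class Statement(NamedTuple):
    """Hypothetical, simplified statement: the claim plus its rank."""
    prop: str
    value: str
    qualifiers: FrozenSet[Qualifier]
    rank: str  # "normal", "preferred" or "deprecated"


ClaimKey = Tuple[str, str, FrozenSet[Qualifier]]


def claim_key(s: Statement) -> ClaimKey:
    # The rank applies to the complete claim, including all qualifiers,
    # so every qualifier is part of the claim's identity.
    return (s.prop, s.value, s.qualifiers)


def find_contradictions(statements: Iterable[Statement]) -> Set[ClaimKey]:
    """Return claims that appear with both rank normal and rank deprecated."""
    statements = list(statements)
    normal = {claim_key(s) for s in statements if s.rank == "normal"}
    deprecated = {claim_key(s) for s in statements if s.rank == "deprecated"}
    return normal & deprecated


in_2011 = frozenset({("P585", "2011")})
normal = Statement("P1082", "20086", in_2011, "normal")
plain_deprecated = Statement("P1082", "20086", in_2011, "deprecated")
# Same claim, but with the meta-information added as a qualifier.
annotated_deprecated = Statement(
    "P1082", "20086", in_2011 | {("P2241", "some reason")}, "deprecated"
)
```

 With these definitions, `find_contradictions([normal, plain_deprecated])`
 reports the shared claim, while `find_contradictions([normal,
 annotated_deprecated])` returns an empty set: to generic software the extra
 P2241 qualifier makes it a different claim.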

 Now you could say: "Software should know the special meaning of P2241
 and treat it accordingly." But this only works for one site (Wikidata
 in this case). A future Wikibase-enabled Commons or Wiktionary would use
 different properties. You end up having to change software for each
 site, and severely reducing interoperability across sites (imagine you want
 to combine data from two sites before processing it).

 Even if you are only interested in a single site (Wikidata), you are
 changing the way in which statements should be interpreted over time. If
 the community uses qualifiers to change the data model like this, then the
 current definition of these qualifiers dictates how statements should be
 interpreted. Then if you want to analyse history, things can be very
 difficult.

 What to do? It is quite simple: P2241 clearly belongs in the reference
 of a deprecated statement, not in its qualifiers. This will retain the
 same information while keeping the distinction between the claim that is
 deprecated (and which may have qualifiers) and the meta-data that explains
 why this is the case. Indeed, giving justification and explanation for a
 statement is precisely what the references are for, so P2241 fits there.
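
 As a rough illustration of this suggestion, the following Python sketch
 moves a P2241 entry out of the qualifiers and into a reference of its own.
 The dictionary layout is a hypothetical simplification chosen for this
 example (it is not the Wikibase JSON format), and the function name and
 placeholder values are assumptions.

```python
from copy import deepcopy

REASON = "P2241"  # "reason for deprecation", as named in this thread


def move_reason_to_reference(statement: dict) -> dict:
    """Return a copy in which P2241 is a reference entry, not a qualifier.

    Assumed layout: "qualifiers" maps property IDs to values, and
    "references" is a list of such mappings.
    """
    result = deepcopy(statement)  # leave the caller's data untouched
    reason = result.get("qualifiers", {}).pop(REASON, None)
    if reason is not None:
        # The justification becomes a reference for the deprecated claim.
        result.setdefault("references", []).append({REASON: reason})
    return result


original = {
    "rank": "deprecated",
    "value": "<some value>",
    "qualifiers": {"P585": "2011", "P2241": "<some reason>"},
    "references": [],
}
fixed = move_reason_to_reference(original)
```

 After the call, `fixed["qualifiers"]` contains only the qualifiers that are
 part of the deprecated claim, and the reason sits in `fixed["references"]`.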

 I am not so sure if the rest of your modelling can work either, since it
 seems to me that you cannot in general capture two references (the original
 "P248" one and the correcting "P1310" one) in a single reference. Giving
 them both as two individual references would be a bad idea, since it would
 again change the meaning of the data: you would give two mutually
 contradicting references for the same claim, and site-specific extra
 information would be needed to understand what is going on.

 In fact, this is another expectation that is implicit in the Wikidata
 data model: if you have a claim C with two references A and B, then you
 could as well have claim C twice, once with reference A and once with
 reference B. References therefore should never have cross-dependencies or
 play different roles.
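
 This expectation can be written down as a small Python sketch. It assumes a
 deliberately minimal model in which a statement is just a claim label plus
 a list of reference labels; the type alias and function names are
 hypothetical and only serve the illustration.

```python
from typing import Iterable, List, Set, Tuple

Statement = Tuple[str, List[str]]  # (claim, references)


def split_by_reference(statement: Statement) -> List[Statement]:
    """Rewrite one statement with n references as n single-reference ones."""
    claim, references = statement
    return [(claim, [ref]) for ref in references]


def claim_reference_pairs(statements: Iterable[Statement]) -> Set[Tuple[str, str]]:
    """Flatten statements into the set of (claim, reference) pairs they assert."""
    return {(claim, ref) for claim, refs in statements for ref in refs}


merged = ("C", ["A", "B"])
split = split_by_reference(merged)
```

 Both forms yield the same (claim, reference) pairs, which is only a valid
 reading if each reference supports the claim on its own. A pair of
 references where one supports and the other disputes the claim would not
 survive this rewriting.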

 Maybe I misunderstood and you meant something else: you could of course
 make a single reference and use a specific form (with only two properties,
 P248 and P1310). But then you need to use single items for each of the
 references. Many references on Wikidata are not expressed by single items
 but by many property-value pairs (think of "reference URL + retrieved +
 ..."). Such compound references would then not work in this encoding.

 What to do? In general, I think it is most important to give the
 reference that explains the deprecation, not the (mistaken) one that claims
 a wrong thing. This also makes sense for other reasons: if we create
 statistics such as "80% of all Wikidata statements have references" then we
 don't want to count deprecated statements where the only reference given
 claims that the wrong thing is actually true. A "deprecated statement with
 reference" should always be one where we have a reference that supports the
 claim that the statement is not true (justifies why it is deprecated).
 Again, you can see here how important it is to stick to certain boundaries
 of interpretation when you want to process data with tools later on.
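
 A statistic of this kind could be computed roughly as in the Python sketch
 below. The data layout and function names are hypothetical, and the rule
 used to recognise a reference that justifies a deprecation (the reference
 contains a P2241 entry) is an assumption made for this example, following
 the reference-based approach suggested above.

```python
from typing import Dict, List

Reference = Dict[str, str]  # property ID -> value


def is_referenced(statement: dict) -> bool:
    """Decide whether a statement counts as "having a reference"."""
    references: List[Reference] = statement.get("references", [])
    if statement.get("rank") == "deprecated":
        # Assumption: only a reference explaining the deprecation counts,
        # not one that claims the wrong value is true.
        return any("P2241" in ref for ref in references)
    return len(references) > 0


def reference_coverage(statements: List[dict]) -> float:
    """Fraction of statements that count as referenced (0.0 if empty)."""
    if not statements:
        return 0.0
    return sum(is_referenced(s) for s in statements) / len(statements)


sample = [
    {"rank": "normal", "references": [{"P248": "<some source>"}]},
    {"rank": "normal", "references": []},
    {"rank": "deprecated", "references": [{"P248": "<mistaken source>"}]},
    {"rank": "deprecated", "references": [{"P2241": "<some reason>"}]},
]
coverage = reference_coverage(sample)
```

 In this sample two of the four statements count as referenced, so the
 deprecated statement backed only by the mistaken source does not inflate
 the figure.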

 If my suggestions somehow don't work in practice, then the best way
 would be to file a feature request for having additional meta-data for
 deprecated statements. Since the ranks are built into the software, any
 solution that really needs to change the meaning of the software needs to
 be implemented in code. Then it would be the same approach on all future
 Wikibase sites and software could work with it. However, I really hope that
 the reference-based approach is acceptable to the Wikidata community in
 practice.

 Best regards,

 Markus

 _______________________________________________
 Wikidata mailing list
 Wikidata(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata
