On 2016-08-11 22:29, Markus Kroetzsch wrote:
On 11.08.2016 18:45, Andra Waagmeester wrote:
On Thu, Aug 11, 2016 at 4:15 PM, Markus Kroetzsch <markus.kroetzsch@tu-dresden.de> wrote:
has a statement "population: 20,086 (point in time: 2011)" that is confirmed by a reference. Nevertheless, the statement is marked as "deprecated". This would mean that the statement "the population was 20,086 in 2011" is wrong. As far as I can tell, this is not the case.
I wouldn't say that with a deprecated rank, that statement is "wrong". I consider the term deprecated to indicate that a given statement is no longer valid in the context of a given resource (reference). I agree, in this specific case the use of the deprecated rank is wrong, since no references are given to that specific statement. Nevertheless, I think it is possible to have disagreeing resources on an identical statement, where two identical statements exist, one with rank "deprecated" and one with rank "normal". It is up to the user to decide which source s/he trusts.
The status "deprecated" is part of the claim of the statement. The reference is supposed to support this claim, which in this case is also the claim that it is deprecated. The status is not meant to deprecate a reference (not saying that this is never useful, potentially, but you can only use it in one way, and it seems much more practical if deprecated statements get references that explain why they are deprecated).
Yes. I think a complete deprecated statement should look like this:
Rank: Deprecated
Value: <some value>
Qualifier: P2241:reason for deprecation + <some reason>
References
* P248:Stated in (or any other property for a reference) --> a reference where the value is true (explaining why we added it)
  Value: <name of the reference> + any additional qualifiers
* P1310:statement disputed by --> a reference explaining why the claim is deprecated
  Value: <name of the reference> + any additional qualifiers
JL aka Melderick
On 12.08.2016 17:24, Jean-Luc Léger wrote:
Yes. I think a complete deprecated statement should look like this:
Rank: Deprecated
Value: <some value>
Qualifier: P2241:reason for deprecation + <some reason>
References
* P248:Stated in (or any other property for a reference) --> a reference where the value is true (explaining why we added it)
  Value: <name of the reference> + any additional qualifiers
* P1310:statement disputed by --> a reference explaining why the claim is deprecated
  Value: <name of the reference> + any additional qualifiers
I am afraid that this is not a good approach, and it will lead to problems in the future. The status "deprecated" refers to the *complete claim, including all qualifiers*. So if you add a qualifier P2241, it would also be part of what is "deprecated", which is clearly not intended here. This is part of the general data structure in Wikidata, and tools using the data would expect this to hold true. Ranks are a built-in feature of the software, so this aspect is not really open to interpretation.
What you are doing here is giving up part of the pre-defined structure and replacing it by some local (site-specific) consensus. I know that this might be a bit subtle and not so easy to see at first, but it is a big step away from structured data that is easy to share across applications.
For example, imagine an application wants to compare "normal" statements with "deprecated" statements to see if there is any apparent contradiction (the same statement being given with both ranks). This would no longer work if you add meta-information to deprecated statements in the form of qualifiers. For a software tool, an additional qualifier simply changes the meaning. Imagine that one statement has an additional "end date" qualifier that the other one is lacking -- clearly, it would be perfectly reasonable that the statement with the end date is deprecated while the one that has only a start but no end is not. Technically, there is no difference between this situation and the situation where you add a new qualifier "P2241".
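To make the comparison above concrete, here is a minimal sketch in Python. It assumes a simplified, hypothetical Statement type (not the actual Wikibase JSON format) in which a claim is the property, the value and the full set of qualifiers. The IDs P1082 (population) and P585 (point in time) and the reason text are only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified statement: not the real Wikibase data format."""
    prop: str
    value: str
    qualifiers: frozenset = frozenset()  # set of (property, value) pairs
    rank: str = "normal"

    def claim(self):
        # The rank applies to the complete claim, including all qualifiers.
        return (self.prop, self.value, self.qualifiers)


def contradictions(statements):
    """Return claims that occur with both rank 'normal' and rank 'deprecated'."""
    normal = {s.claim() for s in statements if s.rank == "normal"}
    deprecated = {s.claim() for s in statements if s.rank == "deprecated"}
    return normal & deprecated


in_2011 = frozenset({("P585", "2011")})
kept = Statement("P1082", "20086", in_2011, "normal")
dropped = Statement("P1082", "20086", in_2011, "deprecated")
# Same claim as `dropped`, plus a P2241 qualifier carrying meta-information.
annotated = Statement(
    "P1082", "20086", in_2011 | {("P2241", "some reason")}, "deprecated"
)

print(contradictions([kept, dropped]))    # the shared claim is reported
print(contradictions([kept, annotated]))  # nothing is reported any more
```

With the extra qualifier, a generic tool sees two different claims, so the check silently stops finding the contradiction unless it is taught about P2241.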
Now you could say: "Software should know the special meaning of P2241 and treat it accordingly." But this only works for one site (Wikidata in this case). A future Wikibase-enabled Commons or Wiktionary would use different properties. You end up having to change software for each site, and you severely reduce interoperability across sites (imagine you want to combine data from two sites before processing it).
Even if you are only interested in a single site (Wikidata), you are changing the way in which statements should be interpreted over time. If the community uses qualifiers to change the data model like this, then the current definition of these qualifiers dictates how statements should be interpreted. Then if you want to analyse history, things can be very difficult.
What to do? It is quite simple: P2241 clearly belongs in the reference of a deprecated statement, not in its qualifiers. This will retain the same information while keeping the distinction between the claim that is deprecated (and which may have qualifiers) and the meta-data that explains why this is the case. Indeed, giving justification and explanation for a statement is precisely what the references are for, so P2241 fits there.
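As a rough illustration of this suggestion, the following sketch moves a P2241 entry from the qualifiers into a reference. The dictionary layout and the helper name move_reason_to_reference are assumptions made for this example, not the Wikibase JSON format or an existing API. Putting the reason into a new reference of its own is also only one possible choice.

```python
def move_reason_to_reference(statement, reason_prop="P2241"):
    """Return a copy of `statement` with the reason moved into a reference.

    `statement` is a hypothetical simplified dict with the keys
    "qualifiers" (property -> value) and "references" (list of dicts).
    """
    qualifiers = dict(statement.get("qualifiers", {}))
    references = [dict(ref) for ref in statement.get("references", [])]
    if reason_prop in qualifiers:
        # The claim keeps only its real qualifiers; the explanation
        # becomes part of the justification, i.e. a reference.
        references.append({reason_prop: qualifiers.pop(reason_prop)})
    return {**statement, "qualifiers": qualifiers, "references": references}


deprecated = {
    "property": "P1082",  # illustrative property ID
    "value": "20086",
    "rank": "deprecated",
    "qualifiers": {"P585": "2011", "P2241": "some reason"},
    "references": [],
}
fixed = move_reason_to_reference(deprecated)
print(fixed["qualifiers"])
print(fixed["references"])
```

The deprecated claim itself (value plus the point-in-time qualifier) is unchanged, so it can still be compared directly with other statements.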
I am not so sure if the rest of your modelling can work either, since it seems to me that you cannot in general capture two references (the original "P248" one and the correcting "P1310" one) in a single reference. Giving them both as two individual references would be a bad idea, because it would again change the meaning of the data: you would give two mutually contradicting references for the same claim, and site-specific extra information would be needed to understand what is going on.
In fact, this is another expectation that is implicit in the Wikidata data model: if you have a claim C with two references A and B, then you could as well have claim C twice, once with reference A and once with reference B. References therefore should never have cross-dependencies or play different roles.
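The expectation described here can be sketched as a pair of small helper functions. Both names (split_by_reference, merge_by_claim) and the tuple-based representation of a claim are hypothetical and only serve to illustrate the equivalence.

```python
def split_by_reference(claim, references):
    """Turn one claim with several references into one copy per reference."""
    return [(claim, [ref]) for ref in references]


def merge_by_claim(pairs):
    """Group (claim, references) pairs back together by identical claim."""
    merged = {}
    for claim, refs in pairs:
        merged.setdefault(claim, []).extend(refs)
    return merged


claim_c = ("P1082", "20086")  # illustrative claim
parts = split_by_reference(claim_c, ["reference A", "reference B"])
print(parts)
print(merge_by_claim(parts))
```

Splitting and merging lose nothing only if each reference supports the claim on its own. A reference that means "this one disputes the other" would not survive the split, which is why cross-dependencies between references are a problem.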
Maybe I misunderstood and you meant something else: you could of course make a single reference and use a specific form (with only two properties, P248 and P1310). But then you need to use single items for each of the references. Many references on Wikidata are not expressed by single items but by many property-value pairs (think of "reference URL + retrieved + ..."). Such compound references would then not work in this encoding.
What to do? In general, I think it is most important to give the reference that explains the deprecation, not the (mistaken) one that claims a wrong thing. This also makes sense for other reasons: if we create statistics such as "80% of all Wikidata statements have references" then we don't want to count deprecated statements where the only reference given claims that the wrong thing is actually true. A "deprecated statement with reference" should always be one where we have a reference that supports the claim that the statement is not true (justifies why it is deprecated). Again, you can see here how important it is to stick to certain boundaries of interpretation when you want to process data with tools later on.
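A statistic of this kind could be computed roughly as follows. This is a sketch under the assumption that a reference explaining a deprecation can be recognised by containing P2241, as suggested earlier in this message. The dictionary layout and function names are hypothetical.

```python
def counts_as_referenced(statement, reason_prop="P2241"):
    """Decide whether a statement counts as 'having a reference'.

    A deprecated statement counts only if some reference explains the
    deprecation; any other statement counts if it has any reference.
    """
    refs = statement.get("references", [])
    if statement.get("rank") == "deprecated":
        return any(reason_prop in ref for ref in refs)
    return len(refs) > 0


def reference_coverage(statements):
    """Fraction of statements that count as referenced (0.0 if empty)."""
    if not statements:
        return 0.0
    return sum(counts_as_referenced(s) for s in statements) / len(statements)


sample = [
    {"rank": "normal", "references": [{"P248": "some source"}]},
    {"rank": "normal", "references": []},
    # Only reference supports the wrong value: should not be counted.
    {"rank": "deprecated", "references": [{"P248": "some source"}]},
    # Reference explains the deprecation: counted.
    {"rank": "deprecated", "references": [{"P2241": "some reason"}]},
]
print(reference_coverage(sample))
```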
If my suggestions somehow don't work in practice, then the best way would be to file a feature request for having additional meta-data for deprecated statements. Since the ranks are built into the software, any solution that really needs to change the meaning of the software needs to be implemented in code. Then it would be the same approach on all future Wikibase sites and software could work with it. However, I really hope that the reference-based approach is acceptable to the Wikidata community in practice.
Best regards,
Markus
Hoi, Markus, it is very much a matter of perspective and we do not all see things in the same way. For me the re-usability of Wikidata is very much secondary. Important but secondary. The primary goal of Wikidata is to provide data storage for Wikimedia projects. The problem that I see is that much effort has gone into secondary goals, largely at the cost of the primary perspective.
For an editor of Wikidata, Wikidata is hardly usable. It is very much because of tools like Reasonator that I can understand the data that is in Wikidata. It is also for this reason that "deprecation" will evolve away from you. It is wonderful that all these high-level approaches exist, but the problem is that they do not consider the effects on people editing Wikidata. SPARQL is now good enough to replace WDQ, but the problem is that the tools built upon WDQ are not converted and SPARQL does not bring the ease of use that I and others are accustomed to. There is no replacement for much of the functionality.
We do agree that the architecture of Wikidata has to be stable, but so does its tooling, and this is where we fail and consequently see a divergence. In the past I asked you for tools and I supported additional funding on the promise of support for tooling. So far I have noticed that the quality of the engine has improved, but I have not seen improvements in the tooling that makes use of the SPARQL engine.
For me all the attention to top-level concerns has been at the cost of supporting people who actually enter the data. I do not see a strategy to converge Wikidata and Wikipedia editing, and I have repeatedly made the argument why this is vital for our quality.
So as you want to preserve top level integrity do consider tooling and do consider what it is we aim for. Thanks, GerardM
On 14 August 2016 at 14:26, Markus Kroetzsch <markus.kroetzsch@tu-dresden.de> wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Sorry, but I hardly see how this answer fits into the current discussion. Please start another thread ;)
2016-08-14 15:14 GMT+02:00 Gerard Meijssen gerard.meijssen@gmail.com:
More seriously, maybe:
I guess you mention tools because you argue that the correct interpretation of the data is given by the tools and by humans. I think that is totally wrong in this case. First, because this is a (very small) minimal set of requirements that Wikibase (not Wikidata) was built on, and that set has been present since the very first conceptual data model of Wikidata. It has always been public and present in the conceptual description, and it determined the help pages, if you bother to read them. It is the framework that guided our decisions more or less explicitly, and it is relatively well understood by our core community. It just needs to be spread to the more distant circles of Wikimedia users who are less involved in Wikidata. This should not be a problem. Things are very different with properties, which the community is able to create, delete and use as it wishes.
Another POV on this: this is one of Wikidata's pillars, the conceptual heritage of Wikidata's creators.
2016-08-14 15:14 GMT+02:00 Gerard Meijssen gerard.meijssen@gmail.com:
Hoi, Markus it is very much a matter of perspective and we do not all see things in the same way. For me the re-usability of Wikidata is very much secondary. Important but secondary. The primary goal of Wikidata is to provide a data storage for Wikimedia projects. The problem that I see is that much effort has gone in secondary goals largely at the cost of the primary perspective.
For an editor of Wikidata Wikidata is hardly usable. It is very much because of tools like Reasonator that I can understand the data that is in Wikidata. It is also for this reason that "deprecation" will evolve away from you. It is wonderful that all these high level approaches exist but the problem is that it does not consider the effects on people editing Wikidata. SPARQL is now good enough to replace WDQ but the problem is that the tools build upon WDQ are not converted and SPARQL does not bring the easy use that I and others are accustomed to. There is no replacement for much of the functionality.
We do agree that the architecture of Wikidata has to be stable but so does its tooling and this is where we fail and consequently see a divergence. In the past I asked you for tools and I supported additional funding on the promise of support for tooling. So far I have noticed that the quality of the engine has improved but I have not seen improvements in or the tooling that makes use of the SPARQL engine.
For me all the attention to top level concerns have been at the cost of supporting people who actually enter the data. I do not see a strategy to converge Wikidata and Wikipedia editing and I have made the argument why this is vital for our quality repeatedly.
So as you want to preserve top level integrity do consider tooling and do consider what it is we aim for. Thanks, GerardM
On 14 August 2016 at 14:26, Markus Kroetzsch <markus.kroetzsch@tu-dresden. de> wrote:
On 12.08.2016 17:24, Jean-Luc Léger wrote:
On 2016-08-11 22:29, Markus Kroetzsch wrote:
On 11.08.2016 18:45, Andra Waagmeester wrote:
On Thu, Aug 11, 2016 at 4:15 PM, Markus Kroetzsch <markus.kroetzsch@tu-dresden.de> wrote:
has a statement "population: 20,086 (point in time: 2011)" that is confirmed by a reference. Nevertheless, the statement is marked as "deprecated". This would mean that the statement "the population was 20,086 in 2011" is wrong. As far as I can tell, this is not the case.
I wouldn't say that with a deprecated rank, that statement is "wrong". I consider the term deprecated to indicate that a given statement is no longer valid in the context of a given resource (reference). I agree, in this specific case the use of the deprecated rank is wrong, since no references are given to that specific statement. Nevertheless, I think it is possible to have disagreeing resources on an identical statement, where two identical statements exist, one with rank "deprecated" and one with rank "normal". It is up to the user to decide which source s/he trusts.
The status "deprecated" is part of the claim of the statement. The reference is supposed to support this claim, which in this case is also the claim that it is deprecated. The status is not meant to deprecate a reference (not saying that this is never useful, potentially, but you can only use it in one way, and it seems much more practical if deprecated statements get references that explain why they are deprecated).
Yes. I think a complete deprecated statement should look like this :
Rank: Deprecated
Value: <some value>
Qualifier: P2241:reason for deprecation + <some reason>
References
- P248:Stated in (or any other property for a reference) --> a reference where the value is true (explaining why we added it)
  Value: <name of the reference> + any additional qualifiers
- P1310:statement disputed by --> a reference explaining why the claim is deprecated
  Value: <name of the reference> + any additional qualifiers
I am afraid that this is not a good approach, and it will lead to problems in the future. The status "deprecated" refers to the *complete claim, including all qualifiers*. So if you add a qualifier P2241, it would also be part of what is "deprecated", which is clearly not intended here. This is part of the general data structure in Wikidata, and tools using the data would expect this to hold true. Ranks are a built-in feature of the software, so this aspect is not really open to interpretation.
What you are doing here is giving up part of the pre-defined structure and replacing it by some local (site-specific) consensus. I know that this might be a bit subtle and not so easy to see at first, but it is a big step away from structured data that is easy to share across applications.
For example, imagine an application wants to compare "normal" statements with "deprecated" statements to see if there is any apparent contradiction (the same statement being given with both ranks). This would no longer work if you add meta-information to deprecated statements in the form of qualifiers. For a software tool, an additional qualifier simply changes the meaning. Imagine that one statement has an additional "end date" qualifier that the other one is lacking -- clearly, it would be perfectly reasonable that the statement with the end date is deprecated while the one that has only a start but no end is not. Technically, there is no difference between this situation and the situation where you add a new qualifier "P2241".
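The comparison described above could be sketched roughly as follows. This is only an illustration under simplifying assumptions, not how any existing tool works: a claim is modelled as a main property, a value and a set of qualifier pairs, and `Claim`, `contradictions` and the placeholder `Q_SOME_REASON` are hypothetical names. P1082 (population) and P585 (point in time) are used to mirror the population example.

```python
from typing import NamedTuple


class Claim(NamedTuple):
    """A claim: main property, main value, and ALL qualifiers (assumed model)."""
    prop: str
    value: str
    qualifiers: frozenset = frozenset()  # set of (property, value) pairs


def contradictions(statements):
    """Return the claims that occur with both rank 'normal' and 'deprecated'."""
    normal = {claim for claim, rank in statements if rank == "normal"}
    deprecated = {claim for claim, rank in statements if rank == "deprecated"}
    return normal & deprecated


# The same population claim, once without and once with a P2241 qualifier.
plain = Claim("P1082", "20086", frozenset({("P585", "2011")}))
annotated = Claim(
    "P1082",
    "20086",
    frozenset({("P585", "2011"), ("P2241", "Q_SOME_REASON")}),  # hypothetical item
)

# Identical claims under both ranks are detected as a contradiction ...
same_claim = contradictions([(plain, "normal"), (plain, "deprecated")])
# ... but the extra qualifier makes the deprecated claim a different claim.
with_reason = contradictions([(plain, "normal"), (annotated, "deprecated")])
```

In this sketch `same_claim` contains the claim while `with_reason` is empty, which is the effect described: a generic tool cannot tell a P2241 qualifier apart from an "end date" qualifier.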
Now you could say: "Software should know the special meaning of P2241 and treat it accordingly." But this is only working for one site (Wikidata in this case). A future Wikibase-enabled Commons or Wiktionary would use different properties. You end up with having to change software for each site, and severely reducing interoperability across sites (imagine you want to combine data from two sites before processing it).
Even if you are only interested in a single site (Wikidata), you are changing the way in which statements should be interpreted over time. If the community uses qualifiers to change the data model like this, then the current definition of these qualifiers dictates how statements should be interpreted. Then if you want to analyse history, things can be very difficult.
What to do? It is quite simple: P2241 clearly belongs in the reference of a deprecated statement, not in its qualifiers. This will retain the same information while keeping the distinction between the claim that is deprecated (and which may have qualifiers) and the meta-data that explains why this is the case. Indeed, giving justification and explanation for a statement is precisely what the references are for, so P2241 fits there.
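To make the difference concrete, here is a minimal sketch of the two placements. The `Statement` class and `move_qualifier_to_reference` are hypothetical and much simpler than the real Wikibase data structures; `Q_SOME_REASON` is a placeholder, and appending P2241 as a reference of its own is just one possible choice made here for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    prop: str
    value: str
    rank: str = "normal"
    qualifiers: dict = field(default_factory=dict)  # part of the claim
    references: list = field(default_factory=list)  # each one: {property: value}

    def claim(self):
        """The claim is the main property and value plus all qualifiers."""
        return (self.prop, self.value, tuple(sorted(self.qualifiers.items())))


def move_qualifier_to_reference(stmt, prop):
    """Return a copy in which `prop` is stored in a reference, not a qualifier."""
    qualifiers = {p: v for p, v in stmt.qualifiers.items() if p != prop}
    references = list(stmt.references)
    if prop in stmt.qualifiers:
        references.append({prop: stmt.qualifiers[prop]})
    return Statement(stmt.prop, stmt.value, stmt.rank, qualifiers, references)


normal = Statement("P1082", "20086", "normal", {"P585": "2011"})
before = Statement(
    "P1082", "20086", "deprecated",
    {"P585": "2011", "P2241": "Q_SOME_REASON"},  # reason stored as qualifier
)
after = move_qualifier_to_reference(before, "P2241")
```

With the reason in the reference, `after.claim()` equals `normal.claim()` again, so the deprecated claim stays comparable with the normal one while the explanation is still recorded.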
I am not so sure the rest of your modelling can work either, since it seems to me that you cannot in general capture two references (the original "P248" one and the correcting "P1310" one) in a single reference. Giving them as two individual references would be a bad idea, because it would again change the meaning of the data: you would give two mutually contradicting references for the same claim, and site-specific extra information would be needed to understand what is going on.
In fact, this is another expectation that is implicit in the Wikidata data model: if you have a claim C with two references A and B, then you could as well have claim C twice, once with reference A and once with reference B. References therefore should never have cross-dependencies or play different roles.
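This expectation can be written down as a small round trip. The sketch below assumes a deliberately simplified representation, a list of (claim, list of references) pairs, and the helper names `split_references` and `merge_references` are made up for illustration.

```python
from collections import defaultdict


def split_references(statements):
    """Rewrite each (claim, [refs]) as one (claim, [ref]) entry per reference."""
    result = []
    for claim, refs in statements:
        if not refs:
            result.append((claim, []))  # keep unreferenced statements as they are
        else:
            result.extend((claim, [ref]) for ref in refs)
    return result


def merge_references(statements):
    """Opposite direction: collect the references of equal claims in one entry."""
    merged = defaultdict(list)
    for claim, refs in statements:
        merged[claim].extend(refs)
    return list(merged.items())


claim_c = ("P1082", "20086")
combined = [(claim_c, ["reference A", "reference B"])]
separate = split_references(combined)
```

Both forms are meant to carry the same information, which only holds if every reference independently supports the claim. A pair of references where one supports the value and the other disputes it would not survive this rewriting.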
Maybe I misunderstood and you meant something else: you could of course make a single reference and use a specific form (with only two properties, P248 and P1310). But then you need to use single items for each of the references. Many references on Wikidata are not expressed by single items but by many property-value pairs (think of "reference URL + retrieved + ..."). Such compound references would then not work in this encoding.
What to do? In general, I think it is most important to give the reference that explains the deprecation, not the (mistaken) one that claims a wrong thing. This also makes sense for other reasons: if we create statistics such as "80% of all Wikidata statements have references" then we don't want to count deprecated statements where the only reference given claims that the wrong thing is actually true. A "deprecated statement with reference" should always be one where we have a reference that supports the claim that the statement is not true (justifies why it is deprecated). Again, you can see here how important it is to stick to certain boundaries of interpretation when you want to process data with tools later on.
If my suggestions somehow don't work in practice, then the best way would be to file a feature request for having additional meta-data for deprecated statements. Since the ranks are built into the software, any solution that really needs to change the meaning of the software needs to be implemented in code. Then it would be the same approach on all future Wikibase sites and software could work with it. However, I really hope that the reference-based approach is acceptable to the Wikidata community in practice.
Best regards,
Markus
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hoi, It is even more basic. It is about the understanding of the data. When people have to understand the data and have no help, they will build constructs in their minds and not consider any high-level conceptual considerations at all.
I do not care too much about the "conceptual heritage of the Wikidata creators", they stand on the shoulders of giants as well. What I care about is the purpose of Wikidata. What it is there for and I repeat myself when I state that interoperability is secondary. The primary purpose of Wikidata is realised when said interoperability has an effect on the data in a qualitative way. So far it is far removed from the content in Wikidata itself and consequently the point has not been shown except abstractly.
So far we have been on a path where the work developed elsewhere was considered to be of a lesser value. It is why the Freebase import is a fiasco. Thanks, GerardM
On 14 August 2016 at 17:01, Thomas Douillard thomas.douillard@gmail.com wrote:
More seriously maybe :
I guess you mention tools because you argue that the correct interpretation of the data is given by the tools and by humans. I think that is totally wrong in this case. First, because this is a (very small) minimal set of requirements that Wikibase (not Wikidata) was built on, and that set has been present since the very first conceptual data model of Wikidata. It has always been public and present in the conceptual description, and it determined the help pages, if you bother to read them. It is the framework that guided our decisions more or less explicitly, and it is relatively well understood by our core community. It just needs to be spread to the more distant Wikimedia user circles that are less into Wikidata. This should not be a problem. Things are very different with properties, which the community is able to create, delete and use as it wishes.
Another POV on this: this is one of Wikidata's pillars. The conceptual heritage of Wikidata's creators.
2016-08-14 15:14 GMT+02:00 Gerard Meijssen gerard.meijssen@gmail.com:
Hoi, Markus it is very much a matter of perspective and we do not all see things in the same way. For me the re-usability of Wikidata is very much secondary. Important but secondary. The primary goal of Wikidata is to provide a data storage for Wikimedia projects. The problem that I see is that much effort has gone in secondary goals largely at the cost of the primary perspective.
People may build representations in their minds, but first, there is no guarantee that they all have the same representations, which will lead to endless conflicts over incompatibilities. But mostly: there already is a reference interpretation.
This is not high level, this is very, very low level and has an impact basically everywhere on Wikidata. This should not be contentious, as we have a reference interpretation, a Markus and a Denny who should be authoritative on the question, and we can explain and point to resources such as this thread. This is just the time to highlight this, as the community is beginning to ask itself such questions - this might not have been the case earlier, as the community used the data less and was not really seriously considering all this.
This is probably not too late either: as Wikidata is only beginning to be used, we can probably roll back without too much trouble. Consider the problems we could face if the Germans do not share the same interpretation as the French and, as a result, the infoboxes display wrong data ... We need such agreement, and we have strong arguments to show people the right path.
Hoi, Wikidata has a really bad way of considering both arguments and authority. Arguments are not considered, and authority leads to a cavalier way of interpreting responsibility. It has been obvious that administrators as a whole do not ensure that Wikidata policies are maintained.
So authority as an argument stinks. The point of tools and the representation of data in tools is that they make things obvious. That is a bigger help than calling for even more talk (most talk is just opinions with no reflection on previous arguments).
Your notion that we can easily roll back our mistakes shows a disregard for the work people have done. It also shows little historical awareness. At some stage we "followed" the GND for all the wrong reasons. It took us a long time to undo all this. Really, the point of Wikidata is that our work supports Wikimedia projects; that is primary and the rest is nice to have. Thanks, GerardM
On 14 August 2016 at 17:39, Thomas Douillard thomas.douillard@gmail.com wrote:
People may build representations in their minds, but first, there is no guarantee that they all have the same representations, which will lead to endless conflicts over incompatibilities. But mostly: there already is a reference interpretation.
This is not high level; this is very, very low level and has an impact basically everywhere on Wikidata. This should not be contentious, as we have a reference interpretation, we have Markus and Denny, who should be authoritative on the question, and we can explain and point to resources such as this thread. This is just the time to highlight this, as the community begins to ask itself such questions. This might not have been the case earlier, as the community used the data less and was not seriously considering all this.
It is probably not too late either: since Wikidata is only beginning to be used, we can probably roll back without too much trouble. Consider the problems we could face if German editors do not share the same interpretation as French ones and, as a result, the infoboxes display wrong data ... We need such agreement, and we have strong arguments to show people the right path.
2016-08-14 17:25 GMT+02:00 Gerard Meijssen gerard.meijssen@gmail.com:
Hoi, It is even more basic. It is about the understanding of the data. When people have to understand the data and have no help, they will build constructs in their minds and not consider any high-level conceptual considerations at all.
I do not care too much about the "conceptual heritage of the Wikidata creators"; they stand on the shoulders of giants as well. What I care about is the purpose of Wikidata, what it is there for, and I repeat myself when I state that interoperability is secondary. The primary purpose of Wikidata is realised when said interoperability has an effect on the data in a qualitative way. So far it is far removed from the content in Wikidata itself, and consequently the point has not been shown except abstractly.
So far we have been on a path where the work developed elsewhere was considered to be of a lesser value. It is why the Freebase import is a fiasco. Thanks, GerardM
On 14 August 2016 at 17:01, Thomas Douillard thomas.douillard@gmail.com wrote:
More seriously maybe :
I guess you mention tools because you argue that the correct interpretation of the data is given by the tools and by humans. I think that is totally wrong in this case. First, because this is a (very small) minimal set of requirements that Wikibase (not Wikidata) was built on, and this set has been present since the very first conceptual data model of Wikidata. It has always been public and present in the conceptual description, and it determined the help pages, if you bother to read them. It is the framework that guided our decisions more or less explicitly, and it is relatively well understood by our core community. It just needs to be spread to the more distant circles of Wikimedia users who are less into Wikidata. This should not be a problem. Things are very different with properties, which the community is able to create, delete and use as it wishes.
Another POV on this: this is one of Wikidata's pillars, the conceptual heritage of Wikidata's creators.
2016-08-14 15:14 GMT+02:00 Gerard Meijssen gerard.meijssen@gmail.com:
Hoi, Markus, it is very much a matter of perspective and we do not all see things in the same way. For me the re-usability of Wikidata is very much secondary. Important but secondary. The primary goal of Wikidata is to provide data storage for Wikimedia projects. The problem that I see is that much effort has gone into secondary goals, largely at the cost of the primary perspective.
For an editor, Wikidata is hardly usable. It is very much because of tools like Reasonator that I can understand the data that is in Wikidata. It is also for this reason that "deprecation" will evolve away from you. It is wonderful that all these high-level approaches exist, but the problem is that they do not consider the effects on people editing Wikidata. SPARQL is now good enough to replace WDQ, but the problem is that the tools built upon WDQ have not been converted, and SPARQL does not bring the ease of use that I and others are accustomed to. There is no replacement for much of the functionality.
We do agree that the architecture of Wikidata has to be stable, but so does its tooling, and this is where we fail and consequently see a divergence. In the past I asked you for tools and I supported additional funding on the promise of support for tooling. So far I have noticed that the quality of the engine has improved, but I have not seen improvements in the tooling that makes use of the SPARQL engine.
For me, all the attention to top-level concerns has been at the cost of supporting people who actually enter the data. I do not see a strategy to converge Wikidata and Wikipedia editing, and I have repeatedly made the argument why this is vital for our quality.
So as you want to preserve top-level integrity, do consider tooling and do consider what it is we aim for. Thanks, GerardM
On 14 August 2016 at 14:26, Markus Kroetzsch < markus.kroetzsch@tu-dresden.de> wrote:
On 12.08.2016 17:24, Jean-Luc Léger wrote:
On 2016-08-11 22:29, Markus Kroetzsch wrote:
On 11.08.2016 18:45, Andra Waagmeester wrote:
On Thu, Aug 11, 2016 at 4:15 PM, Markus Kroetzsch <markus.kroetzsch@tu-dresden.de> wrote:
has a statement "population: 20,086 (point in time: 2011)" that is confirmed by a reference. Nevertheless, the statement is marked as "deprecated". This would mean that the statement "the population was 20,086 in 2011" is wrong. As far as I can tell, this is not the case.
I wouldn't say that with a deprecated rank, that statement is "wrong". I consider the term deprecated to indicate that a given statement is no longer valid in the context of a given resource (reference). I agree, in this specific case the use of the deprecated rank is wrong, since no references are given to that specific statement. Nevertheless, I think it is possible to have disagreeing resources on an identical statement, where two identical statements exist, one with rank "deprecated" and one with rank "normal". It is up to the user to decide which source s/he trusts.
The status "deprecated" is part of the claim of the statement. The reference is supposed to support this claim, which in this case is also the claim that it is deprecated. The status is not meant to deprecate a reference (not saying that this is never useful, potentially, but you can only use it in one way, and it seems much more practical if deprecated statements get references that explain why they are deprecated).
Yes. I think a complete deprecated statement should look like this:
Rank: Deprecated
Value: <some value>
Qualifier: P2241:reason for deprecation + <some reason>
References
- P248:Stated in (or any other property for a reference) --> a reference where the value is true (explaining why we added it)
  Value: <name of the reference> + any additional qualifiers
- P1310:statement disputed by --> a reference explaining why the claim is deprecated
  Value: <name of the reference> + any additional qualifiers
I am afraid that this is not a good approach, and it will lead to problems in the future. The status "deprecated" refers to the *complete claim, including all qualifiers*. So if you add a qualifier P2241, it would also be part of what is "deprecated", which is clearly not intended here. This is part of the general data structure in Wikidata, and tools using the data would expect this to hold true. Ranks are a built-in feature of the software, so this aspect is not really open to interpretation.
What you are doing here is giving up part of the pre-defined structure and replacing it by some local (site-specific) consensus. I know that this might be a bit subtle and not so easy to see at first, but it is a big step away from structured data that is easy to share across applications.
For example, imagine an application wants to compare "normal" statements with "deprecated" statements to see if there is any apparent contradiction (the same statement being given with both ranks). This would no longer work if you add meta-information to deprecated statements in the form of qualifiers. For a software tool, an additional qualifier simply changes the meaning. Imagine that one statement has an additional "end date" qualifier that the other one is lacking -- clearly, it would be perfectly reasonable that the statement with the end date is deprecated while the one that has only a start but no end is not. Technically, there is no difference between this situation and the situation where you add a new qualifier "P2241".
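The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Statement`, `claim_key` and `contradictions` are hypothetical names and a simplified model, not the Wikibase data model or any existing tool. The property IDs P1082 (population) and P585 (point in time) are used for the example values only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified statement: main value, qualifiers and rank."""
    prop: str
    value: str
    qualifiers: frozenset = frozenset()  # set of (property, value) pairs
    rank: str = "normal"


def claim_key(s):
    """The claim is the main value plus ALL qualifiers; the rank is not part of it."""
    return (s.prop, s.value, s.qualifiers)


def contradictions(statements):
    """Return the claims that occur with both rank 'normal' and rank 'deprecated'."""
    normal = {claim_key(s) for s in statements if s.rank == "normal"}
    deprecated = {claim_key(s) for s in statements if s.rank == "deprecated"}
    return normal & deprecated


point_in_time = ("P585", "2011")
a = Statement("P1082", "20086", frozenset({point_in_time}), "normal")
b = Statement("P1082", "20086", frozenset({point_in_time}), "deprecated")
# Same claim, but with the deprecation reason stored as an extra qualifier.
c = Statement("P1082", "20086",
              frozenset({point_in_time, ("P2241", "some reason")}), "deprecated")

found_plain = contradictions([a, b])           # the clash is detected
found_with_qualifier = contradictions([a, c])  # the clash is missed
```

A generic tool has no way to know that the P2241 pair is "only" meta-data, so `c` looks like a different claim from `a`, exactly as it would with an additional "end date" qualifier.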
Now you could say: "Software should know the special meaning of P2241 and treat it accordingly." But this is only working for one site (Wikidata in this case). A future Wikibase-enabled Commons or Wiktionary would use different properties. You end up with having to change software for each site, and severely reducing interoperability across sites (imagine you want to combine data from two sites before processing it).
Even if you are only interested in a single site (Wikidata), you are changing the way in which statements should be interpreted over time. If the community uses qualifiers to change the data model like this, then the current definition of these qualifiers dictates how statements should be interpreted. Then if you want to analyse history, things can be very difficult.
What to do? It is quite simple: P2241 clearly belongs in the reference of a deprecated statement, not in its qualifiers. This will retain the same information while keeping the distinction between the claim that is deprecated (and which may have qualifiers) and the meta-data that explains why this is the case. Indeed, giving justification and explanation for a statement is precisely what the references are for, so P2241 fits there.
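As a rough sketch of this suggestion, the following Python snippet moves P2241 out of the qualifiers and into a reference. The dictionary layout and the function name `move_reason_to_reference` are assumptions made for illustration; this is not the Wikibase JSON format. Appending the reason as its own reference record is likewise only one possible choice.

```python
def move_reason_to_reference(statement, reason_prop="P2241"):
    """Return a copy of `statement` with `reason_prop` moved from the
    qualifiers into a new reference record (hypothetical layout)."""
    qualifiers = dict(statement["qualifiers"])
    references = [dict(ref) for ref in statement["references"]]
    if reason_prop in qualifiers:
        references.append({reason_prop: qualifiers.pop(reason_prop)})
    return {**statement, "qualifiers": qualifiers, "references": references}


original = {
    "rank": "deprecated",
    "value": "20086",
    "qualifiers": {"P585": "2011", "P2241": "some reason"},
    "references": [],
}
moved = move_reason_to_reference(original)
```

After the move, the qualifiers describe only the claim itself (here the point in time), and the explanation of the deprecated rank sits where justifications are expected.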
I am not so sure if the rest of your modelling can work either, since it seems to me that you cannot in general capture two references (the original "P248" one and the correcting "P1310" one) in a single reference. Giving them both as two individual references would be a bad idea, because it would again change the meaning of the data: you would give two mutually contradicting references for the same claim, and site-specific extra information would be needed to understand what is going on.
In fact, this is another expectation that is implicit in the Wikidata data model: if you have a claim C with two references A and B, then you could as well have claim C twice, once with reference A and once with reference B. References therefore should never have cross-dependencies or play different roles.
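This expectation can be written down as a small round trip in Python. The sketch assumes a toy representation in which a statement is a `(claim, references)` pair; `split_by_reference` and `merge_by_claim` are hypothetical helper names, not part of any Wikibase library.

```python
def split_by_reference(claim, references):
    """One claim with n references becomes n copies of the claim, one reference each."""
    return [(claim, [ref]) for ref in references]


def merge_by_claim(statements):
    """Inverse direction: identical claims are merged and their references collected."""
    merged = {}
    for claim, refs in statements:
        merged.setdefault(claim, []).extend(refs)
    return list(merged.items())


claim_c = ("P1082", "20086", (("P585", "2011"),))
split = split_by_reference(claim_c, ["reference A", "reference B"])
round_trip = merge_by_claim(split)
```

The round trip only preserves the meaning if every reference independently supports the claim. A reference whose role is "the source that was wrong" would turn into a stand-alone statement backed by a source that contradicts it.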
Maybe I misunderstood and you meant something else: you could of course make a single reference and use a specific form (with only two properties, P248 and P1310). But then you need to use single items for each of the references. Many references on Wikidata are not expressed by single items but by many property-value pairs (think of "reference URL + retrieved + ..."). Such compound references would then not work in this encoding.
What to do? In general, I think it is most important to give the reference that explains the deprecation, not the (mistaken) one that claims a wrong thing. This also makes sense for other reasons: if we create statistics such as "80% of all Wikidata statements have references" then we don't want to count deprecated statements where the only reference given claims that the wrong thing is actually true. A "deprecated statement with reference" should always be one where we have a reference that supports the claim that the statement is not true (justifies why it is deprecated). Again, you can see here how important it is to stick to certain boundaries of interpretation when you want to process data with tools later on.
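To make the statistics example concrete, here is a minimal Python sketch of such a count. It assumes the simplified dictionary layout used for illustration only, and it assumes that a reference explaining a deprecation can be recognised by containing P2241; both are assumptions, not an existing convention or API.

```python
def has_supporting_reference(statement, reason_prop="P2241"):
    """A deprecated statement counts only if some reference explains the
    deprecation; any other statement counts if it has at least one reference."""
    refs = statement["references"]
    if statement["rank"] == "deprecated":
        return any(reason_prop in ref for ref in refs)
    return len(refs) > 0


def referenced_share(statements):
    """Fraction of statements that have a reference supporting what they assert."""
    if not statements:
        return 0.0
    return sum(has_supporting_reference(s) for s in statements) / len(statements)


sample = [
    {"rank": "normal", "references": [{"P248": "some source"}]},
    {"rank": "normal", "references": []},
    {"rank": "deprecated", "references": [{"P248": "mistaken source"}]},
    {"rank": "deprecated", "references": [{"P2241": "some reason"}]},
]
share = referenced_share(sample)
```

In this sample only the first and the last statement count, so the share is one half. The deprecated statement whose only reference is the mistaken source is not counted as referenced.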
If my suggestions somehow don't work in practice, then the best way would be to file a feature request for having additional meta-data for deprecated statements. Since the ranks are built into the software, any solution that really needs to change the meaning of the software needs to be implemented in code. Then it would be the same approach on all future Wikibase sites and software could work with it. However, I really hope that the reference-based approach is acceptable to the Wikidata community in practice.
Best regards,
Markus
This does not seem to be a strong argument and seems only to reflect your opinion.
Authority does not have to rely on administrators enforcing anything; it can come from agreement around common-sense principles, can't it? This case is far more deeply rooted than GND and does not limit us in any way. It's just a case of best practices.
You also have to say for which work this would be a problem, so that we can try to find a solution and this is not just rhetoric.
On 14 Aug 2016 18:35, "Thomas Douillard" thomas.douillard@gmail.com wrote:
This does not seem to be a strong argument and seems only to reflect your opinion.
Authority does not have to rely on administrators enforcing anything; it can come from agreement around common-sense principles, can't it? This case is far more deeply rooted than GND and does not limit us in any way. It's just a case of best practices.
You also have to say for which work this would be a problem, so that we can try to find a solution and this is not just rhetoric.
+2
L.
Hi Gerard,
I don't think your answer is related to my email or this thread (not saying that your points may not be important in some way, but it should really go under a different subject line). I have merely proposed to use P2241 in references instead of using it as a qualifier. Sorry for my long email; I can see now that the main message got somewhat buried among details there.
It seems I should also comment on your remark about funding. You must be referring to the WMF funding for the Wikidata Toolkit IEG project. This has (among other things) led to the first Wikidata RDF exports and thereby made its contribution to creating today's SPARQL service (which would of course not have happened without the great work of Stas et al. to get it all running!). If you expected that this project would lead to "improvements in the tooling that makes use of the SPARQL engine" then you were one or two steps ahead of all of us: nobody had even talked about a SPARQL service when I applied, and the project had long ended when the SPARQL service went live. Maybe you have been thinking of some other project that is not related to my work at all?
Best regards,
Markus
On 14.08.2016 15:14, Gerard Meijssen wrote:
Hoi, Markus, it is very much a matter of perspective and we do not all see things in the same way. For me the re-usability of Wikidata is very much secondary. Important but secondary. The primary goal of Wikidata is to provide data storage for Wikimedia projects. The problem that I see is that much effort has gone into secondary goals, largely at the cost of the primary perspective.
For an editor, Wikidata is hardly usable. It is very much because of tools like Reasonator that I can understand the data that is in Wikidata. It is also for this reason that "deprecation" will evolve away from you. It is wonderful that all these high-level approaches exist, but the problem is that they do not consider the effects on people editing Wikidata. SPARQL is now good enough to replace WDQ, but the problem is that the tools built upon WDQ have not been converted, and SPARQL does not bring the ease of use that I and others are accustomed to. There is no replacement for much of the functionality.
We do agree that the architecture of Wikidata has to be stable, but so does its tooling, and this is where we fail and consequently see a divergence. In the past I asked you for tools and I supported additional funding on the promise of support for tooling. So far I have noticed that the quality of the engine has improved, but I have not seen improvements in the tooling that makes use of the SPARQL engine.
For me, all the attention to top-level concerns has been at the cost of supporting people who actually enter the data. I do not see a strategy to converge Wikidata and Wikipedia editing, and I have repeatedly made the argument why this is vital for our quality.
So as you want to preserve top-level integrity, do consider tooling and do consider what it is we aim for. Thanks, GerardM
On 14 August 2016 at 14:26, Markus Kroetzsch <markus.kroetzsch@tu-dresden.de> wrote:
On 12.08.2016 17:24, Jean-Luc Léger wrote: On 2016-08-11 22:29, Markus Kroetzsch wrote: On 11.08.2016 18:45, Andra Waagmeester wrote: On Thu, Aug 11, 2016 at 4:15 PM, Markus Kroetzsch <markus.kroetzsch@tu-dresden.de <mailto:markus.kroetzsch@tu-dresden.de> <mailto:markus.kroetzsch@tu-dresden.de <mailto:markus.kroetzsch@tu-dresden.de>> <mailto:markus.kroetzsch@tu-dresden.de <mailto:markus.kroetzsch@tu-dresden.de> <mailto:markus.kroetzsch@tu-dresden.de <mailto:markus.kroetzsch@tu-dresden.de>>>> wrote: has a statement "population: 20,086 (point in time: 2011)" that is confirmed by a reference. Nevertheless, the statement is marked as "deprecated". This would mean that the statement "the popluation was 20,086 in 2011" is wrong. As far as I can tell, this is not the case. I wouldn't say that with a deprecated rank, that statement is "wrong". I consider de term deprecated to indicate that a given statement is no longer valid in the context of a given resource (reference). I agree, in this specific case the use of the deprecated rank is wrong, since no references are given to that specific statement. Nevertheless, I think it is possible to have disagreeing resources on an identical statement, where two identical statements exists, one with rank "deprecated" and one with rank "normal". It is up to the user to decide which source s/he trusts. The status "deprecated" is part of the claim of the statement. The reference is supposed to support this claim, which in this case is also the claim that it is deprecated. The status is not meant to deprecate a reference (not saying that this is never useful, potentially, but you can only use it in one way, and it seems much more practical if deprecated statements get references that explain why they are deprecated). Yes. 
I think a complete deprecated statement should look like this:
Rank: Deprecated
Value: <some value>
Qualifier: P2241:reason for deprecation + <some reason>
References
* P248:Stated in (or any other property for a reference) --> a reference where the value is true (explaining why we added it)
  Value: <name of the reference> + any additional qualifiers
* P1310:statement disputed by --> a reference explaining why the claim is deprecated
  Value: <name of the reference> + any additional qualifiers
I am afraid that this is not a good approach, and it will lead to problems in the future. The status "deprecated" refers to the *complete claim, including all qualifiers*. So if you add a qualifier P2241, it would also be part of what is "deprecated", which is clearly not intended here. This is part of the general data structure in Wikidata, and tools using the data would expect this to hold true. Ranks are a built-in feature of the software, so this aspect is not really open to interpretation.
What you are doing here is giving up part of the pre-defined structure and replacing it with some local (site-specific) consensus. I know that this might be a bit subtle and not so easy to see at first, but it is a big step away from structured data that is easy to share across applications.
For example, imagine an application wants to compare "normal" statements with "deprecated" statements to see if there is any apparent contradiction (the same statement being given with both ranks). This would no longer work if you add meta-information to deprecated statements in the form of qualifiers. For a software tool, an additional qualifier simply changes the meaning. Imagine that one statement has an additional "end date" qualifier that the other one is lacking -- clearly, it would be perfectly reasonable that the statement with the end date is deprecated while the one that has only a start but no end is not.
Technically, there is no difference between this situation and the situation where you add a new qualifier "P2241".
Now you could say: "Software should know the special meaning of P2241 and treat it accordingly." But this only works for one site (Wikidata in this case). A future Wikibase-enabled Commons or Wiktionary would use different properties. You end up having to change software for each site, and severely reducing interoperability across sites (imagine you want to combine data from two sites before processing it).
Even if you are only interested in a single site (Wikidata), you are changing the way in which statements should be interpreted over time. If the community uses qualifiers to change the data model like this, then the current definition of these qualifiers dictates how statements should be interpreted. Then if you want to analyse history, things can be very difficult.
What to do? It is quite simple: P2241 clearly belongs in the reference of a deprecated statement, not in its qualifiers. This will retain the same information while keeping the distinction between the claim that is deprecated (and which may have qualifiers) and the meta-data that explains why this is the case. Indeed, giving justification and explanation for a statement is precisely what the references are for, so P2241 fits there.
I am not so sure if the rest of your modelling can work either, since it seems to me that you cannot in general capture two references (the original "P248" one and the correcting "P1310" one) in a single reference. Giving them both as two individual references would be a bad idea, since it would again change the meaning of the data: you would give two mutually contradicting references for the same claim, and site-specific extra information would be needed to understand what is going on.
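The contradiction check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Claim` and `Statement` classes and the `contradictions` function are hypothetical stand-ins for a simplified in-memory model, not the Wikibase data model or any real API, and "some reason" is a placeholder value. P1082 (population) and P585 (point in time) are used for the population example from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Main value plus qualifiers: the part of a statement that the rank applies to."""
    prop: str
    value: str
    qualifiers: frozenset = frozenset()  # set of (property, value) pairs


@dataclass(frozen=True)
class Statement:
    """A claim with a rank and references (hypothetical, simplified model)."""
    claim: Claim
    rank: str = "normal"    # "preferred", "normal" or "deprecated"
    references: tuple = ()  # each reference is a frozenset of (property, value) pairs


def contradictions(statements):
    """Return the claims that occur both as deprecated and as non-deprecated."""
    deprecated = {s.claim for s in statements if s.rank == "deprecated"}
    accepted = {s.claim for s in statements if s.rank != "deprecated"}
    return deprecated & accepted


# The population claim from the thread: 20,086 at point in time 2011.
population = Claim("P1082", "20086", frozenset({("P585", "2011")}))
normal = Statement(population)

# Variant 1: the reason for deprecation (P2241) is stored as a qualifier.
# The claim itself is now different, so a generic tool sees no contradiction.
claim_with_reason = Claim(
    "P1082", "20086", frozenset({("P585", "2011"), ("P2241", "some reason")})
)
deprecated_via_qualifier = Statement(claim_with_reason, rank="deprecated")

# Variant 2: the reason is stored in a reference. The claim is unchanged,
# so the same generic tool finds the contradiction.
deprecated_via_reference = Statement(
    population,
    rank="deprecated",
    references=(frozenset({("P2241", "some reason")}),),
)

found_with_qualifier = contradictions([normal, deprecated_via_qualifier])
found_with_reference = contradictions([normal, deprecated_via_reference])
```

In this sketch the comparison needs no knowledge of P2241 at all when the reason lives in the reference, which is the interoperability point made above: the check would work unchanged on a site that uses different property IDs.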
In fact, this is another expectation that is implicit in the Wikidata data model: if you have a claim C with two references A and B, then you could as well have claim C twice, once with reference A and once with reference B. References therefore should never have cross-dependencies or play different roles.
Maybe I misunderstood and you meant something else: you could of course make a single reference and use a specific form (with only two properties, P248 and P1310). But then you need to use single items for each of the references. Many references on Wikidata are not expressed by single items but by many property-value pairs (think of "reference URL + retrieved + ..."). Such compound references would then not work in this encoding.
What to do? In general, I think it is most important to give the reference that explains the deprecation, not the (mistaken) one that claims a wrong thing. This also makes sense for other reasons: if we create statistics such as "80% of all Wikidata statements have references" then we don't want to count deprecated statements where the only reference given claims that the wrong thing is actually true. A "deprecated statement with reference" should always be one where we have a reference that supports the claim that the statement is not true (justifies why it is deprecated). Again, you can see here how important it is to stick to certain boundaries of interpretation when you want to process data with tools later on.
If my suggestions somehow don't work in practice, then the best way would be to file a feature request for having additional meta-data for deprecated statements. Since the ranks are built into the software, any solution that really needs to change the meaning of the software needs to be implemented in code. Then it would be the same approach on all future Wikibase sites and software could work with it. However, I really hope that the reference-based approach is acceptable to the Wikidata community in practice.
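The expectation that a claim with references A and B is equivalent to the same claim given twice, once per reference, can also be written down as a small Python sketch. Everything here is an assumption made for illustration: statements are plain tuples, `split_by_reference` and `supported_claims` are hypothetical helper names, and the item ID and URL are placeholders. P854 (reference URL) and P813 (retrieved) stand for a compound reference of the "reference URL + retrieved" kind.

```python
def split_by_reference(claim, rank, references):
    """Expand one statement with n references into n single-reference statements."""
    return [(claim, rank, (ref,)) for ref in references]


def supported_claims(statements):
    """Return every (claim, rank) pair that is backed by at least one reference."""
    return {(claim, rank) for claim, rank, refs in statements if refs}


# A claim is (property, value, qualifiers); a reference is a set of
# (property, value) pairs, so compound references are allowed.
claim = ("P1082", "20086", frozenset({("P585", "2011")}))
ref_a = frozenset({("P248", "Q_SOURCE_A")})  # placeholder item ID
ref_b = frozenset({
    ("P854", "https://example.org/census"),  # placeholder URL
    ("P813", "2016-08-14"),
})

merged = [(claim, "normal", (ref_a, ref_b))]
split = split_by_reference(*merged[0])

# If references are independent, both forms support exactly the same claims.
same_support = supported_claims(merged) == supported_claims(split)
```

The equivalence only holds when each reference supports the claim on its own. If one reference (P248) asserts the value and the other (P1310) disputes it, splitting the statement would yield two statements that mean different things, which is why references with different roles break this expectation.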
Best regards,
Markus
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata