Hi all,
Too many items on Wikidata still lack basic statements. Perhaps we can focus together for a short period of time on a single subject to get this fixed.
For example: all items with instance of (P31) sports season should also have a sport (P641) statement.
When I just ran a query, I saw 24000 sports season items that are still missing sport (P641).
Query: https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20WHERE%20%7B%0A...
I already did a few myself, but for the largest part help is needed. Who has ideas and can help get this statement added to all the sports season items?
Thanks!
Romaine
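The query link in the message above is truncated, but the full link is quoted later in this thread, and its fragment is simply the percent-encoded SPARQL text. As a minimal sketch, assuming only the Python standard library (the helper name `decode_wdqs_link` is made up for illustration), the link can be turned back into a readable query like this:

```python
from urllib.parse import unquote, urlsplit

# Full query link as quoted later in the thread (the copy above is truncated).
QUERY_URL = (
    "https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20WHERE%20%7B%0A"
    "%20%20%3Fitem%20wdt%3AP31%20wd%3AQ27020041%20.%0A"
    "%20%20MINUS%20%7B%0A"
    "%20%20%20%20%3Fitem%20wdt%3AP641%20%3Fmissing%20.%0A"
    "%20%20%7D%0A"
    "%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20"
    "wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%22.%20%7D%0A"
    "%7D%0A"
)


def decode_wdqs_link(url: str) -> str:
    """Return the SPARQL text stored in the fragment of a query.wikidata.org link."""
    # Everything after '#' is the percent-encoded query; unquote() decodes it.
    return unquote(urlsplit(url).fragment)


if __name__ == "__main__":
    print(decode_wdqs_link(QUERY_URL))
```

The decoded text selects items that are an instance of Q27020041 and uses MINUS to drop those that already have a P641 statement.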
Please do not do this. What you likely want to accomplish is relating a sports season to a category of sports, and this is already done, so the relationships are inferred.
Wikidata mailing list -- wikidata@lists.wikimedia.org Public archives at https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/mes... To unsubscribe send an email to wikidata-leave@lists.wikimedia.org
Why should this not be done? It seems reasonable to me. Is there some official statement that this should not be done?
More generally, where is any notion of inference in Wikidata defined?
There appear to be more problems with sports season. For example, https://www.wikidata.org/wiki/Q1487136 doesn't appear to be linked to any league or cup.
peter
On 12/23/22 15:15, Thad Guidry wrote:
Please do not do this. What you likely want to accomplish is relating a sports season to a category of sports, and this is already done, so the relationships are inferred.
On Fri, Dec 23, 2022 at 11:01 PM Romaine Wiki romaine.wiki@gmail.com wrote:
Hi all,
Too many items on Wikidata still lack basic statements. Perhaps we can focus together for a short period of time on a single subject to get this fixed.
For example: all items with instance of (P31) sports season should also have a sport (P641) statement.
When I just ran a query, I saw 24000 sports season items that are still missing sport (P641).
Query: https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20WHERE%20%7B%0A%20%20%3Fitem%20wdt%3AP31%20wd%3AQ27020041%20.%0A%20%20MINUS%20%7B%0A%20%20%20%20%3Fitem%20wdt%3AP641%20%3Fmissing%20.%0A%20%20%7D%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%22.%20%7D%0A%7D%0A
I already did a few myself, but for the largest part help is needed. Who has ideas and can help get this statement added to all the sports season items?
Thanks!
Romaine
_______________________________________________
Wikidata mailing list -- wikidata@lists.wikimedia.org
Public archives at https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/message/GST2SPL43JUGN74242BWRS7CYDLFMDCL/
To unsubscribe send an email to wikidata-leave@lists.wikimedia.org
-- Thad https://www.linkedin.com/in/thadguidry/ https://calendly.com/thadguidry/
On Fri, 23 Dec 2022 at 21:32, Peter F. Patel-Schneider < pfpschneider@gmail.com> wrote:
Why should this not be done? It seems reasonable to me. Is there some official statement that this should not be done?
The Blazegraph-based Wikidata SPARQL endpoint (sadly, an abandoned codebase) is already creaking at the seams and struggling to keep up with the happily thriving growth of Wikidata.
In this situation it seems that adding redundant factoids into the database might not be the best use of constrained resources, for now at least.
More generally, where is any notion of inference in Wikidata defined?
This is a good question, I’d love to know too. Maybe mapping the equivalent P:Whatevers to rdfs:subClassOf and rdf:type would be a start?
If they were in such a form, we could generate even just SPARQL CONSTRUCT queries from them, and perhaps populate an auxiliary dataset/database with the additional implied information.
Dan
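To make the CONSTRUCT idea concrete, here is a minimal sketch that only builds query text and does not run anything. It assumes that a season is linked to its league or competition through P3450 and that the competition item carries sport (P641). Neither assumption is confirmed in this thread, and seasons without such a link would simply produce no triple. The function name and parameters are made up for illustration.

```python
def build_sport_construct_query(
    season_class: str = "Q27020041",  # class used in the query link in this thread
    link_property: str = "P3450",     # assumed: season -> league or competition
    sport_property: str = "P641",     # sport
) -> str:
    """Build a CONSTRUCT query deriving a season's sport from its competition."""
    lines = [
        f"CONSTRUCT {{ ?season wdt:{sport_property} ?sport . }}",
        "WHERE {",
        f"  ?season wdt:P31 wd:{season_class} ;",
        f"          wdt:{link_property} ?competition .",
        f"  ?competition wdt:{sport_property} ?sport .",
        "  # Skip seasons that already state a sport, so only new triples appear.",
        f"  MINUS {{ ?season wdt:{sport_property} ?existing . }}",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_sport_construct_query())
```

The resulting triples could be loaded into a separate dataset rather than written back to Wikidata, which is the auxiliary-database option mentioned above.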
These 2000-something items, which have no statement other than that they are a sports season (and hence no redundancy at all), could be prioritized, as they are kind of useless right now: https://w.wiki/698Y
Best, Jan Ainali
Is Wikidata again being held back by the SPARQL endpoint? I thought that this had been resolved some time ago.
But are these triples redundant? They might be redundant in the sense that someone who knows how sports teams, leagues, and seasons are put together would know where to look for the sport. But this sets a high bar for consumers of Wikidata, which in my opinion is not what Wikidata should be doing. Instead, Wikidata should be easy to use. And it is entirely possible that these triples would not be redundant in all cases.
peter
On 12/23/22 16:45, Dan Brickley wrote:
The Blazegraph-based Wikidata SPARQL endpoint (sadly, an abandoned codebase) is already creaking at the seams and struggling to keep up with the happily thriving growth of Wikidata.
In this situation it seems that adding redundant factoids into the database might not be the best use of constrained resources, for now at least.
Dan
The Wikidata subclass of ontology/taxonomy is unfortunately a lot of true-but-unhelpful info if you do some inferencing.
Subclassing in particular is not very useful. As an example, let's take the Mayor of Madison, WI: for any property we state about her, what are the classes and superclasses of that statement's target? E.g., we say that she's a Mayor; what classes is 'Mayor' a member of?
SELECT DISTINCT ?p ?pLabel ?item ?itemLabel ?itemClass ?itemClassLabel WHERE {
  wd:Q63039729 ?p ?item .  # Q63039729 is the current mayor of Madison
  OPTIONAL { ?item wdt:P31/wdt:P279* ?itemClass . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }  # Helps get the label in your language, if not, then en language
}
ORDER BY (?item)
This query comes back with 562 classes, most of which are upper-ontology statements that are really not very useful at best and in some cases almost feel like they're so irrelevant as to make the whole exercise pointless - for example,
- Via some subclass path in Wikidata, we learn that 'United States of America' is a member of the classes 'agent', 'matter', 'set', and 'astronomical body part'.
- The Mayor's first name is 'Satya', which is a member of the classes 'data', 'information', 'series', 'non-physical entity', 'multiset', and 'mathematical object', among others.
- She was educated at 'Smith College', which (I'm happy to learn from Wikidata) is a member of the class '3 dimensional objects'.
- The mayor is the first gay mayor of Madison, and it turns out that according to Wikidata 'lesbianism' is a member of the classes 'occurrence', 'spatio-temporal entity', and more.
So while I guess an auxiliary database with inferences from wikidata would be neat, I think it'd be a lot of noise and I'm not sure all that useful in practice.
I do wish that there was some support in wikibase to know more about classes and instances - so if you add a P31 instanceOf property to an item or a P279 subClassOf to a class, Wikibase tells you "BTW, you're also saying that your thing is also an instance of all of these classes/your class is now a subclass of all these other classes too"
-Erik
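As a rough illustration of the "BTW, you're also saying..." feature wished for above, here is a minimal sketch that computes everything a new class membership implies. The class names and links are a hypothetical toy hierarchy, not real Wikidata data, and the function names are made up. Nothing here reflects an existing Wikibase feature.

```python
from collections import deque

# Hypothetical toy hierarchy: class -> direct superclasses (not real Wikidata data).
SUBCLASS_OF = {
    "mayor": ["head of government", "position"],
    "head of government": ["position"],
    "position": ["social role"],
    "social role": ["entity"],
}


def implied_superclasses(cls, subclass_of):
    """Return every class reachable from `cls` via one or more subclass-of links."""
    seen = set()
    queue = deque(subclass_of.get(cls, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already handled; also guards against cycles
        seen.add(current)
        queue.extend(subclass_of.get(current, []))
    return seen


def describe_new_statement(item, cls, subclass_of):
    """Build the hint an editor could be shown when adding 'instance of'."""
    implied = sorted(implied_superclasses(cls, subclass_of))
    return f"BTW, '{item}' is now also an instance of: {', '.join(implied)}"


if __name__ == "__main__":
    print(describe_new_statement("the mayor", "mayor", SUBCLASS_OF))
```

On the real hierarchy the implied set would be far larger than in this toy case, which is the noise problem described above, so any such hint would probably need truncation or grouping.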
It may be that Wikidata has a lot of general classes, but this is unavoidable I think if Wikidata is going to store a lot of different kinds of information. (This is not to say that there are not problems in the Wikidata class hierarchy.)
For example, one of the objects that the mayor of Madison is related to is the United States of America. There are 82 different classes that can be reached by following instance of and then zero or more subclass links, but there are 8 different classes that the United States of America is a direct instance of. If there are any classes that do not belong among the 82, it is not because they are general classes (object, entity); it is because there are suspect generalization links. For example, how does the United States of America get to be a set? Or a geographical feature?
There are a few classes that are suspect in this list, particularly three SOMA classes. I don't see why Wikidata should have SOMA classes that just mirror regular Wikidata classes.
peter
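One way to track down a suspect generalization link is to ask for the actual chain that connects an item to a surprising class. The following is a minimal sketch using breadth-first search over a hypothetical toy graph. The class names and links are invented for illustration and are not the real Wikidata chain for the United States of America.

```python
from collections import deque

# Hypothetical toy data (not real Wikidata content).
INSTANCE_OF = {"United States of America": ["country", "sovereign state"]}
SUBCLASS_OF = {
    "country": ["political territorial entity"],
    "sovereign state": ["state"],
    "political territorial entity": ["geographic region"],
    "geographic region": ["set"],  # deliberately suspect link
    "state": ["organization"],
}


def explain_membership(item, target, instance_of, subclass_of):
    """Return the shortest chain [item, class, ..., target], or None if unreachable."""
    # Start with one path per direct class, then climb subclass-of links.
    queue = deque([item, cls] for cls in instance_of.get(item, []))
    visited = set()
    while queue:
        path = queue.popleft()
        cls = path[-1]
        if cls == target:
            return path
        if cls in visited:
            continue
        visited.add(cls)
        for parent in subclass_of.get(cls, []):
            queue.append(path + [parent])
    return None


if __name__ == "__main__":
    chain = explain_membership("United States of America", "set", INSTANCE_OF, SUBCLASS_OF)
    print(" -> ".join(chain))
```

Reading the chain link by link makes it easier to spot which single subclass statement is the questionable one.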
On 12/23/22 18:47, Erik Paulson wrote:
The Wikidata subclass of ontology/taxonomy is unfortunately a lot of true-but-unhelpful info if you do some inferencing.
-Erik
The opposite is true: it is a must that this data is added to these items. For a sports season, the sport in question is basic information, as otherwise these items are practically useless, while their purpose is the opposite: to be useful. One of the first things users of the data want to do is select the sport they want data from. Commonly that is not all sports at once but one single sport, let's say football, tennis, baseball, ...
On top of this, adding the sport would help make these items more useful still: most of them have only one or two statements. By being able to select the sport, users who work with this data can better focus on the items for the particular sport they are interested in.
Two months ago I published an analysis in which it became clear that the stability, quality, and inter-item structure of the data are far from the level one would in general have wished for and expected. Wikidata contains structured data, but only on the level of a single item. In practice, structured data goes beyond the single item and is needed on all levels of data. Sadly it is missing on various levels, which means that for individual items it is not good enough to just say that we need to rely on the data on other items. In closed data systems there is structural coordination in place, which takes care of adding data in a structured way on all levels. There it is possible to rely on relationships. Wikidata, with its open nature, lacks that structural coordination and lacks structured data beyond the single item. And as those (missing) levels define the relationships, they are not something to rely on.
To continue: on Wikipedia (and other platforms) the communities have set some standards to make sure that the content has, or gets, a minimal quality. For large parts of Wikidata the quality of the data just sucks: often it is almost completely missing, incomplete, inconsistent or even false. With luck, a dedicated contributor has taken great care of a relatively small group of items, but large parts are a big mess. I remember people on Wikipedia 15-20 years ago saying they did not want basic standards, for example because those were considered redundant, not needed, etc. By now most contributors have hopefully learned how wrong that idea was. I am sorry to say it, but that is what I see happening here too.
We need to get to basic standards for sports season items. Besides indicating that an item is a sports season (P31), the other basic properties that should always be present are country, sport, point in time, sport season of, follows, and followed by.
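A basic standard like this can be checked mechanically. The sketch below is one possible shape for such a check, not an existing tool: the property IDs are my assumed mapping of the labels named above (only P31 and P641 are given in this thread), and the sample item is invented for illustration.

```python
# Assumed mapping from the labels listed above to Wikidata property IDs.
REQUIRED_PROPERTIES = {
    "P17": "country",
    "P641": "sport",
    "P585": "point in time",
    "P3450": "sports season of league or competition",
    "P155": "follows",
    "P156": "followed by",
}


def missing_basic_properties(claims):
    """Return the labels of required properties that are absent or empty in `claims`.

    `claims` maps a property ID to a list of values, e.g. {"P641": ["Q2736"]}.
    """
    return [
        label
        for pid, label in REQUIRED_PROPERTIES.items()
        if not claims.get(pid)
    ]


if __name__ == "__main__":
    # Invented sample item: only instance of, sport and country are present.
    sample = {"P31": ["Q27020041"], "P641": ["Q2736"], "P17": ["Q30"]}
    for label in missing_basic_properties(sample):
        print("missing:", label)
```

A report like this could be run per sport, so that contributors focused on one sport see only the gaps relevant to them.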
With today's challenge, the focus is first on the sport. Users can build on this so that all sports seasons get at least the basic properties added.
Romaine
On Fri, 23 Dec 2022 at 21:16, Thad Guidry thadguidry@gmail.com wrote:
Please do not do this. What you likely want to accomplish is relating a sports season to a category of sports, and this is already done, so the relationships are inferred.