Dear Sir,

I thank you for your efforts. When dealing with biomedical taxonomic statements in Wikidata, we found similar deficiencies. I have already decided to write a paper about the biomedical taxonomy of Wikidata and how to adjust it, and I would be honoured if you would be the first author of the work. You have already extracted the taxonomic statements, so you can easily filter the biomedical ones. This work has already been done for other taxonomies such as SNOMED-CT (https://scholar.google.ca/citations?user=UsG8QFwAAAAJ&hl=fr&oi=sra, https://scholar.google.ca/citations?user=c4LlYxsAAAAJ&hl=fr&oi=sra, https://scholar.google.ca/citations?user=jVLGHGQAAAAJ&hl=fr&oi=sra, https://scholar.google.ca/citations?user=fBAvwi4AAAAJ&hl=fr&oi=sra). I will be available online for further discussion if you agree to work with our team. This will be simple.

Yours sincerely,
Houcemeddine Turki (he/him)
Medical Student, Faculty of Medicine of Sfax, University of Sfax, Tunisia
Undergraduate Researcher, UR12SP36
GLAM and Education Coordinator, Wikimedia TN User Group
Member, WikiResearch Tunisia
Member, Wiki Project Med
Member, WikiIndaba Steering Committee
Member, Wikimedia and Library User Group Steering Committee
Co-Founder, WikiLingua Maghreb
Founder, TunSci
____________________
+21629499418
-------- Original message --------
From: Gabriel Altay <gabriel.altay@gmail.com>
Date: 2019/06/15 23:05 (GMT+01:00)
To: Discussion list for the Wikidata project <wikidata@lists.wikimedia.org>
Subject: Re: [Wikidata] instance of, subclass of, oh my
Thanks Jan, I will pursue the badminton discussion on the talk page.
On Sat, Jun 15, 2019 at 5:49 PM Jan Ainali <jan@aina.li> wrote:
Hello Gabriel,
I agree with you about the badminton tournaments; that seems odd. There already appears to be a discussion about it on the talk page of the only participant in the badminton project: https://www.wikidata.org/wiki/User_talk:Florentyna#subclass_of:_badminton_to...
Perhaps it is best to continue the discussion there?
/Jan Ainali http://ainali.com
On Sat, 15 June 2019 at 23:11, Gabriel Altay <gabriel.altay@gmail.com> wrote:
Hello everyone,
I was playing around with a recent wikidata dump and extracted the items that "looked" like classes based on the definition here,
https://www.wikidata.org/wiki/Wikidata:WikiProject_Ontology/Classes
Specifically, an item is a class-item if any of the following are true:
* the item is the value of a P31 ("instance of") statement
* the item has a P279 ("subclass of") statement (subclass)
* the item is the value of a P279 ("subclass of") statement (superclass)
Once I extracted all items that met these criteria (2,399,621 items from wikidata-20190603-all.json.bz2) I started examining the results. One of the things I found slightly surprising is that there are about 23k badminton events that are classes b/c they have "subclass of https://www.wikidata.org/wiki/Q13357858" statements. SPARQL query below.
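To reproduce counts like these, you can stream a dump such as wikidata-20190603-all.json.bz2 without loading it into memory. This sketch assumes the documented dump layout, a single JSON array with one entity per line, and uses only the Python standard library. `iter_entities` and `count_with_property` are hypothetical names; this is not the qwikidata API.

```python
import bz2
import json
from typing import Iterator


def iter_entities(source) -> Iterator[dict]:
    """Stream entity dicts from a Wikidata JSON dump compressed with bzip2.

    `source` is a path or an open binary file object (bz2.open accepts both).
    Assumes one entity per line inside a single JSON array, so each line is
    parsed on its own after dropping the trailing comma; the bracket lines
    that open and close the array are skipped.
    """
    with bz2.open(source, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line in ("", "[", "]"):
                continue
            yield json.loads(line.rstrip(","))


def count_with_property(source, pid: str) -> int:
    """Count entities carrying at least one statement for property `pid`."""
    return sum(1 for entity in iter_entities(source)
               if entity.get("claims", {}).get(pid))
```

For example, `count_with_property("wikidata-20190603-all.json.bz2", "P279")` counts entities with a "subclass of" statement. Parsing line by line keeps memory use flat, at the cost of one full (slow) decompression pass per query.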
https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%0AWHERE%20%0A...
It also looks like there is a badminton project page, https://www.wikidata.org/wiki/Category:WikiProject_Badminton https://www.wikidata.org/wiki/Wikidata:WikiProject_Badminton/Subclass
I'd like to remove these statements as it seems that a particular instance of a badminton tournament https://www.wikidata.org/wiki/Q121940 is not a class.
It seems that this pattern is also in place for about 1,000,000 items that are instances of gene (e.g. https://www.wikidata.org/wiki/Q40108).
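One way to gauge how widespread this pattern is: tally the items that carry both a P31 and a P279 statement, grouped by their P31 value. This is a minimal sketch that assumes entity dicts in the Wikidata JSON dump layout. The function names are hypothetical, and the input can be any iterable of entity dicts.

```python
from collections import Counter
from typing import Iterable, Iterator


def has_statement(entity: dict, pid: str) -> bool:
    """True if the entity carries at least one statement for property `pid`."""
    return bool(entity.get("claims", {}).get(pid))


def item_values(entity: dict, pid: str) -> Iterator[str]:
    """Yield the ids used as values of property `pid` (value snaks only)."""
    for statement in entity.get("claims", {}).get(pid, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            value = snak.get("datavalue", {}).get("value", {})
            if isinstance(value, dict) and "id" in value:
                yield value["id"]


def tally_mixed_items(entities: Iterable[dict]) -> Counter:
    """Count items having both P31 and P279, grouped by their P31 value."""
    counts: Counter = Counter()
    for entity in entities:
        if has_statement(entity, "P31") and has_statement(entity, "P279"):
            # set() so an item that repeats the same class counts once for it
            counts.update(set(item_values(entity, "P31")))
    return counts
```

`tally_mixed_items(entities).most_common(10)` would list the classes whose instances most often also carry "subclass of". A high count flags a class for review; it does not by itself show that the statements are wrong.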
I had a couple of questions for the mailing list:

1) Do folks know if there is an active group working on Wikidata ontology?
2) I've read a few messages about shape expressions. Would it be worthwhile to set up a shape expression that prevents most items from having both "instance of" and "subclass of" statements?
3) If these entries are generated by bots, what is the best way to get in touch with the owner? Their user talk page?
I am probably missing a lot of information about what has been done so far in the community, but I'm happy to read anything someone points me towards.
best,
-Gabriel
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
Thanks for the message, Houcemeddine. I admire your optimism! I'm happy that you're investigating biomedical taxonomic statements in Wikidata, but I'm not interested in writing a paper right now. You can find examples of working with the JSON dumps in the qwikidata Python package I wrote: https://qwikidata.readthedocs.io/en/stable/readme.html#json-dump . You can contact me off the mailing list if you would like help using the package.

best,
-Gabriel
On Sat, Jun 15, 2019 at 8:30 PM Houcemeddine A. Turki <turkiabdelwaheb@hotmail.fr> wrote: