Hi all,
We're trying to extract the full type hierarchy of Wikidata starting from all occurrences of P31 and P279. While we have some custom code for this, we're thinking there may be a smarter/more-efficient way of doing it using SPARQL or a tool that we are probably unaware of. Any hint would be appreciated. :)
Thanks, Leila
In case you wonder why we ended up with this question and who "we" is ;):
The research is being documented at https://meta.wikimedia.org/wiki/Research_talk:Expanding_Wikipedia_stubs_acro... . (The documentation is not fully up-to-date, but it will give you the gist of what we are doing.)
We are interested in building systems that can help editors and editathon organizers identify the most common structures for different article types given the already existing articles in each type/category in Wikipedia (in a fixed language or across languages) and the information available in those articles.
The challenge we have run into, and we're not the first to run into it, is that the categories in Wikipedia don't (as a whole) form an is-a hierarchy. This is a big problem for information extraction based on the category system, and we're trying to find a way to clean it up before starting to use it for this research. (We've looked at the body of research that attempts to clean up the Wikipedia category system for knowledge extraction, and none of what we've found addresses the problem we have. More on that once we complete the documentation.)
Hi!
We're trying to extract the full type hierarchy of Wikidata starting from all occurrences of P31 and P279. While we have some custom code for this, we're thinking there may be a smarter/more-efficient way of doing it using SPARQL or a tool that we are probably unaware of. Any hint would be appreciated. :)
Well, Blazegraph implements BFS: https://wiki.blazegraph.com/wiki/index.php/RDF_GAS_API#GAS_Examples which may be useful in this case, though I am not sure it is possible to map the whole thing in one query without running into timeouts.
Also, I'm not sure P31 and P279 currently represent a hierarchy as such; i.e., loops have been known to exist in those (maybe already fixed, but not 100% sure). So one needs to be aware of that too.
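[A minimal sketch of the kind of loop check Stas is warning about, assuming the (child, parent) pairs have already been fetched offline; the function name and the Q-ids in the example are placeholders, not anything from the thread:]

```python
from collections import defaultdict

def find_cycle(edges):
    """Detect a cycle in a directed graph given as (child, parent) pairs.

    Returns one cycle as a list of nodes (first == last), or None if the
    graph is acyclic. Iterative DFS with colour marking, so it copes with
    hierarchies too deep for recursion.
    """
    graph = defaultdict(list)
    for child, parent in edges:
        graph[child].append(parent)

    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # defaults to WHITE

    for start in list(graph):
        if colour[start] != WHITE:
            continue
        stack = [(start, iter(graph[start]))]
        path = [start]
        colour[start] = GREY
        while stack:
            node, it = stack[-1]
            advanced = False
            for nxt in it:
                if colour[nxt] == GREY:
                    # Back edge: the cycle is the tail of the current path.
                    return path[path.index(nxt):] + [nxt]
                if colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    path.append(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    advanced = True
                    break
            if not advanced:
                colour[node] = BLACK
                path.pop()
                stack.pop()
    return None
```

[For example, `find_cycle([("Q1", "Q2"), ("Q2", "Q3"), ("Q3", "Q1")])` reports the three-node loop, while the same pairs without the last edge come back as None.]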
Hi Leila,
I am using WDQS regularly to retrieve all P279 relationships in one query. This is about the largest result you can still get, and it is not hard to compute the transitive closure of this data offline so as to get the full type hierarchy.
If you included P31 as well, you'd get essentially all Wikidata items, which would be too much for a query result within the timeout. But since essentially all items have this property, you don't need sophisticated query support to find those parts, e.g., using Wikidata Toolkit in an offline program.
Regards,
Markus
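[The offline transitive-closure step Markus describes could be sketched roughly like this, assuming the (subclass, superclass) pairs are already in memory; the function name and the letter IDs in the example are placeholders:]

```python
from collections import defaultdict

def transitive_closure(subclass_pairs):
    """Compute every ancestor of every class from (subclass, superclass) pairs.

    A breadth-first walk over the superclass edges from each class; it
    tolerates cycles, since a node already seen is never re-queued.
    """
    parents = defaultdict(set)
    for sub, sup in subclass_pairs:
        parents[sub].add(sup)

    closure = {}
    for start in parents:
        seen = set()
        frontier = [start]
        while frontier:
            nxt = []
            for node in frontier:
                for p in parents.get(node, ()):
                    if p not in seen:
                        seen.add(p)
                        nxt.append(p)
            frontier = nxt
        closure[start] = seen
    return closure
```

[So with pairs ("A", "B") and ("B", "C"), the closure for "A" is {"B", "C"}; in a loop, a class ends up in its own closure, which is one easy way to spot the cycles Stas mentions.]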
Hi Markus, Hi Stas,
Thanks for your responses. Please find our comments below.
On Fri, Jul 7, 2017 at 2:51 AM, Markus Kroetzsch markus.kroetzsch@tu-dresden.de wrote:
Hi Leila,
I am using WDQS regularly to retrieve all P279 relationships in one query. This is about the largest result you can still get, and it is not hard to compute the transitive closure of this data offline so as to get the full type hierarchy.
If you included P31 as well, you'd get essentially all Wikidata items, which would be too much for a query result within the timeout. But since essentially all items have this property, you don't need sophisticated query support to find those parts, e.g., using Wikidata Toolkit in an offline program.
It is reassuring to hear from you that some of the work will need to be performed offline (especially because we do need all the P31s). It is safe to say that if there were a ready-made solution out there, you would know about it. :) We can now proceed more confidently with a combination of WDQS and offline processing.
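[The WDQS half of that combination amounts to one query plus a small result parser. A sketch, assuming the standard SPARQL 1.1 JSON results format returned by the query service; the constant and function names are placeholders:]

```python
# The one-shot WDQS query Markus describes: all direct P279 pairs.
# (wdt: is the truthy-statement prefix that WDQS predeclares.)
SUBCLASS_QUERY = """
SELECT ?sub ?sup WHERE { ?sub wdt:P279 ?sup . }
"""

def pairs_from_bindings(result_json):
    """Turn a SPARQL JSON result into (subclass, superclass) Q-id pairs.

    Assumes entity IRIs of the form http://www.wikidata.org/entity/Q42,
    from which the trailing Q-id is taken.
    """
    pairs = []
    for row in result_json["results"]["bindings"]:
        sub = row["sub"]["value"].rsplit("/", 1)[-1]
        sup = row["sup"]["value"].rsplit("/", 1)[-1]
        pairs.append((sub, sup))
    return pairs
```

[The resulting pairs then feed straight into the offline closure/cycle processing, with P31 pairs extracted from a dump rather than the endpoint, as Markus suggests.]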
On 07.07.2017 01:47, Stas Malyshev wrote:
We're trying to extract the full type hierarchy of Wikidata starting from all occurrences of P31 and P279. While we have some custom code for this, we're thinking there may be a smarter/more-efficient way of doing it using SPARQL or a tool that we are probably unaware of. Any hint would be appreciated. :)
Well, Blazegraph implements BFS: https://wiki.blazegraph.com/wiki/index.php/RDF_GAS_API#GAS_Examples which may be useful in this case, though I am not sure it is possible to map the whole thing in one query without running into timeouts.
Thanks for the pointer. Michele tells me that he has tried to run it locally, and the BFS query has been giving him problems even on a local server, so timeouts on the remote endpoint are something we expect as well.
Also, I'm not sure P31 and P279 currently represent a hierarchy as such; i.e., loops have been known to exist in those (maybe already fixed, but not 100% sure). So one needs to be aware of that too.
Correct, it doesn't, and we are painfully ;) aware of it.
Thanks to both of you and have a good weekend! :)
Best, Leila