Hi all,

We're trying to extract the full type hierarchy of Wikidata starting from all occurrences of P31 and P279. While we have some custom code for this, we're thinking there may be a smarter or more efficient way of doing it using SPARQL or a tool that we are probably unaware of. Any hints would be appreciated. :)
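(For context, what we're after is essentially the transitive closure over the extracted (child, parent) pairs. A minimal sketch of that idea, using toy identifiers rather than real QIDs; the SPARQL query in the comment uses a property path, which the query service does support, but we'd welcome pointers to a better approach:)

```python
from collections import defaultdict, deque

# Per-item, the query service can compute this with a property path, e.g.:
#   SELECT ?super WHERE { wd:Q146 wdt:P279* ?super }
# Offline, over edges extracted from a dump, it is a simple upward BFS.

def superclasses(edges, start):
    """All classes reachable from `start` by following P279 edges upward."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for p in parents[node]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen

# Toy (child, parent) P279 pairs standing in for the extracted triples.
edges = [("cat", "mammal"), ("mammal", "animal"), ("cat", "pet"), ("pet", "animal")]
superclasses(edges, "cat")  # -> {"mammal", "pet", "animal"}
```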

Thanks,
Leila

In case you wonder why we ended up with this question and who "we" is ;): 

The research is being documented at https://meta.wikimedia.org/wiki/Research_talk:Expanding_Wikipedia_stubs_across_languages . (The documentation is not the most up-to-date, but it will give you the gist of what we are doing.)

We are interested in building systems that can help editors and editathon organizers identify the most common structures for different article types, given the already-existing articles in each type/category in Wikipedia (in a fixed language or across languages) and the information available in those articles. 

The challenge we have run into, and we're not the first to run into it, is that the categories in Wikipedia don't (as a whole) form an is-a hierarchy. This is a big problem for information extraction based on the category system, and we're trying to find a way to clean it up before starting to use it for this research. (We've looked at the body of research that attempts to clean up the Wikipedia category system for knowledge extraction, and none of what we've found addresses the problem we have. More on that once we complete the documentation.)