Hi!
I'd like to announce that the category tree of certain wikis is now available as an RDF dump and in the Wikidata Query Service.
More documentation is at https://www.mediawiki.org/wiki/Wikidata_query_service/Categories, which I will summarize briefly below.
The dumps are located at https://dumps.wikimedia.org/other/categoriesrdf/. You can use these dumps any way you wish; the data format is described in the documentation[1].
The same dump is loaded into the "categories" namespace in WDQS, which can be queried at https://query.wikidata.org/bigdata/namespace/categories/sparql?query=SPARQL (replace SPARQL with your query). Sorry, there is no GUI support yet (it will probably come later). See the example in the docs[2].
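As a rough illustration of querying that endpoint from a script, here is a minimal Python sketch. The endpoint URL and the `query` parameter come from the text above; the helper names (`build_query_url`, `run_query`), the `Accept` and `User-Agent` headers, and the generic sample query are my own assumptions for illustration, not something taken from the docs. The sample query deliberately uses no category-specific vocabulary; see the data format description[1] for the real predicates.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given in the announcement.
ENDPOINT = "https://query.wikidata.org/bigdata/namespace/categories/sparql"

# Generic query that just lists a few triples (illustrative only).
SAMPLE_QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"


def build_query_url(sparql, endpoint=ENDPOINT):
    """Return a GET URL carrying the URL-encoded SPARQL text in `query`."""
    return endpoint + "?" + urllib.parse.urlencode({"query": sparql})


def run_query(sparql, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the list of result bindings.

    Assumes the endpoint honours the standard SPARQL JSON results
    media type requested via the Accept header.
    """
    request = urllib.request.Request(
        build_query_url(sparql, endpoint),
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical User-Agent string; use one that identifies you.
            "User-Agent": "categories-rdf-example/0.1 (illustrative script)",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(SAMPLE_QUERY)` performs the network request; `build_query_url(SAMPLE_QUERY)` only builds the URL, which can also be pasted into a browser.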
These datasets are not updated automatically yet, so they are only as current as the latest dump. Hopefully this will be automated soon, and then the datasets will be updated daily.
The list of currently supported wikis is here: https://noc.wikimedia.org/conf/categories-rdf.dblist - these are basically all 1M+ wikis and a couple more that I added for various reasons. If you have a good candidate wiki to add, please tell me or write on the talk page of the documentation linked above.
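If you want to check from a script whether a wiki is on that list, something like the following Python sketch may help. It assumes the dblist file is plain text with one database name per line and optional "#" comments; I have not verified that here, and the function names and the sample names in the usage note are hypothetical.

```python
import urllib.request

# URL as given in the announcement.
DBLIST_URL = "https://noc.wikimedia.org/conf/categories-rdf.dblist"


def parse_dblist(text):
    """Return database names from dblist text.

    Assumed format: one name per line; blank lines and anything
    after a '#' are ignored.
    """
    names = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            names.append(entry)
    return names


def fetch_dblist(url=DBLIST_URL, timeout=30):
    """Download the list and parse it (performs a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_dblist(response.read().decode("utf-8"))
```

For example, `parse_dblist("enwiki\ndewiki\n")` gives `["enwiki", "dewiki"]`; those two names are just sample input, not a statement about what the real list contains.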
Please note that this is only the first step of the project, so there may still be some rough edges. I am announcing it early because I think it would be useful for people to look at the dumps and the SPARQL endpoint, see whether anything is missing or not working properly, and share ideas on how it can be used.
We eventually plan to use it for search improvement[3]; this work is still in progress.
As always, we welcome any comments and suggestions.
[1] https://www.mediawiki.org/wiki/Wikidata_query_service/Categories#Data_format
[2] https://www.mediawiki.org/wiki/Wikidata_query_service/Categories#Accessing_t...
[3] https://phabricator.wikimedia.org/T165982
Thanks,