Frank Schumacher wrote:
Hello Emmanuel,
You can work with the XML dumps. Import them into MySQL and have a
look at
http://meta.wikimedia.org/wiki/Database_layout
I already had a look at it, but couldn't figure out how to acquire the
desired information.
The table "categorylinks" only stores which categories an article
belongs to. Additionally, I need to know:
- which are the parent categories of category x
- which are the subcategories of category x
I couldn't figure out where this information is stored within the
database.
The categorylinks format is <page id> <category>, which means that the
page with that id belongs to <category>.
The parent/child relationships follow from the fact that categories are
pages themselves.
Parents of category foo:
- Get the page id of Category:Foo
- List all category values stored in categorylinks for that page id.
Subcategories of category foo:
- List all page ids with category "foo"
- Keep only those page ids whose pages are in the category namespace.
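
The two recipes above can be sketched as a pair of joins between the
"page" and "categorylinks" tables. This is only an illustration under
some assumptions: it uses Python's sqlite3 with a tiny in-memory toy
database instead of the real MySQL import, it assumes the column names
page_id, page_namespace, page_title, cl_from and cl_to and that the
category namespace is number 14 (check both against your own import),
and the helper names and sample rows are made up for the example.

```python
import sqlite3

# Assumption: "Category:" pages live in namespace 14.
CATEGORY_NS = 14


def build_toy_db():
    """Create a tiny in-memory stand-in for the imported dump (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id        INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title     TEXT
        );
        CREATE TABLE categorylinks (
            cl_from INTEGER,  -- page id of the member page
            cl_to   TEXT      -- category title, without the "Category:" prefix
        );
        """
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [
            (1, CATEGORY_NS, "Science"),
            (2, CATEGORY_NS, "Physics"),
            (3, CATEGORY_NS, "Optics"),
            (4, 0, "Photon"),  # an ordinary article in namespace 0
        ],
    )
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [
            (2, "Science"),  # Category:Physics belongs to Science
            (3, "Physics"),  # Category:Optics belongs to Physics
            (4, "Physics"),  # the article Photon belongs to Physics
        ],
    )
    return conn


def parent_categories(conn, title):
    """Find the category page's id, then list the categories it belongs to."""
    rows = conn.execute(
        "SELECT cl.cl_to FROM page p "
        "JOIN categorylinks cl ON cl.cl_from = p.page_id "
        "WHERE p.page_namespace = ? AND p.page_title = ? "
        "ORDER BY cl.cl_to",
        (CATEGORY_NS, title),
    ).fetchall()
    return [row[0] for row in rows]


def subcategories(conn, title):
    """List the members of the category, keeping only category pages."""
    rows = conn.execute(
        "SELECT p.page_title FROM categorylinks cl "
        "JOIN page p ON p.page_id = cl.cl_from "
        "WHERE cl.cl_to = ? AND p.page_namespace = ? "
        "ORDER BY p.page_title",
        (title, CATEGORY_NS),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_toy_db()
    print(parent_categories(conn, "Physics"))
    print(subcategories(conn, "Physics"))
```

The SELECT statements themselves are plain joins, so they should carry
over to the MySQL import with little change, provided your tables
really use these column names. Dropping the namespace condition in
subcategories() would return every member of the category, articles
included.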