I have questions about the results I obtained, which are in the form #id_source_category title_target_category. The problem I have is that the directed graph is actually "mixed": sometimes a real sub-category points to its parent, sometimes vice versa, so I can't understand the hierarchy.
That's not true. Each link in categorylinks is from a page (cl_from, this can be a category page) to its parent category (cl_to).
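To make the direction concrete, here is a minimal Python sketch, assuming the categorylinks rows have already been parsed into (cl_from, cl_to) pairs and that a matching excerpt of the page table is available to map page ids to namespaces and titles. All ids and titles below are made up for illustration, and the helper names are hypothetical. Namespace 14 is the Category namespace, so it is what distinguishes a sub-category from an ordinary article.

```python
from collections import defaultdict

# Made-up rows for illustration: (cl_from, cl_to) = (child page id, parent category title).
CATEGORYLINKS = [
    (10, "Parent_A"),
    (10, "Parent_B"),
    (11, "Parent_A"),
]

# Made-up excerpt of the page table: page id -> (namespace, title).
# Namespace 14 is the Category namespace; 0 is an ordinary article.
PAGES = {
    10: (14, "Child_category"),
    11: (0, "Some_article"),
}

def build_index(rows):
    """Index the links in both directions; every row points child -> parent."""
    parents_of = defaultdict(set)   # child page id -> parent category titles
    members_of = defaultdict(set)   # parent category title -> child page ids
    for cl_from, cl_to in rows:
        parents_of[cl_from].add(cl_to)
        members_of[cl_to].add(cl_from)
    return parents_of, members_of

def subcategories(title, members_of, pages):
    """Titles of the members of `title` that are themselves category pages."""
    return sorted(
        pages[pid][1]
        for pid in members_of.get(title, ())
        if pid in pages and pages[pid][0] == 14
    )

parents_of, members_of = build_index(CATEGORYLINKS)
print(sorted(parents_of[10]))                        # parents of page 10
print(subcategories("Parent_A", members_of, PAGES))  # only the category members
```

Reading the same rows "downwards" (from a category title to its members) is what can make the graph look mixed, but the stored direction never changes.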
- Is there (maybe in another dump) a mark or parameter (something like "ns", maybe?) telling you which category is the head with respect to another? As an example, take "http://en.wikipedia.org/wiki/Category:World_War_II". It is written "this is root category...."
That text just describes what belongs to this category. It has nothing to do with the structure, that's always the same.
- the other question is about results:
http://en.wikipedia.org/wiki/Category:World_War_II collects sub-categories which I have not found in categorylinks.sql, e.g. for WWII I obtain the conflicts but not the other ones.
If I look at the categorylinks for that category (cl_from = 690451) in the latest dump, I get some “conflicts” categories, then some “Wars involving X” categories and also a few others, like “Modern Europe”. I don't see any parent category that would be missing.
Maybe it would help if you described the issue in more detail: what results exactly are you getting, what results are you expecting and how do the two differ.
Petr Onderka [[en:User:Svick]]