Hi,
I am trying to build an offline version of the Wikipedia categorisation tree. As usual with Wikipedia projects, I downloaded the dumps (the relevant one here is pages-articles.xml). I found that none of the dumps contains the relation between "Category:1960_works" and "Category:1960", which is present on the web page. The same holds for many other categories I tried: many links are missing from the dump but present on the web. Any idea why that is?
Thanks for your help, Anthony
CONFIDENTIALITY: This email is intended solely for the person(s) named and may be confidential and/or privileged. If you are not the intended recipient, please delete it, notify us and do not copy, use, or disclose its content. Thank you.
Towards A Sustainable Earth: Print Only When Necessary
Anthony Ventresque (Dr) wrote:
> Hi,
> I am trying to build an offline version of the Wikipedia categorisation tree. As usual with Wikipedia projects, I downloaded the dumps (the relevant one here is pages-articles.xml). I found that none of the dumps contains the relation between "Category:1960_works" and "Category:1960", which is present on the web page. The same holds for many other categories I tried: many links are missing from the dump but present on the web. Any idea why that is?
> Thanks for your help, Anthony
Using page.sql.gz and categorylinks.sql.gz would be more efficient for your task.
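One plausible reason for the gap (an assumption, not something confirmed in this thread) is that pages-articles.xml holds raw wikitext, so categories assigned through templates do not appear literally in it, whereas the categorylinks table records the links as rendered. As a minimal sketch of the suggestion above, assuming the two SQL dumps have already been parsed into plain tuples, the subcategory edges could be derived as follows. The function name build_category_tree, the tuple layouts, and the sample rows are hypothetical; the sketch relies only on the classic columns page_id, page_namespace, page_title, cl_from, and cl_to, which may differ in newer schema versions.

```python
from collections import defaultdict

CATEGORY_NAMESPACE = 14  # MediaWiki namespace number for "Category:" pages


def build_category_tree(page_rows, categorylinks_rows):
    """Return {parent category title: sorted list of child category titles}.

    page_rows: iterable of (page_id, page_namespace, page_title)
    categorylinks_rows: iterable of (cl_from, cl_to), where cl_from is the
    page_id of the member page and cl_to is the parent category title
    (without the "Category:" prefix).
    """
    # Map page_id -> title, keeping only category pages.
    category_titles = {
        page_id: title
        for page_id, namespace, title in page_rows
        if namespace == CATEGORY_NAMESPACE
    }

    children = defaultdict(set)
    for cl_from, cl_to in categorylinks_rows:
        child = category_titles.get(cl_from)
        # Skip links whose source page is an article rather than a category.
        if child is not None:
            children[cl_to].add(child)

    return {parent: sorted(kids) for parent, kids in children.items()}


# Hypothetical sample rows, for illustration only (not taken from a real dump).
pages = [
    (1, 14, "1960"),
    (2, 14, "1960_works"),
    (3, 0, "Some_article"),
]
links = [
    (2, "1960"),        # Category:1960_works is a member of Category:1960
    (3, "1960_works"),  # an article link, ignored when building the tree
]

tree = build_category_tree(pages, links)
print(tree)
```

Filtering on the namespace of the source page, rather than on a link-type column, keeps the sketch independent of which columns a particular dump version provides.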
Thanks for your help, it indeed works.
________________________________________
From: wikitech-l-bounces@lists.wikimedia.org [wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Platonides [Platonides@gmail.com]
Sent: 09 February 2011 05:32
To: wikitech-l@lists.wikimedia.org
Subject: Re: [Wikitech-l] categorisation issues in dumps
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l