On 9 March 2011 16:00, Platonides <Platonides@gmail.com> wrote:
Dear Members, I am Ramesh, and I am pursuing my PhD at Monash University, Malaysia. My research is on blog classification using Wikipedia categories. For my experiment, I use 12 main categories of Wikipedia, and I want to identify which of these 12 main categories each article belongs to. So I wrote a program that collects the subcategories of each article and classifies the articles into the 12 main categories offline. I have already downloaded the Wikipedia dump, which contains around 3 million article titles. My program takes these 3 million article titles, queries the live Wikipedia website, and fetches the subcategories.
Why do you need to access the live Wikipedia for this? Using categorylinks.sql and page.sql, you should be able to fetch the same data, and probably faster.
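A minimal sketch of the offline lookup, assuming the two dump tables have already been loaded into a database. Here an in-memory SQLite database stands in for that, and only a simplified subset of columns is kept (`page_id`, `page_namespace`, `page_title` from `page`; `cl_from`, `cl_to` from `categorylinks`). Check the actual schema of the dump you download before relying on these column names. The sample rows and the helper names `load_sample` and `article_categories` are hypothetical.

```python
import sqlite3

# Simplified stand-ins for the MediaWiki `page` and `categorylinks` tables.
# Assumption: only these columns are needed; the real tables have more.
SCHEMA = """
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,   -- 0 = article, 14 = category
    page_title     TEXT NOT NULL
);
CREATE TABLE categorylinks (
    cl_from INTEGER NOT NULL,  -- page_id of the page carrying the category
    cl_to   TEXT NOT NULL      -- category title, without the "Category:" prefix
);
"""


def load_sample(conn):
    """Insert hypothetical rows that mimic the dump contents."""
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
        (1, 0, "Kuala_Lumpur"),
        (2, 0, "Python_(programming_language)"),
        (3, 14, "Capitals_in_Asia"),
    ])
    conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
        (1, "Capitals_in_Asia"),
        (1, "Cities_in_Malaysia"),
        (2, "Programming_languages"),
        (3, "Cities_in_Asia"),  # a category page's own parent category
    ])


def article_categories(conn):
    """Return {article_title: sorted direct categories} for namespace-0 pages."""
    rows = conn.execute(
        """
        SELECT p.page_title, cl.cl_to
        FROM page AS p
        JOIN categorylinks AS cl ON cl.cl_from = p.page_id
        WHERE p.page_namespace = 0
        ORDER BY p.page_title, cl.cl_to
        """
    )
    result = {}
    for title, category in rows:
        result.setdefault(title, []).append(category)
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load_sample(conn)
    print(article_categories(conn))
```

The join runs entirely on local data, so all 3 million articles can be processed in one query instead of 3 million HTTP requests.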
I concur. Everything required for this project should be in the dumps.
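The step from an article's direct categories to the 12 main categories can also be done from dump data alone. Category pages live in namespace 14, so a `categorylinks` row whose `cl_from` is a category page records a subcategory-to-parent edge. The sketch below builds that parent map and runs a breadth-first search upward until it reaches a main category. The rows, the `MAIN_CATEGORIES` set, and the function names are hypothetical placeholders, not the 12 categories used in the study.

```python
from collections import deque

# Hypothetical (page_id, page_namespace, page_title) rows for category pages.
PAGES = [
    (3, 14, "Capitals_in_Asia"),
    (4, 14, "Cities_in_Malaysia"),
    (5, 14, "Cities_in_Asia"),
    (6, 14, "Programming_languages"),
]
# Hypothetical (cl_from, cl_to) rows: category -> parent category.
LINKS = [
    (3, "Cities_in_Asia"),
    (4, "Cities_in_Asia"),
    (4, "Malaysia"),
    (5, "Geography"),
    (6, "Computing"),
]
# Placeholder set standing in for the 12 main categories.
MAIN_CATEGORIES = {"Geography", "Computing", "Society"}


def build_parent_map(pages, links):
    """Map each category title to the set of its parent category titles."""
    category_title = {pid: title for pid, ns, title in pages if ns == 14}
    parents = {}
    for cl_from, cl_to in links:
        if cl_from in category_title:
            parents.setdefault(category_title[cl_from], set()).add(cl_to)
    return parents


def nearest_main_categories(start_categories, parents, main_categories):
    """Breadth-first search up the category graph.

    Returns {main_category: hops} for the main categories reached at the
    smallest number of hops. The `seen` set guards against cycles.
    """
    queue = deque((c, 0) for c in start_categories)
    seen = set(start_categories)
    found = {}
    best = None
    while queue:
        category, depth = queue.popleft()
        if best is not None and depth > best:
            break  # all closer main categories have been found
        if category in main_categories:
            found[category] = depth
            best = depth
            continue  # do not climb above a main category
        for parent in parents.get(category, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return found


if __name__ == "__main__":
    parents = build_parent_map(PAGES, LINKS)
    print(nearest_main_categories(
        ["Capitals_in_Asia", "Cities_in_Malaysia"], parents, MAIN_CATEGORIES))
```

Stopping at the shallowest level is one possible design choice; keeping every reachable main category with its distance would be an alternative if articles may belong to several of the 12.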