Does the pages-meta-current.xml.bz2 file at http://download.wikimedia.org/enwiki/20070527/ contain the category pages? (such as http://en.wikipedia.org/wiki/Category:American_linguists)
Is there a way to download JUST the category pages?
Why does http://en.wikipedia.org/wiki/Special:Export/Category:American_linguists not match the content on http://en.wikipedia.org/wiki/Category:American_linguists?
Thanks,
Trent
Trent Mera wrote:
Does the pages-meta-current.xml.bz2 file at http://download.wikimedia.org/enwiki/20070527/ contain the category pages? (such as http://en.wikipedia.org/wiki/Category:American_linguists)
Is there a way to download JUST the category pages?
You need to download categorylinks.sql.gz. To match the ids with page names, you also need page.sql.gz.
(You could grab pages-meta-current.xml.bz2 and search for [[Category: ]] links in every article, but that is far more work than you need.)
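As a rough illustration of matching the ids with page names, here is a minimal Python sketch. It assumes the rows have already been parsed out of the two SQL dumps (the parsing itself is not shown), and that the relevant columns are cl_from and cl_to in categorylinks and page_id, page_namespace and page_title in page. The sample rows and the pages_by_category helper are hypothetical, made up only for this example.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for data parsed from the dumps.
# page rows: (page_id, page_namespace, page_title); namespace 14 is Category.
page_rows = [
    (1, 0, "Example_page_A"),
    (2, 0, "Example_page_B"),
    (3, 14, "American_linguists"),
]
# categorylinks rows: (cl_from, cl_to), i.e. (id of member page, category name).
categorylinks_rows = [
    (1, "American_linguists"),
    (2, "American_linguists"),
    (3, "Example_parent_category"),
]


def pages_by_category(page_rows, categorylinks_rows):
    """Group (namespace, title) pairs of member pages under each category name."""
    # Index the page table by id so each category link is a single lookup.
    titles = {page_id: (ns, title) for page_id, ns, title in page_rows}
    members = defaultdict(list)
    for cl_from, cl_to in categorylinks_rows:
        # Skip links whose source page is missing from the page table.
        if cl_from in titles:
            members[cl_to].append(titles[cl_from])
    return dict(members)


result = pages_by_category(page_rows, categorylinks_rows)
print(result["American_linguists"])
```

Category pages themselves show up as members of their parent categories, which is why the namespace is kept next to the title.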
Why does http://en.wikipedia.org/wiki/Special:Export/Category:American_linguists not match the content on http://en.wikipedia.org/wiki/Category:American_linguists?
You are getting the page *content* of the category page itself, not the list of pages in the category. That is, http://en.wikipedia.org/wiki/Special:Export/Category:American_linguists matches what is at http://en.wikipedia.org/w/index.php?title=Category:American_linguists&ac...
You may want to go to http://en.wikipedia.org/wiki/Special:Export, put 'American linguists' in the "Add pages from category:" box and press "Add". The box below will then list the pages in that category.
wikitech-l@lists.wikimedia.org