If you want to find out which pages are in some category, you need the
categorylinks.sql dump. And to use that, you will probably also need
page.sql, because categorylinks refers to member pages by page ID and
page.sql maps those IDs to titles.
Those dumps are in SQL, so probably the simplest way to use them is to
import them into a MySQL database and then access that.
Documentation explaining what the imported tables contain is at
https://www.mediawiki.org/wiki/Manual:Categorylinks_table and
https://www.mediawiki.org/wiki/Manual:Page_table.
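As a rough sketch of the kind of query the two tables allow once they are
imported: the example below joins categorylinks to page to list the members
of one category. It makes several assumptions. SQLite stands in for MySQL so
that it runs without a server, the tables are cut down to a few columns
(cl_from, cl_to, cl_type, page_id, page_namespace, page_title) as I
understand them from the manual pages above, and the rows are made up. Newer
MediaWiki versions may lay out categorylinks differently, so check the manual
against your dump before relying on the column names.

```python
import sqlite3

# SQLite is a stand-in for MySQL here so the sketch runs anywhere;
# the real dumps would be imported into MySQL as described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY,
        page_namespace INTEGER,  -- 0 = article, 14 = category
        page_title TEXT          -- underscores instead of spaces
    );
    CREATE TABLE categorylinks (
        cl_from INTEGER,  -- page_id of the member page
        cl_to TEXT,       -- category title, without the "Category:" prefix
        cl_type TEXT      -- 'page', 'subcat' or 'file'
    );
""")

# Made-up rows, for illustration only.
conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
    (1, 0, "List_of_rivers_of_Africa"),
    (2, 0, "List_of_rivers_of_Asia"),
    (3, 14, "Lists_of_rivers_by_country"),
    (4, 0, "Unrelated_page"),
])
conn.executemany("INSERT INTO categorylinks VALUES (?, ?, ?)", [
    (1, "Lists_of_rivers", "page"),
    (2, "Lists_of_rivers", "page"),
    (3, "Lists_of_rivers", "subcat"),
    (4, "Some_other_category", "page"),
])


def category_members(conn, category):
    """Return (namespace, title, type) for every direct member of a category."""
    rows = conn.execute(
        """
        SELECT p.page_namespace, p.page_title, cl.cl_type
        FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        WHERE cl.cl_to = ?
        ORDER BY p.page_title
        """,
        (category,),
    )
    return rows.fetchall()


members = category_members(conn, "Lists_of_rivers")
for namespace, title, kind in members:
    print(namespace, title, kind)
```

This only returns direct members. Rows with cl_type 'subcat' are
subcategories, so getting everything below a category means repeating the
query for each of them.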
Petr Onderka
[[en:User:Svick]]
On Fri, Oct 18, 2013 at 12:18 AM, Peyman Faratin <peyman(a)robustlinks.com> wrote:
Apologies if this is not the appropriate forum for the
question.
I am trying to access the content of Category pages from either the dump or
APIs.
For example, I would like to get a complete list of rivers
http://en.wikipedia.org/wiki/Category:Lists_of_rivers
The API does provide the content but it is throttled
https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&am…
Therefore I would like to find the content in the dumps. However, I cannot
find this information in the dumps. I have looked inside
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml
and find nothing there. The pages are referenced in the page SQL dumps
http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz
Do any of the dumps contain the category page content?
thank you
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l