Hi Petr,
Yes, that makes sense. I will try your suggestion.
Thank you for your help.
Best,
Peyman
On Oct 17, 2013, at 6:24 PM, Petr Onderka <gsvick(a)gmail.com> wrote:
If you want to find out which pages are in some category, you need the
categorylinks.sql dump. To use that, you will probably also need
page.sql.
Those dumps are in SQL, so probably the simplest way to use them is to
import them into a MySQL database and then access that.
Documentation explaining what the imported tables contain is at
https://www.mediawiki.org/wiki/Manual:Categorylinks_table and
https://www.mediawiki.org/wiki/Manual:Page_table.
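
To illustrate how the two tables fit together, here is a rough sketch in
Python. It does not touch the real dumps: it builds a tiny in-memory SQLite
stand-in with a few made-up rows, keeping only the columns needed for the
join (page_id, page_namespace, page_title, cl_from, cl_to, named as on the
Manual pages above). The helper names category_members and articles_in_tree
are hypothetical, and the assumption that cl_to holds the category title
without the "Category:" prefix should be checked against the schema of the
dump you actually import.

```python
import sqlite3

# Toy stand-in for the imported tables. The rows are invented for
# illustration; a real import would come from page.sql and categorylinks.sql.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY,
    page_namespace INTEGER NOT NULL,   -- 0 = article, 14 = category
    page_title     TEXT NOT NULL
);
CREATE TABLE categorylinks (
    cl_from INTEGER NOT NULL,          -- page_id of the member page
    cl_to   TEXT NOT NULL              -- category title (assumed: no prefix)
);
INSERT INTO page VALUES
    (1, 14, 'Lists_of_rivers'),
    (2, 14, 'Lists_of_rivers_by_country'),
    (3, 0,  'List_of_rivers_of_Europe'),
    (4, 0,  'List_of_rivers_of_France');
INSERT INTO categorylinks VALUES
    (2, 'Lists_of_rivers'),
    (3, 'Lists_of_rivers'),
    (4, 'Lists_of_rivers_by_country');
""")


def category_members(conn, category):
    """Return (namespace, title) pairs for pages directly in `category`."""
    rows = conn.execute(
        """
        SELECT p.page_namespace, p.page_title
        FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        WHERE cl.cl_to = ?
        ORDER BY p.page_namespace, p.page_title
        """,
        (category,),
    )
    return rows.fetchall()


def articles_in_tree(conn, root):
    """Collect article titles in `root` and in all of its subcategories."""
    seen, todo, articles = set(), [root], set()
    while todo:
        cat = todo.pop()
        if cat in seen:          # guard against category loops
            continue
        seen.add(cat)
        for namespace, title in category_members(conn, cat):
            if namespace == 14:  # subcategory: visit it later
                todo.append(title)
            elif namespace == 0: # ordinary article
                articles.add(title)
    return sorted(articles)


print(category_members(conn, "Lists_of_rivers"))
# [(0, 'List_of_rivers_of_Europe'), (14, 'Lists_of_rivers_by_country')]
print(articles_in_tree(conn, "Lists_of_rivers"))
# ['List_of_rivers_of_Europe', 'List_of_rivers_of_France']
```

Against a real MySQL import the same JOIN should apply, though the
placeholder syntax and the connection setup depend on the client library
you use.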
Petr Onderka
[[en:User:Svick]]
On Fri, Oct 18, 2013 at 12:18 AM, Peyman Faratin <peyman(a)robustlinks.com> wrote:
> Apologies if this is not the appropriate forum for the question.
>
> I am trying to access the content of Category pages from either the dump or
> APIs.
>
> For example, I would like to get a complete list of rivers
>
> http://en.wikipedia.org/wiki/Category:Lists_of_rivers
>
> The API does provide the content but it is throttled
>
> https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&am…
>
> Therefore I would like to find the content in the dumps. However, I cannot
> find this information in the dumps. I have looked inside
>
> http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml
>
> and found nothing there. The pages are referenced in the page SQL dump
>
> http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz
>
> Do any of the dumps contain the category page content?
>
> thank you
>
>
> _______________________________________________
> Xmldatadumps-l mailing list
> Xmldatadumps-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>