[WikiEN-l] exporting sets of pages

Rajarshi Guha rajarshi.guha at gmail.com
Tue Jan 4 23:03:26 UTC 2011


Hi, I wasn't sure whether this was the appropriate mailing list for
this question; if not, pointers to the correct one would be
appreciated.

I would like to retrieve pages that contain, say, a Drugbox. The
following URL lists all pages that contain this infobox:

http://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Template:Drugbox&namespace=0&limit=5000&hidetrans=
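The same listing can in principle be obtained from the MediaWiki Action API instead of the WhatLinksHere page. The sketch below is not from the original thread. It assumes the current English Wikipedia endpoint (https://en.wikipedia.org/w/api.php), the `list=embeddedin` module, and the modern `continue`-style continuation; the helper names (`embeddedin_params`, `http_get_json`, `pages_embedding`) and the User-Agent string are hypothetical. Following the continuation tokens means the results do not have to be paged through by hand.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator

# Assumed endpoint: the English Wikipedia MediaWiki Action API.
API_URL = "https://en.wikipedia.org/w/api.php"


def embeddedin_params(template: str, namespace: int = 0, limit: int = 500) -> Dict[str, str]:
    """Hypothetical helper: query parameters for list=embeddedin,
    i.e. pages in `namespace` that transclude `template`."""
    return {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,
        "einamespace": str(namespace),  # 0 = articles, like namespace=0 in the URL above
        "eilimit": str(limit),          # 500 is the per-request cap for non-bot users
        "format": "json",
    }


def http_get_json(params: Dict[str, str]) -> dict:
    """Fetch one API response; the User-Agent value here is a made-up example."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": "drugbox-export-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def pages_embedding(template: str,
                    fetch: Callable[[Dict[str, str]], dict] = http_get_json) -> Iterator[str]:
    """Hypothetical helper: yield every title that transcludes `template`,
    following the API's `continue` tokens instead of paging by hand."""
    params = embeddedin_params(template)
    while True:
        data = fetch(params)
        for page in data["query"]["embeddedin"]:
            yield page["title"]
        cont = data.get("continue")
        if not cont:
            break  # no continuation block: this was the last batch
        # Merge the continuation values (e.g. eicontinue) into the next request.
        params = {**params, **cont}
```

Usage, under the same assumptions: `titles = list(pages_embedding("Template:Drugbox"))`. The `fetch` argument is injectable so the paging logic can be exercised without network access.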

I would then like to do a bulk export of these pages. As far as I can
tell, the export options require that one provide the article titles
explicitly. Furthermore, for some other infoboxes I have to page
through the results. I'd rather do all of this programmatically.
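Given a list of titles (for example, those shown by the WhatLinksHere listing above), a bulk export can be sketched against the Action API's `export` and `exportnowrap` query flags, which return XML in the same format as Special:Export. This is a minimal sketch, not a method from the thread: it assumes the en.wikipedia.org API endpoint and a limit of 50 titles per request for non-bot users, and the helpers `batches`, `export_url` and `export_xml` are hypothetical names.

```python
import urllib.parse
import urllib.request
from typing import Iterable, Iterator, List

# Assumed endpoint and assumed per-request title limit for non-bot users.
API_URL = "https://en.wikipedia.org/w/api.php"
BATCH_SIZE = 50


def batches(titles: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Hypothetical helper: split an iterable of titles into lists of at most `size`."""
    batch: List[str] = []
    for title in titles:
        batch.append(title)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


def export_url(titles: List[str]) -> str:
    """Hypothetical helper: build a query URL that returns Special:Export-style XML."""
    params = {
        "action": "query",
        "titles": "|".join(titles),  # the API takes multiple titles separated by "|"
        "export": "1",               # boolean flag: include an export of the pages
        "exportnowrap": "1",         # return the raw export XML, not wrapped in a result
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def export_xml(titles: List[str]) -> bytes:
    """Download one batch of pages as XML; the User-Agent value is a made-up example."""
    req = urllib.request.Request(export_url(titles),
                                 headers={"User-Agent": "drugbox-export-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Writing one XML file per batch (for instance `open(f"export-{i}.xml", "wb").write(export_xml(batch))` inside `for i, batch in enumerate(batches(titles))`) keeps each request within the assumed title limit.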

The obvious solution would be to load Wikipedia into a local MySQL DB
and then perform the queries directly. But I'm interested in a rather
small subset of Wikipedia and loading the whole thing locally seems
overkill.

Is there a way I could export the articles containing Drugboxes or do
I need to install Wikipedia locally?

Thanks,

-- 
Rajarshi Guha
NIH Chemical Genomics Center


