[WikiEN-l] exporting sets of pages
Rajarshi Guha
rajarshi.guha at gmail.com
Tue Jan 4 23:03:26 UTC 2011
Hi, I wasn't sure whether this was the appropriate mailing list for
this question - if not, pointers to the correct one would be
appreciated.
I would like to retrieve all pages that contain a given infobox, say,
a Drugbox. The following URL lists all pages that contain this infobox:
http://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Template:Drugbox&namespace=0&limit=5000&hidetrans=
I'd then like to do a bulk export of these pages. As far as I can
tell, the Export options require that one provide the article titles
explicitly. Furthermore, for some other infoboxes the list spans
several pages of results, so I have to page through them by hand.
I'd like to do all of this programmatically instead.
The obvious solution would be to load Wikipedia into a local MySQL DB
and then run the queries directly. But I'm only interested in a rather
small subset of Wikipedia, and loading the whole thing locally seems
like overkill.
Is there a way to export just the articles containing Drugboxes, or do
I need to install Wikipedia locally?
Thanks,
--
Rajarshi Guha
NIH Chemical Genomics Center