[Toolserver-l] Looking for utility to perform text search in dump

Carl (CBM) cbm.wikipedia at gmail.com
Sat Jul 25 15:36:55 UTC 2009


On Sat, Jul 25, 2009 at 6:21 AM, Danny B.<Wikipedia.Danny.B at email.cz> wrote:
> I'm looking for any kind of tool which would take the XML dump (most probably the
> pages-meta-current.xml.bz2, at least the pages-articles.xml.bz2) and would return
> the list of page titles (or alternatively/configurably page ids) of pages containing given string.

I have had good luck in the past simply writing small C programs that
use libexpat (a stream-oriented XML parser) for this purpose.

- Carl


