On Sat, Jul 25, 2009 at 6:21 AM, Danny B. <Wikipedia.Danny.B@email.cz> wrote:
> I'm looking for any kind of tool which would take the XML dump (most probably the pages-meta-current.xml.bz2, or at least the pages-articles.xml.bz2) and return the list of page titles (or alternatively/configurably page ids) of pages containing a given string.
I have had good luck in the past simply writing little C programs to use libexpat for this purpose.
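A rough sketch of the same streaming idea, written in Python instead of C for brevity. It uses the standard-library xml.parsers.expat module, which is a binding to the Expat library. The function name titles_containing is made up for illustration, and the code assumes the usual dump layout (each <page> holding a <title> and a <text> inside its <revision>), so treat it as a starting point and not a finished tool:

```python
import bz2
import sys
import xml.parsers.expat


def titles_containing(stream, needle):
    """Return titles of pages whose <text> contains `needle`.

    `stream` is any binary file-like object holding the XML dump.
    Assumes the usual layout: <page><title>..</title>..<text>..</text></page>.
    """
    matches = []
    current = None     # element whose character data is being collected
    title_parts = []   # Expat may deliver text in several chunks
    text_parts = []

    def start(name, attrs):
        nonlocal current
        if name == "page":
            title_parts.clear()
            text_parts.clear()
        elif name in ("title", "text"):
            current = name

    def chars(data):
        if current == "title":
            title_parts.append(data)
        elif current == "text":
            text_parts.append(data)

    def end(name):
        nonlocal current
        if name in ("title", "text"):
            current = None
        elif name == "page":
            # Only one page is held in memory at a time.
            if needle in "".join(text_parts):
                matches.append("".join(title_parts))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.ParseFile(stream)
    return matches


if __name__ == "__main__":
    # Hypothetical usage: python find_pages.py pages-articles.xml.bz2 "some string"
    with bz2.open(sys.argv[1], "rb") as dump:
        for title in titles_containing(dump, sys.argv[2]):
            print(title)
```

The three handlers map directly onto the callbacks one would register in C with XML_SetElementHandler and XML_SetCharacterDataHandler. Collecting page ids instead of titles would need extra care, since <id> also appears inside <revision> and <contributor>.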
- Carl