[Toolserver-l] Looking for utility to perform text search in dump

Ilmari Karonen nospam at vyznev.net
Sat Aug 1 10:31:40 UTC 2009


Danny B. wrote:
> 
> I'm looking for a tool that takes an XML dump (preferably pages-meta-current.xml.bz2, or at least pages-articles.xml.bz2) and returns a list of the titles (or, optionally, the page IDs) of the pages containing a given string.
> 
> Does anybody have such a tool and is willing to share it? Either a command-line or a web interface is fine.
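A minimal Python sketch of the kind of tool asked about above. It assumes the standard MediaWiki XML export layout (`<page>` containing `<title>`, then the page `<id>`, then `<revision>` elements with `<text>`). It streams the dump with `xml.etree.ElementTree.iterparse` so that the whole file never has to fit in memory. `search_dump` and `main` are hypothetical names, not an existing tool. In a history dump, a page is reported once if any of its revisions matches.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree reports tags as "{namespace-uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def search_dump(fileobj, needle, want_ids=False):
    """Yield the title (or page ID) of every <page> whose <text> contains needle."""
    root = None
    title = page_id = None
    matched = False
    for event, elem in ET.iterparse(fileobj, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first "start" event is the <mediawiki> root
            elif local_name(elem.tag) == "page":
                # Reset per-page state at the start of each <page>.
                title = page_id = None
                matched = False
            continue
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "id" and page_id is None:
            # The first <id> inside a <page> is the page ID; later ones
            # belong to <revision> and <contributor>.
            page_id = elem.text
        elif name == "text" and elem.text and needle in elem.text:
            matched = True
        elif name == "page":
            if matched:
                yield page_id if want_ids else title
            root.clear()  # drop finished pages so memory use stays bounded


def main(path, needle, want_ids=False):
    # Hypothetical entry point: print matches from a bz2-compressed dump, e.g.
    # main("pages-articles.xml.bz2", "some string")
    with bz2.open(path, "rb") as f:
        for result in search_dump(f, needle, want_ids):
            print(result)
```

Plain substring matching keeps the sketch simple. A regular expression or a case-insensitive comparison could be swapped in at the `needle in elem.text` test.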

If you're only interested in page titles, why not just download 
all-titles-in-ns0.gz and grep it?
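On the command line, `zgrep 'string' all-titles-in-ns0.gz` does this directly. The following is a rough Python equivalent, assuming the file is a gzipped UTF-8 text list with one title per line. Note that this matches the string against the titles themselves, not against page text. `grep_titles` is a hypothetical helper name.

```python
import gzip


def grep_titles(path_or_file, needle):
    """Return the lines of a gzipped title list that contain needle."""
    # gzip.open accepts either a filename or an already-open binary file object.
    with gzip.open(path_or_file, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if needle in line]
```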

Alternatively, if you want titles in other namespaces too, I have a
small Perl script I once wrote that extracts such a list from the
page.sql.gz dump. I can clean it up and put it online somewhere if
you're interested.
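The Perl script itself isn't shown here. The following is a separate Python sketch of the same idea. It assumes page.sql.gz is a mysqldump of the MediaWiki `page` table, with lines of the form ``INSERT INTO `page` VALUES (...),(...);`` and rows whose first three columns are `page_id`, `page_namespace`, and `page_title`. It uses a regular expression rather than a real SQL parser, so it is an approximation. `page_titles` and `main` are hypothetical names.

```python
import gzip
import re

# Matches the start of each row tuple: (page_id,page_namespace,'page_title'
ROW = re.compile(r"\((\d+),(-?\d+),'((?:[^'\\]|\\.)*)'")

# mysqldump backslash escapes; anything else (\' \" \\) maps to the character itself.
_ESCAPES = {"0": "\0", "n": "\n", "r": "\r", "Z": "\x1a"}


def unescape(s):
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)), s)


def page_titles(lines, namespaces=None):
    """Yield (page_id, namespace, title) from page.sql INSERT lines."""
    for line in lines:
        if not line.startswith("INSERT INTO `page` VALUES"):
            continue
        for m in ROW.finditer(line):
            ns = int(m.group(2))
            if namespaces is None or ns in namespaces:
                yield int(m.group(1)), ns, unescape(m.group(3))


def main(path, namespaces=None):
    # Hypothetical entry point, e.g. main("page.sql.gz", {0, 4})
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for page_id, ns, title in page_titles(f, namespaces):
            print(ns, title)
```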

-- 
Ilmari Karonen


