[Toolserver-l] Looking for utility to perform text search in dump

Platonides platonides at gmail.com
Sat Jul 25 19:12:06 UTC 2009


Danny B. wrote:
> Hello,
> 
> I'm looking for any kind of tool which would take the XML dump (most probably pages-meta-current.xml.bz2, or at least pages-articles.xml.bz2) and return the list of page titles (or alternatively/configurably page ids) of the pages containing a given string.
> 
> Does anybody have such a tool and is willing to share it? Either a command-line or a web interface is OK.
> 
> Thank you.
> 
> 
> Danny B.

I have a program that does this, written about 3 years ago. Many of us
have probably reinvented that wheel. I'll send it to you privately.
There's also http://meta.wikimedia.org/wiki/User:Micke/WikiFind
It's not the ultimate solution, but it's good enough. I have some changes
that unify both versions, if you're interested.
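
For anyone who wants to roll their own in the meantime, here is a minimal
sketch in Python of the kind of search asked for. It is neither the program
mentioned above nor WikiFind. The function names (find_pages, search_dump)
are made up for illustration, and it assumes the usual export layout, where
each <page> holds a <title>, then the page <id>, then a <revision> with a
<text> element.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace: '{http://...}title' -> 'title'."""
    return tag.rsplit("}", 1)[-1]


def find_pages(stream, needle, want_ids=False):
    """Yield the title (or page id) of each page whose text contains needle.

    Hypothetical helper; assumes the page <id> precedes the <revision>
    element, so the first <id> seen inside a <page> is the page id.
    """
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "page":
            continue
        fields = {}
        for node in elem.iter():
            # setdefault keeps the first occurrence of each tag name,
            # so the page id wins over revision and contributor ids.
            fields.setdefault(local_name(node.tag), node.text or "")
        if needle in fields.get("text", ""):
            yield fields.get("id") if want_ids else fields.get("title")
        root.clear()  # drop finished pages so memory use stays flat


def search_dump(path, needle, want_ids=False):
    """Stream a .xml.bz2 dump from disk without unpacking it first."""
    with bz2.open(path, "rb") as stream:
        yield from find_pages(stream, needle, want_ids)
```

Something like "for title in search_dump('pages-articles.xml.bz2', 'some
string'): print(title)" would then list the matches. This is a plain,
case-sensitive substring match on the raw wikitext; a regular expression
could be swapped in where the "needle in ..." test is.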


