[Toolserver-l] Looking for utility to perform text search in dump
Platonides
platonides at gmail.com
Sat Jul 25 19:12:06 UTC 2009
Danny B. wrote:
> Hello,
>
> I'm looking for any kind of tool which would take the XML dump (most probably the pages-meta-current.xml.bz2, at least the pages-articles.xml.bz2) and would return the list of page titles (or alternatively/configurably page ids) of pages containing given string.
>
> Does anybody have such a tool (or something like it) and is willing to share? Either a command-line or a web interface is OK.
>
> Thank you.
>
>
> Danny B.
I have a program that does this, written about 3 years ago. Many of us have
probably reinvented that wheel. I'll send it to you privately.
There's also http://meta.wikimedia.org/wiki/User:Micke/WikiFind
Not the ultimate solution, but good enough. I have some changes to unify
both versions if you're interested.
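
For anyone finding this thread in the archive, here is a minimal sketch of
the kind of tool being asked for. It is neither the program mentioned above
nor WikiFind; the function names find_pages and local_name are made up for
illustration. It assumes the usual export layout, where each <page> holds a
<title>, then the page <id>, then one <revision> with a <text> element (as
in the "current" dumps), and it does a plain case-sensitive substring match.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    # ElementTree reports tags as "{namespace}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def find_pages(dump_path, needle, want_ids=False):
    """Yield the title (or page id) of each page whose text contains needle.

    Assumes one revision per page; a full-history dump would yield a page
    once per matching revision.
    """
    opener = bz2.open if dump_path.endswith(".bz2") else open
    with opener(dump_path, "rb") as stream:
        root = None
        title = page_id = None
        # Stream the file so the whole dump never has to fit in memory.
        for event, elem in ET.iterparse(stream, events=("start", "end")):
            if root is None:
                root = elem  # the very first event is the root element
            name = local_name(elem.tag)
            if event == "start":
                if name == "page":
                    title = page_id = None
                continue
            if name == "title":
                title = elem.text
            elif name == "id" and page_id is None:
                # The first <id> closed inside a page is the page id;
                # later ones belong to the revision or contributor.
                page_id = elem.text
            elif name == "text":
                if needle in (elem.text or ""):
                    yield page_id if want_ids else title
            elif name == "page":
                root.clear()  # drop finished pages to keep memory flat
```

Something like

    for t in find_pages("pages-articles.xml.bz2", "some string"): print(t)

would then print one matching title per line. Decompressing and parsing in
Python is slow on a full dump, so treat this as a starting point only.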