This C program will also run and complete about 50 times faster than the Java and PHP code previously mentioned.
Jeff
Jeffrey V. Merkey wrote:
You could also just modify this code (released under GPLv3) and use it to strip out titles.
Stuff it into a file under Linux called "parsetitle.c" and type:
gcc parsetitle.c -o parsetitle
./parsetitle < enwiki<date>.xml > titles.txt
Jeff