Hi,
I have downloaded enwiki-latest-all-titles-in-ns0.gz and I want to extract the main titles and store them in another file. Some titles carry meta information (e.g. a disambiguation qualifier), and I want that removed. Can I achieve this by removing all the text between parentheses from the titles?
Also, some titles start with the "!" character, and some are enclosed between two or three of them, such as !Adiso_Amigos!. What is the purpose of "!" in such cases? And why are some titles enclosed in double quotes, such as "400_Years_of_Telescope"?
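To make the idea concrete, here is roughly what I have in mind, as a minimal Python sketch. The function name strip_parenthetical and the sample titles are only my placeholders, and I am assuming that the qualifier is always a single trailing "_(...)" suffix, which may well be wrong:

```python
import re

# Assumption: a qualifier is one trailing "_(...)" group with no nested parentheses.
_TRAILING_QUALIFIER = re.compile(r"_\([^()]*\)$")

def strip_parenthetical(title: str) -> str:
    """Return the title without a trailing parenthesized qualifier (hypothetical helper)."""
    return _TRAILING_QUALIFIER.sub("", title)

# Illustrative titles only, not taken from the dump.
for sample in ["Mercury_(planet)", "Mercury", "!Adiso_Amigos!"]:
    print(sample, "->", strip_parenthetical(sample))
```

This only touches parentheses at the very end of a title, so titles that contain parentheses elsewhere would be left alone. I am not sure whether that is the right rule.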
Finally, is there a document describing all these conventions?
P.S. Is this the right place to ask such questions?
Cheers,
Behrang Saeedzadeh
-------------------------------
http://my.opera.com/behrangsa
http://twitter.com/behrangsa