Hi Behrang,
I think you should be asking these types of questions on a list where people
give advice on how to do things in a particular programming language (e.g.,
Python or Perl). For instance, you can find lists related to Python
here:
http://www.python.org/community/lists/.
Best,
--muhammad abdul-mageed,
Ph.D. student,
Indiana University Computational Linguistics and
School of Library & Info. Science
On Fri, Dec 11, 2009 at 7:27 AM, Behrang Saeedzadeh <behrangsa(a)gmail.com> wrote:
Hi,
I have downloaded enwiki-latest-all-titles-in-ns0.gz and I want to extract
the main titles and store them in another file. Some titles carry meta
information (e.g., a disambiguation qualifier), and I want this removed.
Can I remove all the text between parentheses from the titles to achieve
this?
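The parenthesis-stripping idea asked about above can be sketched in Python.
This is a minimal sketch under stated assumptions, not a documented
convention of the dump: it assumes one underscore-separated title per line
and that disambiguation qualifiers appear only as a trailing "_(...)" suffix,
as in Mercury_(planet). Stripping only that trailing group, rather than all
text between parentheses, leaves parentheses elsewhere in a title untouched.
The names main_title and extract_main_titles are hypothetical.

```python
import gzip
import re

# Assumption: a disambiguation qualifier is a trailing "_(...)" suffix,
# e.g. "Mercury_(planet)". Parentheses elsewhere in a title are kept.
TRAILING_QUALIFIER = re.compile(r"_\([^()]*\)$")


def main_title(title: str) -> str:
    """Strip one trailing parenthetical qualifier (hypothetical helper)."""
    return TRAILING_QUALIFIER.sub("", title)


def extract_main_titles(src_path: str, dst_path: str) -> None:
    """Read a gzipped one-title-per-line file and write unique main titles.

    Titles that collapse to the same main title (e.g. "Mercury_(planet)"
    and "Mercury_(element)") are written only once, in first-seen order.
    """
    seen = set()
    with gzip.open(src_path, "rt", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            title = main_title(line.rstrip("\n"))
            if title and title not in seen:
                seen.add(title)
                dst.write(title + "\n")
```

Usage, assuming the file sits in the working directory:
extract_main_titles("enwiki-latest-all-titles-in-ns0.gz", "main-titles.txt").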
Also, some titles start with the "!" character, and some are enclosed
between two or three of them, such as !Adiso_Amigos!. What is the purpose
of "!" in such cases? And why are some titles enclosed in double quotes,
such as "400_Years_of_Telescope"?
Finally, is there a document describing all these conventions?
P.S. Is this the right place to ask such questions?
Cheers,
Behrang Saeedzadeh
-------------------------------
http://my.opera.com/behrangsa
http://twitter.com/behrangsa
_______________________________________________
WikiEN-l mailing list
WikiEN-l(a)lists.wikimedia.org
To unsubscribe from this mailing list, visit:
https://lists.wikimedia.org/mailman/listinfo/wikien-l