Hello,
I downloaded pywikipedia yesterday and am using it on the Assamese Wikipedia. Thanks for a great product!
Nevertheless, I am having a little difficulty making it do exactly what I want. I am using it to correct some Unicode encoding issues. In particular, I am trying to replace some Unicode character sequences with others. They are very short: at most 3 characters long.
But I have been unable to avoid picking up matches inside wikilinks (internal as well as inter-language). Is there a way to do so without employing unicode regularization?
Thanks,
Hi,
Please read http://meta.wikimedia.org/wiki/Replace.py; there are a lot of fun and useful things there. Exceptions are your friends, either on the command line or within a fix, if you use fixes.py or user-fixes.py. Not all of the exceptions are documented correctly at the moment; you will find an approximate list in the Italian or Hungarian version (top of the page). In short, *link* is the exception for all internal links, including interwiki links and categories, and *interwiki* covers interwiki links only.
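As a rough illustration of the "within a fix" route, an entry in user-fixes.py could look something like the sketch below. The fix name `assamese-chars`, the edit summary and the character pair are hypothetical placeholders, not the actual replacements from the question; the dictionary keys follow the layout used by the fixes shipped in fixes.py, so please check them against the Replace.py page for your version. The command-line route should be the `-exceptinsidetag:link` option of replace.py, if your copy matches the documentation.

```python
# -*- coding: utf-8 -*-
# Sketch of a user-fixes.py entry that leaves text inside [[...]] untouched.
# In a real user-fixes.py the `fixes` dict is already provided by fixes.py;
# it is created here only so that the snippet also runs on its own.
try:
    fixes
except NameError:
    fixes = {}

fixes['assamese-chars'] = {            # hypothetical fix name
    'regex': False,                    # plain string replacement, no regex
    'msg': {
        'as': u'Bot: fixing Unicode character sequences',  # placeholder summary
    },
    'replacements': [
        # (old, new) pairs. Placeholder example only: a two-code-point
        # sequence replaced by a single code point.
        (u'\u09af\u09bc', u'\u09df'),
    ],
    'exceptions': {
        # 'link' skips every [[...]] link, which per the explanation above
        # includes interwiki links and categories; use 'interwiki' instead
        # to skip interwiki links only.
        'inside-tags': ['link'],
    },
}
```

With an entry like this in place, the fix would be run as `python replace.py -fix:assamese-chars` together with the usual page selection options.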
2011/10/4 W Chaipau wikichaipau@gmail.com
Chaipau Wikipedia
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l