On Mon, Mar 7, 2011 at 7:22 AM, Bináris <wikiposta@gmail.com> wrote:

Hi,

when I download a page as HTML that contains article titles, the titles in the link URLs look almost, but not quite, urlencode()-ed: characters such as "(", ")", "!", ",", and ":" appear unencoded.

For example:
<li><a href="/w/index.php?title=Avant_l%27aurore_(court-m%C3%A9trage)&action=edit&redlink=1" class="new" title="Avant l'aurore (court-métrage) (page does not exist)">Avant l'aurore (court-métrage)</a></li>

Is there a function in pywiki to handle this, or is a full list of the non-encoded characters available? I used urlencode() plus a dict of known exceptions, but this is not the best solution.
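
To make the problem concrete, here is a minimal sketch of the two directions, assuming Python 3's urllib.parse. The names title_from_href, title_to_url and MW_SAFE are hypothetical and not part of pywikipedia. The set of characters left unencoded is an assumption based on the example link above and on what MediaWiki appears to do, so it should be checked against the wiki in question.

```python
from urllib.parse import urlsplit, parse_qs, quote, unquote

# Assumption: characters the wiki leaves unencoded in title URLs.
# Not an authoritative list; verify against real links.
MW_SAFE = ";@$!*(),/~:"


def title_from_href(href):
    """Decode the page title from a link href (hypothetical helper)."""
    parts = urlsplit(href)
    query = parse_qs(parts.query)  # parse_qs percent-decodes values as UTF-8
    if "title" in query:
        raw = query["title"][0]
    else:
        # Fall back to the short form, e.g. /wiki/Some_title
        raw = unquote(parts.path.rsplit("/wiki/", 1)[-1])
    return raw.replace("_", " ")


def title_to_url(title):
    """Encode a title the way the example link is encoded (hypothetical helper)."""
    return quote(title.replace(" ", "_"), safe=MW_SAFE)


href = "/w/index.php?title=Avant_l%27aurore_(court-m%C3%A9trage)&action=edit&redlink=1"
print(title_from_href(href))
print(title_to_url("Avant l'aurore (court-métrage)"))
```

Decoding the href and comparing plain titles sidesteps the exception list entirely; the encoding direction is only needed when the exact URL form has to be reproduced.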

--
Bináris

_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l