Why are you downloading HTML?<br><br><div class="gmail_quote">On Mon, Mar 7, 2011 at 7:22 AM, Bináris <span dir="ltr">&lt;<a href="mailto:wikiposta@gmail.com">wikiposta@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
Hi,<br><br>When I download a page as HTML that contains article titles, the titles in the links are percent-encoded much as urlencode() would do it, but not quite: characters such as "(", ")", "!", ",", and ":" appear unencoded.<br>
<br>For example:<br>&lt;li&gt;&lt;a href="/w/index.php?title=Avant_l%27aurore_<font style="color: rgb(204, 0, 0);" size="4"><b>(</b></font>court-m%C3%A9trage<font size="4"><b style="color: rgb(204, 0, 0);">)</b></font>&amp;action=edit&amp;redlink=1" class="new" title="Avant l'aurore (court-métrage) (page does not exist)"&gt;Avant l'aurore (court-métrage)&lt;/a&gt;&lt;/li&gt;<br>
<br>Is there a function in pywiki that handles this, or is a full list of the unencoded characters available? I used urlencode() plus a dict of known exceptions, but that is not a great solution.<br clear="all"><br>-- <br>
<font color="#888888">
Bináris<br>
</font><br>_______________________________________________<br>
Pywikipedia-l mailing list<br>
<a href="mailto:Pywikipedia-l@lists.wikimedia.org">Pywikipedia-l@lists.wikimedia.org</a><br>
<a href="https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l" target="_blank">https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l</a><br>
<br></blockquote></div><br>
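If the HTML really is needed, here is a minimal sketch of the encoding described in the quoted question, using only Python's standard library rather than anything from pywikipedia. The helper names `encode_title` and `decode_title` and the `SAFE_CHARS` constant are hypothetical, and the safe set is an assumption taken from the characters listed in the question; the wiki may leave more characters unencoded (MediaWiki's `wfUrlencode` is probably the place to check).

```python
from urllib.parse import quote, unquote

# Assumption: the characters left unencoded, taken from the question.
# The complete set used by the wiki may be larger.
SAFE_CHARS = "()!,:"


def encode_title(title: str) -> str:
    """Percent-encode a page title, leaving SAFE_CHARS untouched."""
    # Links use underscores where the title has spaces.
    return quote(title.replace(" ", "_"), safe=SAFE_CHARS)


def decode_title(encoded: str) -> str:
    """Turn an encoded title from a link back into its readable form."""
    # unquote() needs no exception list: characters that were never
    # encoded simply pass through unchanged.
    return unquote(encoded).replace("_", " ")


if __name__ == "__main__":
    print(encode_title("Avant l'aurore (court-métrage)"))
    print(decode_title("Avant_l%27aurore_(court-m%C3%A9trage)"))
```

Decoding does not depend on the list of exceptions at all, so comparing decoded titles avoids the problem as long as the titles only need to be read from the page, not reproduced byte for byte.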