[Pywikipedia-l] Encoding in HTML source

Bináris wikiposta at gmail.com
Mon Mar 7 12:22:06 UTC 2011


Hi,

when I download a page as HTML that contains article titles, the titles in
the link targets look almost urlencode()-ed, but not quite: characters such as
"(", ")", "!", ",", ":" appear without encoding.

For example:
<li><a href="/w/index.php?title=Avant_l%27aurore_(court-m%C3%A9trage)&amp;action=edit&amp;redlink=1"
class="new" title="Avant l'aurore (court-métrage) (page does not
exist)">Avant l'aurore (court-métrage)</a></li>

Is there a function in pywiki to handle this, or is a full list of the
non-encoded characters available? I used urlencode() plus a dict of known
exceptions, but this is not the best solution.

-- 
Bináris