On Mon, Mar 7, 2011 at 1:22 PM, Bináris <wikiposta(a)gmail.com> wrote:
Hi,
When I download a page as HTML that contains article titles, the titles
are almost urlencode()-ed, but not quite: characters such as "(", ")",
"!", "," and ":" appear without encoding.
For example:
<li><a
href="/w/index.php?title=Avant_l%27aurore_(court-m%C3%A9trage)&action=edit&redlink=1"
class="new" title="Avant l'aurore (court-métrage) (page does not
exist)">Avant l'aurore (court-métrage)</a></li>
Is there a function in pywiki to handle this, or is a full list of the
non-encoded characters available? I used urlencode() plus a dict of known
exceptions, but this is not the best solution.
>>> import wikipedia
>>> page = wikipedia.Page(wikipedia.getSite(),
...                       "Avant_l%27aurore_(court-m%C3%A9trage)")
>>> page.urlname()
'Avant_l%27aurore_%28court-m%C3%A9trage%29'
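
Note that the urlname() output above encodes the parentheses, while the href in the HTML does not, so the two strings are best compared after decoding. As a rough sketch using only the Python 3 standard library (not pywikipedia), decoding with unquote() avoids needing the list of exceptions at all, and quote() with a "safe" argument can reproduce the href form. The names ASSUMED_SAFE, encode_title and decode_title are made up for this illustration, and the safe set is only the characters named in the question, not a confirmed complete list.

```python
from urllib.parse import quote, unquote

# Assumption: only the characters the question lists as appearing unencoded.
# This is not claimed to be the complete set that MediaWiki leaves alone.
ASSUMED_SAFE = "()!,:"


def encode_title(title, safe=ASSUMED_SAFE):
    """Percent-encode a title as UTF-8, leaving the `safe` characters as-is."""
    return quote(title.replace(" ", "_"), safe=safe)


def decode_title(encoded):
    """Undo the encoding; works whichever characters were left unencoded."""
    return unquote(encoded).replace("_", " ")


href_title = "Avant_l%27aurore_(court-m%C3%A9trage)"
plain = decode_title(href_title)
print(plain)                              # the human-readable title
print(encode_title(plain) == href_title)  # round trip back to the href form
```

Since decode_title() gives the same result for the href form and for the urlname() form, comparing decoded titles sidesteps the question of which characters the server chose to leave unencoded.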
--
André Engels, andreengels(a)gmail.com