Re: [Pywikipedia-l] Encoding in HTML source

7 Mar 2011

On Mon, Mar 7, 2011 at 1:22 PM, Bináris <wikiposta(a)gmail.com> wrote:
...
  Hi,

 When I download a page as HTML that contains titles of articles, these
 titles look roughly urlencode()-ed, but not quite: characters like
 "(", ")", "!", ",", ":" appear without encoding.

 For example:
 <li><a href="/w/index.php?title=Avant_l%27aurore_(court-m%C3%A9trage)&amp;action=edit&amp;redlink=1"
 class="new" title="Avant l'aurore (court-métrage) (page does not exist)">Avant l'aurore (court-métrage)</a></li>

 Is there a function in pywiki to handle this, or is a full list of the
 non-encoded characters available? I used urlencode() plus a dict of known
 exceptions, but this is not the best solution.
...
>>> import wikipedia
>>> page = wikipedia.Page(wikipedia.getSite(), "Avant_l%27aurore_(court-m%C3%A9trage)")
>>> page.urlname()
'Avant_l%27aurore_%28court-m%C3%A9trage%29'
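
If the goal is to reproduce or reverse the encoding seen in the HTML source without going through a Page object, a rough standard-library sketch could look like the following. It is not a pywikipedia function: encode_title, decode_title and SAFE_CHARS are hypothetical names, and the safe set is only the characters listed in the question, so it is probably incomplete (MediaWiki's own wfUrlencode() is likely the place to check for the full list).

```python
from urllib.parse import quote, unquote

# Characters the question reports as left unencoded. This set is an
# assumption taken from that list and may well be incomplete.
SAFE_CHARS = "()!,:"


def encode_title(title):
    """Percent-encode a page title, leaving SAFE_CHARS untouched."""
    # The sample URL uses underscores instead of spaces, so convert first.
    return quote(title.replace(" ", "_"), safe=SAFE_CHARS)


def decode_title(encoded):
    """Turn an encoded title taken from an href back into readable text."""
    return unquote(encoded).replace("_", " ")


print(encode_title("Avant l'aurore (court-métrage)"))
print(decode_title("Avant_l%27aurore_(court-m%C3%A9trage)"))
```

Decoding is the safer direction, since unquote() does not need to know which characters were left alone; only the encoding side depends on the safe list being right.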

-- 
André Engels, andreengels(a)gmail.com
