Re: [Pywikipedia-l] Encoding in HTML source

7 Mar 2011


Why are you downloading HTML?
On Mon, Mar 7, 2011 at 7:22 AM, Bináris wikiposta@gmail.com wrote:
Hi,
when I download a page in HTML, which contains titles of articles, these
titles are something like urlencode()-ed, but not quite; characters like
"(", ")", "!", ",", ":" appear without encoding.
For example:
<li><a href="/w/index.php?title=Avant_l%27aurore_(court-m%C3%A9trage)&amp;action=edit&amp;redlink=1"
class="new" title="Avant l'aurore (court-métrage) (page does not
exist)">Avant l'aurore (court-métrage)</a></li>
Is there a function in pywiki to handle this, or is a full list of the
non-encoded characters available? I used urlencode() plus a dict of known
exceptions, but this is not the best solution.
--
Bináris
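
For what it's worth, the "urlencode() plus a dict of exceptions" approach can be expressed as a single call by passing the unencoded characters as a safe set. This is a minimal sketch in Python 3, not a pywikipedia function: the helper names are hypothetical, and the safe set ";@$!*(),/~:" is an assumption based on what MediaWiki's wfUrlencode() is understood to leave unescaped, so check it against the MediaWiki version you are scraping. It leaves the parentheses of the example title as-is while percent-encoding the apostrophe and the "é".

```python
from urllib.parse import quote, unquote

# Assumption: characters MediaWiki leaves unescaped in title= URLs.
# Verify against wfUrlencode() in the MediaWiki version you target.
MEDIAWIKI_SAFE = ";@$!*(),/~:"


def mediawiki_urlencode(title):
    """Hypothetical helper: encode a page title the way it appears in href."""
    # MediaWiki uses underscores for spaces in URLs; everything outside the
    # safe set (and the unreserved characters) is UTF-8 percent-encoded.
    return quote(title.replace(" ", "_"), safe=MEDIAWIKI_SAFE)


def title_from_href(encoded):
    """Hypothetical helper: recover the readable title from the href value."""
    # Decoding needs no exception list: unquote() leaves literal characters
    # such as "(" and ")" untouched and only expands %XX sequences.
    return unquote(encoded).replace("_", " ")


if __name__ == "__main__":
    print(mediawiki_urlencode("Avant l'aurore (court-métrage)"))
    print(title_from_href("Avant_l%27aurore_(court-m%C3%A9trage)"))
```

If the goal is only to get from the HTML back to the title, decoding alone is enough, so the list of non-encoded characters matters only when you need to generate the same URLs yourself.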

Pywikipedia-l mailing list
Pywikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
