Re: [Pywikipedia-l] Urlencoded section titles

12 Jun 2012

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

If you have the whole page content accessible, you can try to use
'Page.getSections()'... it might help.

Greetings
DrTrigon

On 28.03.2012 13:20, Bináris wrote:

 2012/3/28 Bináris <wikiposta(a)gmail.com>

 Another issue: *á* is encoded as .C3.A1. However, a literal .C3.A1 
 in section title will also appear the same. Is there any way to 
 decide if .C3.A1 stands for *á* or for .C3.A1? I guess the 
 likelihood of someone writing a literal .C3.A1 into the section 
 title is very small, so this question may be theoretical, but I am
 a theoretical man. :-)
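The collision described above can be reproduced with a minimal sketch. It assumes the anchor is built by replacing spaces with underscores, percent-encoding the UTF-8 bytes, and then swapping '%' for '.'; `legacy_anchor` is a hypothetical helper written for this illustration, not a pywikipedia function, and it only approximates what MediaWiki does.

```python
from urllib.parse import quote


def legacy_anchor(title):
    """Approximate the dot-encoded anchor of a section title.

    Hypothetical helper: an assumption about the encoding, not the
    actual MediaWiki or pywikipedia implementation.
    """
    # Spaces become underscores; everything else except letters, digits
    # and a few unreserved characters (including '.') is percent-encoded.
    encoded = quote(title.replace(" ", "_"), safe="")
    # Colons are kept readable and '%' is replaced by '.'.
    return encoded.replace("%3A", ":").replace("%", ".")


# Both titles collapse to the same anchor, because a literal '.' is
# never escaped itself.
print(legacy_anchor("á") == legacy_anchor(".C3.A1"))
```

Because '.' passes through unchanged, the mapping is not injective, so the anchor alone cannot tell you which of the two titles it came from.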

 While this was a theoretical problem, I created a practical one.
 There are characters with a shorter code, such as quotation mark
 (.22) and parentheses (.28, .29). Have a look at this section
 title: 
 http://hu.wikipedia.org/wiki/Szerkeszt%C5%91:BinBot/semmi#.22D.C3.A1tum.22:…

 You will see that the first two .22's (marked here with red, excuse me
 if this causes a problem for someone) are encoded quotation marks,
 while the last (blue) one is a literal .22 that is part of a date
 (Hungarian date order is yyyy. mm. dd.). I simply don't see any
 way for a bot to tell them apart other than collecting all the
 section titles in question (as well as anchor templates) and trying
 a reverse match. So this is something very easy to spoil and
 almost hopeless to correct.
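The reverse match mentioned above could look roughly like the sketch below: encode every known section title and keep those whose encoded form equals the anchor. Both function names and the sample titles are made up for this illustration (the real section title in the link above is truncated and is not reconstructed here), and the encoding rule is the same assumption as before: underscores for spaces, percent-encoding, then '.' in place of '%'.

```python
from urllib.parse import quote


def legacy_anchor(title):
    """Approximate the dot-encoded anchor of a section title (hypothetical helper)."""
    encoded = quote(title.replace(" ", "_"), safe="")
    return encoded.replace("%3A", ":").replace("%", ".")


def titles_for_anchor(anchor, section_titles):
    """Return every known title whose encoded form equals the anchor.

    More than one hit means the anchor is genuinely ambiguous.
    """
    return [t for t in section_titles if legacy_anchor(t) == anchor]


# Invented sample data, for illustration only.
titles = ['"Dátum": 2012.03.22', "Egy másik szakasz"]
anchor = ".22D.C3.A1tum.22:_2012.03.22"
print(titles_for_anchor(anchor, titles))
```

The function never tries to decode the anchor, so it does not have to guess whether a given .22 is an encoded quotation mark or literal text; the price is that the section titles (and anchor templates) of the target page have to be fetched first.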

 *:-(*

 -- Bináris

 _______________________________________________ Pywikipedia-l
 mailing list Pywikipedia-l(a)lists.wikimedia.org 
 https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk/XpCwACgkQAXWvBxzBrDBRJgCfe0SF+Ym7S+l5rIHW3fc4db8j
3moAnjZqX/tGut+McHhecExN8VR1Ado5
=ehn+
-----END PGP SIGNATURE----- 
