[Pywikipedia-l] Encoding in HTML source

Bináris wikiposta at gmail.com
Mon Mar 7 15:26:03 UTC 2011


I'm sending this back to the list:

2011/3/7 Andre Engels <andreengels at gmail.com>

> I don't know about that, but I think you can work the other way
> around, using a bit of regular expression magic:
>
> import re
> ...
> existing = [wikipedia.Page(wikipedia.getSite(), pname).title() for
> pname in re.findall(r"title=(.*?)&amp;action=edit", fullsourcetext)]
>
> def exists(page):
>      return page.title() in existing
>

This works fine! I didn't know that Page() could take encoded titles.
There are already some regexes in my code. Thank you!
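
For anyone who wants to try the regex part without a live wiki, here is a
minimal standalone sketch. It is not the pywikipedia code itself: the sample
HTML, the extract_titles() helper and the use of Python 3's
urllib.parse.unquote are my own assumptions for illustration. The decoding
step only approximates the normalisation that Page().title() appears to do
(percent-decoding and underscores to spaces).

```python
import re
from urllib.parse import unquote

# Hypothetical sample of page source; real wiki markup may differ.
SAMPLE_HTML = (
    '<a href="/w/index.php?title=Budapest&amp;action=edit">edit</a>\n'
    '<a href="/w/index.php?title=P%C3%A9cs&amp;action=edit">edit</a>\n'
    '<a href="/w/index.php?title=Lake_Balaton&amp;action=edit">edit</a>\n'
)


def extract_titles(html):
    """Return the set of decoded titles found in edit links (illustrative helper)."""
    # Same pattern as in the quoted message: the raw HTML source holds "&amp;".
    raw_titles = re.findall(r"title=(.*?)&amp;action=edit", html)
    # Percent-decode and replace underscores with spaces; this stands in for
    # the title normalisation that Page() is assumed to perform.
    return {unquote(t).replace("_", " ") for t in raw_titles}


existing = extract_titles(SAMPLE_HTML)


def exists(title):
    """Check a plain title string against the titles scraped from the source."""
    return title in existing
```

A set is used instead of a list so that each exists() lookup stays cheap when
the page source contains many titles.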

-- 
Bináris