Re: [Pywikipedia-l] search output

2 Aug 2009


On Thu, Jul 30, 2009 at 6:30 PM, Merlijn van Deen valhallasw@arctus.nl wrote:
...
On Thu, July 30, 2009 9:11 pm, Stig Meireles Johansen wrote:
...
I hacked a little on some old Perl code I had lying around which I once
[..snip..]
here it is: http://toolserver.no/~stigmj/tools/src/xml-search.pl.txt
...
/Stigmj
Suggestion: pywikipediabot has good built-in support. My attempt at
building a simple parser (http://arctus.nl/~valhallasw/pulldom.py) is
about 10 times slower than just using four (much more readable) lines of
code:
That may be, but when I tried your code on
http://download.wikimedia.org/nowiki/20090729/nowiki-20090729-pages-articles...
unpacking of course) I got this:
Traceback (most recent call last):
  File "search.py", line 5, in <module>
    print page.title
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 1: ordinal not in range(128)
While my code ran like this:
$ time ./xml-search.pl nowiki-20090729-pages-articles.xml "{|" 0 > t.t
real    1m16.511s
user    1m15.657s
sys     0m0.856s
$ grep ^Searched t.t
Searched through 407565 articles and found 20889 matches
Give me some working code and I'll do a comparison.. :)
/Stig
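
For anyone wanting to repeat the comparison, here is a minimal sketch of the same kind of search in Python 3. It is not pywikipediabot's own XML reader and not the four lines referred to above: it assumes only the standard library's xml.etree.ElementTree, and the names search_dump and write_titles are made up for illustration. It streams the dump page by page, and it encodes titles to UTF-8 explicitly, which is one way to avoid the 'ascii' codec error shown in the traceback when output is redirected.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # Dump elements carry a versioned XML namespace; keep only the bare name.
    return tag.rsplit("}", 1)[-1]


def search_dump(source, needle):
    """Yield (title, found) for every <page> in a MediaWiki XML dump.

    source is a file path or a binary file object (hypothetical helper).
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, needle in text
        # Drop the finished page's children so memory use stays modest.
        elem.clear()


def write_titles(titles, out):
    """Write one title per line to a binary stream, always as UTF-8."""
    for title in titles:
        out.write((title + "\n").encode("utf-8"))
```

To use it, iterate over search_dump("nowiki-20090729-pages-articles.xml", "{|"), count the pairs and the hits, and pass the matching titles to write_titles together with sys.stdout.buffer. No timing is claimed for this sketch; it would have to be measured against the Perl script on the same dump.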
