Hello all,
I need some help. I have a Perl program (for Check Wikipedia) that can scan a database dump of a language very quickly; 2,200 pages per minute is no problem.
I want to run the same script daily against the page text of the live Wikipedia. Not all pages, but perhaps 20,000 per day per language. With a dump this normally takes only about 10 minutes, but against the live Wikipedia it takes much longer. I use the Wikipedia API to fetch the text of each article, so my script can only scan about 120 pages per minute. At the moment the scan takes 300 minutes on enwiki and 134 minutes on dewiki. Most of that time my script is simply waiting, which is a problem because the job still causes high CPU usage.
I need a faster way to get page text from the live Wikipedia, so that I can reduce the CPU usage.
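One idea I have been wondering about (an untested sketch, written in Python for brevity rather than my Perl; the same request could be built with LWP::UserAgent): the MediaWiki API can return several pages in one query request by joining the titles with "|", so one round trip could cover many articles instead of one. Here the function only builds the request URL; the endpoint and parameter choices are the standard action=query / prop=revisions ones:

```python
import urllib.parse

def build_batch_url(api_url, titles):
    """Build one API request URL asking for the current wikitext of
    several pages at once. The MediaWiki API accepts multiple titles
    separated by '|' (up to 50 per request for normal users)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": "|".join(titles),
        "format": "json",
    }
    return api_url + "?" + urllib.parse.urlencode(params)

# Example: one request instead of three separate ones.
url = build_batch_url("https://de.wikipedia.org/w/api.php",
                      ["Berlin", "Hamburg", "Dresden"])
```

If 50 pages come back per request, the number of round trips for a 20,000-page scan drops from 20,000 to 400, so the waiting time should shrink accordingly.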
Maybe someone knows a faster way, or has another idea.
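Another thought (again only a sketch, in Python, with a stand-in fetch function rather than a real API call): since the script spends most of its time waiting on the network, a few concurrent requests could overlap that waiting, as long as the request rate stays polite to the servers:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(titles, fetch_one, workers=4):
    """Fetch many pages with a small pool of worker threads. While one
    request waits on the network, the others proceed, so wall-clock
    time drops roughly by the number of workers. fetch_one is any
    function mapping a title to its page text."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs results correctly.
        return dict(zip(titles, pool.map(fetch_one, titles)))

# Demo with a dummy fetcher; a real one would perform the API request.
texts = fetch_all(["A", "B", "C"], lambda t: "text of " + t)
```

Whether this actually lowers the CPU load on the toolserver I cannot say; it mainly attacks the waiting time.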
Thanks for any help, Stefan (sk)
More info:
http://de.wikipedia.org/wiki/Benutzer:Stefan_K%C3%BChn/Check_Wikipedia
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Check_Wikipedia
http://toolserver.org/~sk/checkwiki/checkwiki.pl