seth wrote:
Hi!
I wrote a Perl script that works on the HTML content of some
wikipedia webpages. Some of those pages are >300 kB and Perl's
LWP mirror hangs.
Two questions:
1. Is there a better/faster way to get the HTML content of e.g.
http://meta.wikimedia.org/wiki/Spam_blacklist/Log
than
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->mirror($url, $filename);
?
To get the content of Wikipedia pages you should use WikiProxy:
http://meta.wikimedia.org/wiki/User:Duesentrieb/WikiProxy
If you still need to fetch it yourself, you can launch an external
tool (wget, curl, ...) to download it and then read it as a normal file.
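If you do stay with LWP, the hang is often just a very long wait rather than a true freeze: LWP::UserAgent's default timeout is 180 seconds, and Wikimedia is known to reject requests with generic User-Agent strings. A minimal sketch, assuming the problem is one of those two things (the script name and contact address below are hypothetical placeholders):

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Set a short timeout (LWP's default is 180 s) and a descriptive
# User-Agent, since Wikimedia may reject generic ones.
my $ua = LWP::UserAgent->new(
    timeout => 30,
    agent   => 'MyWikiScript/0.1 (contact: you@example.org)',  # hypothetical
);

sub fetch_page {
    my ($url, $filename) = @_;
    my $response = $ua->mirror($url, $filename);
    # mirror() returns 304 Not Modified when the local copy is current,
    # which is not an error.
    return 1 if $response->is_success or $response->code == 304;
    warn "Fetch failed: " . $response->status_line . "\n";
    return 0;
}

# Usage (needs network access):
# fetch_page('http://meta.wikimedia.org/wiki/Spam_blacklist/Log',
#            'spam_blacklist_log.html');
```

With a timeout set, a stalled connection fails quickly with a clear status line instead of appearing to hang.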
2. If I have questions about such stuff, am I in the right place
here? Otherwise, sorry
for bothering you. :-)
Cheers
seth
Yes, this is a good place :)
Platonides