Hi!
I wrote a Perl script that works on the HTML content of some Wikipedia pages. Some of those pages are >300 kB, and Perl's LWP mirror hangs up.
Two questions:
1. Is there a better/faster way to get the HTML content of e.g. http://meta.wikimedia.org/wiki/Spam_blacklist/Log than my $ua = LWP::UserAgent->new; $ua->mirror($url, $filename); ? (The full fetching code is below.)
2. If I have questions about such stuff, am I right here? Otherwise, sorry for bothering you. :-)
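For reference, the fetching part of the script looks roughly like this (simplified; the local filename and the timeout value are just placeholders):

  use strict;
  use warnings;
  use LWP::UserAgent;

  my $url      = 'http://meta.wikimedia.org/wiki/Spam_blacklist/Log';
  my $filename = 'spam_blacklist_log.html';    # placeholder name

  # mirror() downloads the page only if it changed since the last run
  my $ua       = LWP::UserAgent->new(timeout => 60);
  my $response = $ua->mirror($url, $filename);

  # 304 means the local copy is still up to date, which is fine too
  die 'mirror failed: ', $response->status_line, "\n"
      unless $response->is_success || $response->code == 304;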
Cheers seth
seth wrote:
Hi!
I wrote a Perl script that works on the HTML content of some Wikipedia pages. Some of those pages are >300 kB, and Perl's LWP mirror hangs up.
Two questions:
- Is there a better/faster way to get the HTML content of e.g.
http://meta.wikimedia.org/wiki/Spam_blacklist/Log than my $ua = LWP::UserAgent->new; $ua->mirror($url, $filename); ?
To get the content of Wikipedia pages you should be using WikiProxy: http://meta.wikimedia.org/wiki/User:Duesentrieb/WikiProxy
If you still need to fetch it yourself, you can launch an external tool (wget, curl, ...) to download it and then read it as a normal file.
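A rough sketch of how that could look (untested; the output filename and the wget options are just an example):

  use strict;
  use warnings;

  my $url  = 'http://meta.wikimedia.org/wiki/Spam_blacklist/Log';
  my $file = 'Spam_blacklist_Log.html';    # local output file, just an example

  # let wget do the download, then read the result like any other file
  system('wget', '--quiet', '-O', $file, $url) == 0
      or die "wget failed: $?\n";

  open my $fh, '<', $file or die "cannot open $file: $!\n";
  my $html = do { local $/; <$fh> };
  close $fh;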
- If I have questions about such stuff, am I right here? Otherwise, sorry for bothering you. :-)
Cheers seth
Yes, this is a good place :)
Platonides
Hi!
Platonides wrote:
seth wrote:
I wrote a Perl script that works on the HTML content of some Wikipedia pages. Some of those pages are >300 kB, and Perl's LWP mirror hangs up.
I was wrong: LWP mirror did not hang up, but the content was not fully loaded because of caching. After I purged the page manually, everything was OK.
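(For the record, the manual purge could probably be automated with something like the following; an untested sketch using MediaWiki's action=purge:)

  use strict;
  use warnings;
  use LWP::UserAgent;

  my $ua = LWP::UserAgent->new(timeout => 60);

  # ask MediaWiki to regenerate its cached copy before mirroring the page
  my $purge_url = 'http://meta.wikimedia.org/w/index.php'
                . '?title=Spam_blacklist/Log&action=purge';
  my $purge = $ua->get($purge_url);
  warn 'purge request failed: ', $purge->status_line, "\n"
      unless $purge->is_success;

  $ua->mirror('http://meta.wikimedia.org/wiki/Spam_blacklist/Log',
              'spam_blacklist_log.html');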
- Is there a better/faster way to get the HTML content of e.g.
http://meta.wikimedia.org/wiki/Spam_blacklist/Log than my $ua = LWP::UserAgent->new; $ua->mirror($url, $filename); ?
To get the content of Wikipedia pages you should be using WikiProxy: http://meta.wikimedia.org/wiki/User:Duesentrieb/WikiProxy
Does this tool purge automatically? Is there any manual for that tool?
bye seth
To get the content of Wikipedia pages you should be using WikiProxy: http://meta.wikimedia.org/wiki/User:Duesentrieb/WikiProxy
Does this tool purge automatically? Is there any manual for that tool?
That page is all there is in terms of a manual. But the interface is quite simple. Anyway, it's not for loading HTML at all -- it's for loading WikiText. And it will require a special token if you want to use it from anywhere but the toolserver itself.
-- daniel
That's what the API is for.
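Fetching the raw wikitext through api.php looks roughly like this (untested sketch; the parameter names are from memory, see http://meta.wikimedia.org/w/api.php for the real documentation):

  use strict;
  use warnings;
  use LWP::UserAgent;
  use URI;

  # ask the MediaWiki API for the current wikitext of the page
  my $uri = URI->new('http://meta.wikimedia.org/w/api.php');
  $uri->query_form(
      action => 'query',
      prop   => 'revisions',
      titles => 'Spam_blacklist/Log',
      rvprop => 'content',
      format => 'xml',
  );

  my $ua       = LWP::UserAgent->new(timeout => 60);
  my $response = $ua->get($uri);
  die 'API request failed: ', $response->status_line, "\n"
      unless $response->is_success;

  # the wikitext is inside the <rev> element of the response
  my $xml = $response->decoded_content;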
-Soxred93
On Nov 25, 2008, at 5:00 PM, Daniel Kinzler wrote:
To get the content of Wikipedia pages you should be using WikiProxy: http://meta.wikimedia.org/wiki/User:Duesentrieb/WikiProxy
Does this tool purge automatically? Is there any manual for that tool?
That page is all there is in terms of a manual. But the interface is quite simple. Anyway, it's not for loading HTML at all -- it's for loading WikiText. And it will require a special token if you want to use it from anywhere but the toolserver itself.
-- daniel
Toolserver-l mailing list Toolserver-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/toolserver-l