seth wrote:
Hi!
I wrote a Perl script that works on the HTML content of some
wikipedia webpages. Some of those pages are >300 kB and Perl's
LWP mirror hangs.
Two questions:
1. Is there a better/faster way to get the HTML content of e.g.
http://meta.wikimedia.org/wiki/Spam_blacklist/Log
than
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->mirror($url, $filename);
?
To get the content of Wikipedia pages you should use WikiProxy:
http://meta.wikimedia.org/wiki/User:Duesentrieb/WikiProxy
If you still need to fetch it yourself, you can launch an external
tool (wget, curl, ...) to download it and then read it as a normal file.
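If you do stay with LWP, the hang is often just a very long wait rather than a true freeze: LWP::UserAgent's default timeout is 180 seconds, and Wikimedia is known to reject requests with generic User-Agent strings. A minimal sketch, assuming the problem is one of those two things (the script name and contact address below are hypothetical placeholders):

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Set a short timeout (LWP's default is 180 s) and a descriptive
# User-Agent, since Wikimedia may reject generic ones.
my $ua = LWP::UserAgent->new(
    timeout => 30,
    agent   => 'MyWikiScript/0.1 (contact: you@example.org)',  # hypothetical
);

sub fetch_page {
    my ($url, $filename) = @_;
    my $response = $ua->mirror($url, $filename);
    # mirror() returns 304 Not Modified when the local copy is current,
    # which is not an error.
    return 1 if $response->is_success or $response->code == 304;
    warn "Fetch failed: " . $response->status_line . "\n";
    return 0;
}

# Usage (needs network access):
# fetch_page('http://meta.wikimedia.org/wiki/Spam_blacklist/Log',
#            'spam_blacklist_log.html');
```

With a timeout set, a stalled connection fails quickly with a clear status line instead of appearing to hang.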
2. If I have questions about such stuff, am I in the right place
here? Otherwise, sorry
for bothering you. :-)
Cheers
seth
Yes, this is a good place :)
Platonides