Hi,
I need to check whether the wiki page at a given URL exceeds 500 KB, then fetch the page, truncate it to 500 KB, and pass it to another API.
I couldn't find a way to determine the size of a wiki page through the API.
I also tried issuing a HEAD request for the page. Sometimes I get a result, but sometimes I get an HTTP 403 Forbidden. The same thing happens with an HTTP GET on the page. The pages I tried most often for testing were http://en.wikipedia.org/wiki/Barack_Obama and http://en.wikipedia.org/wiki/United_States.
Can you suggest an effective way to achieve the requirement above?
Thanks Anand
Anand Ramanathan wrote:
> Hi,
> I need to check whether the wiki page at a given URL exceeds 500 KB, then fetch the page, truncate it to 500 KB, and pass it to another API.
> I couldn't find a way to determine the size of a wiki page through the API.
You can get the page size via the API using rvprop=size, or by fetching the revision content and checking its length.
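As a rough illustration of the rvprop=size suggestion, here is a minimal Python sketch. The endpoint URL, the helper names (build_size_url, extract_size, fetch_size), and the choice of the JSON output format are assumptions made for this example, not something stated in the thread. Note that the size reported this way is the byte length of the revision's wikitext, which will generally differ from the size of the rendered HTML page.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for English Wikipedia; other wikis use their own api.php URL.
API_URL = "https://en.wikipedia.org/w/api.php"
# Example agent string taken from this thread; replace with your own.
USER_AGENT = "Anand 500kb program"


def build_size_url(title):
    """Build a query URL asking only for the size of the latest revision."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "size",
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_size(response):
    """Return the revision size in bytes, or None if the page has no revisions."""
    for page in response["query"]["pages"].values():
        revisions = page.get("revisions")
        if revisions:
            return revisions[0]["size"]
    return None


def fetch_size(title):
    """Perform the HTTP request (needs network access) and return the size."""
    request = urllib.request.Request(
        build_size_url(title), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request) as reply:
        return extract_size(json.load(reply))
```

A call such as fetch_size("Barack_Obama") could then be compared against 500 * 1024 to decide whether truncation is needed.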
> I also tried issuing a HEAD request for the page. Sometimes I get a result, but sometimes I get an HTTP 403 Forbidden. The same thing happens with an HTTP GET on the page. The pages I tried most often for testing were http://en.wikipedia.org/wiki/Barack_Obama and http://en.wikipedia.org/wiki/United_States.
Identify yourself. Give your program a proper User-Agent, like "Anand 500kb program". If you use your framework's generic user agent, or try to pass yourself off as a browser, you will be blocked.
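A small Python sketch of that advice, combined with the 500 KB truncation from the original question, might look like the following. The helper names (build_request, truncate_to_limit, decode_truncated, fetch_truncated) are hypothetical, and reading "500kb" as 500 * 1024 bytes is an assumption; adjust the limit if the target API counts differently.

```python
import urllib.request

# Assumption: "500kb" is taken to mean 500 * 1024 bytes.
LIMIT_BYTES = 500 * 1024
# Example agent string from this thread; use one that identifies your program.
USER_AGENT = "Anand 500kb program"


def build_request(url, method="GET"):
    """Create a request that carries a descriptive User-Agent header."""
    return urllib.request.Request(
        url, headers={"User-Agent": USER_AGENT}, method=method
    )


def truncate_to_limit(data, limit=LIMIT_BYTES):
    """Cut raw bytes down to at most `limit` bytes."""
    return data[:limit]


def decode_truncated(data, limit=LIMIT_BYTES):
    """Truncate, then decode as UTF-8, dropping a character split by the cut."""
    return truncate_to_limit(data, limit).decode("utf-8", errors="ignore")


def fetch_truncated(url, limit=LIMIT_BYTES):
    """Download at most `limit` bytes of the page (needs network access)."""
    with urllib.request.urlopen(build_request(url)) as reply:
        return reply.read(limit)
```

Truncating by bytes can split a multi-byte UTF-8 character at the cut point, which is why decode_truncated ignores an incomplete trailing sequence instead of raising an error.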
> Can you suggest an effective way to achieve the requirement above?
> Thanks Anand
Yes, I added a user agent, and that worked. Thanks!
- Anand
On Tue, Dec 29, 2009 at 4:19 PM, Platonides platonides@gmail.com wrote:
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api