---------- Forwarded message ----------
From: ramesh kumar ramesh_chill@hotmail.com
Date: 9 March 2011 13:27
Subject: RE: Reg. Research using Wikipedia
To: dgerard@gmail.com
Dear Mr. Gerard,

Thanks for your prompt response. Is there a minimum time gap that should be kept between one request and the next, for example 1 sec or 1 millisecond? If so, I can add a sleep to my program.

I have 3.1 million (3101144) wiki article titles. If I set a 1 sec sleep between requests, then one day gives 60 (sec) * 60 (min) * 24 (hr) = 86400 / 2 = 43200 requests per day (allowing for the 1 sec sleep between one request and the next). 3101144 / 43200 is about 71.8, so I expect the program to take about 72 days to get through all 3.1 million article titles.

Is there any way our university IP address could be given permission, for example by our department head sending an official email to the Wikipedia server administrators, so that the program I run from this particular IP address is not treated as an attack? Our administrator would then allow us to make faster requests, for example every 0.5 sec, and I could finish my experiment in about 36 days.

Expecting your positive reply.

Regards,
Ramesh
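The estimate in this message can be checked with a few lines of Python. This is only a sketch: the function name `crawl_days` is made up for illustration, and the 2-second cycle is one reading of the "/2" in the message (1 sec of sleep plus roughly 1 sec for the request itself), not something the message states outright.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86400

def crawl_days(num_titles, seconds_per_cycle):
    """Days needed to fetch num_titles one after another,
    when each request plus its pause takes seconds_per_cycle."""
    requests_per_day = SECONDS_PER_DAY / seconds_per_cycle
    return num_titles / requests_per_day

NUM_TITLES = 3101144  # article titles, as given in the message

# Assumed 2 s per cycle: 43200 requests per day.
print(round(crawl_days(NUM_TITLES, 2.0), 1))  # 71.8

# Halving the cycle to 1 s doubles the rate: 86400 requests per day.
print(round(crawl_days(NUM_TITLES, 1.0), 1))  # 35.9
```

Rounded up to whole days, that is about 72 days at the slower rate and about 36 days at the faster one.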
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Date: Wed, 9 Mar 2011 10:39:43 +0000
Subject: Re: Reg. Research using Wikipedia
From: dgerard@gmail.com
To: ramesh_chill@hotmail.com
I asked the wikitech-l list, which is where the system administrators talk, and they said:
"If they use the API and wait for one request to finish before they start the next one (i.e. don't make parallel requests), that's pretty much always fine."
http://lists.wikimedia.org/pipermail/wikitech-l/2011-March/052137.html
Hopefully this will put your network administrators' minds at rest :-)
- d.
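The advice quoted above (use the API, and wait for one request to finish before starting the next) might look roughly like the following in Python. This is a sketch under assumptions, not a recommended client: `http_get`, `fetch_sequentially`, the `USER_AGENT` string and the 1-second default pause are all illustrative names and values, and nothing here is defined on the Wikipedia side.

```python
import time
import urllib.request

# Hypothetical identifier; a real script should name itself and give a contact.
USER_AGENT = "CategoryResearchScript/0.1 (contact: you@example.org)"

def http_get(url, timeout=30):
    """Fetch one URL and return the response body as bytes."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()

def fetch_sequentially(urls, fetch=http_get, pause_seconds=1.0, sleep=time.sleep):
    """Fetch URLs strictly one at a time, never in parallel.

    Each call to fetch() blocks until that request has finished, so the
    next request cannot start before the previous one is done.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(pause_seconds)  # pause between requests, not before the first
        results.append(fetch(url))
    return results
```

Because the loop is single-threaded and `fetch` is blocking, there is never more than one request in flight, which is the condition the list reply describes. The `fetch` and `sleep` parameters are there only so the loop can be tried out without touching the network.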
On 9 March 2011 09:47, ramesh kumar ramesh_chill@hotmail.com wrote:
Dear Members,

I am Ramesh, pursuing my PhD at Monash University, Malaysia. My research is on blog classification using Wikipedia categories. For my experiment, I use the 12 main categories of Wikipedia, and I want to identify which of these 12 main categories each article belongs to. So I wrote a program that collects the subcategories of each article and then classifies the article into one of the 12 categories offline.

I have already downloaded a wiki dump, which contains around 3 million article titles. My program takes these 3 million article titles, goes to the online Wikipedia website and fetches the subcategories.

Our university network administrators are worried that Wikipedia would treat this as a DDoS attack and could block our IP address if I run my program. I have been searching all over for how to get permission from Wikipedia, and I found that wikien-l members may be able to help me.

Could you please suggest whom to contact, what the procedure is for getting approval for our IP address to run this process, or any other suggestions?

Eagerly waiting for a positive reply.

Thanks and Regards,
Ramesh
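For illustration, the category lookup described in this message could be expressed as a MediaWiki API query (`action=query` with `prop=categories`) rather than a fetch of the article page. The sketch below only builds the request URLs and makes no network calls. The helper names `batches` and `category_query_url` are invented here, and the batch size of 50 titles per request is an assumption about the usual limit for ordinary clients, so it should be checked against the API documentation before use.

```python
import urllib.parse

API_URL = "https://en.wikipedia.org/w/api.php"

def batches(titles, batch_size=50):
    """Yield consecutive slices of at most batch_size titles (50 is assumed)."""
    for start in range(0, len(titles), batch_size):
        yield titles[start:start + batch_size]

def category_query_url(titles):
    """Build a query URL asking for the categories of the given titles."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": "|".join(titles),  # several titles are separated by "|"
        "cllimit": "max",
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)

# Example with two titles standing in for the 3 million from the dump.
for batch in batches(["Alan Turing", "Ada Lovelace"]):
    print(category_query_url(batch))
```

Asking for several titles in one request would reduce the total number of requests considerably compared with one request per title. A response may be split across continuation pages, which this sketch does not handle.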