---------- Forwarded message ----------
From: ramesh kumar <ramesh_chill(a)hotmail.com>
Date: 9 March 2011 13:27
Subject: RE: Reg. Research using Wikipedia
To: dgerard(a)gmail.com
Dear Mr. Gerard,
Thanks for your prompt response.
But is there a required time gap between one request and the next
(for example, 1 second or 1 millisecond)?
If so, I can set a sleep state in my program. At the same time, I have 3.1
million (3,101,144) wiki article titles.
If I leave 1 second between requests, and each request itself also takes
about 1 second (so 2 seconds per request), then in one day:
60 (sec) * 60 (min) * 24 (hr) = 86,400 seconds / 2 = 43,200 requests per day.
3,101,144 / 43,200 = ~71 days.
So I estimate the program would take about 71 days to finish all 3.1
million article titles.
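The estimate above can be checked with a few lines of Python (a quick sanity check of the same arithmetic, using the email's assumption of 2 seconds per request):

```python
TITLES = 3_101_144        # article titles in the downloaded dump
SECS_PER_REQ = 2.0        # ~1 s for the request itself + 1 s of sleep

per_day = 86_400 / SECS_PER_REQ   # seconds in a day / seconds per request
days = TITLES / per_day           # total days for all titles

print(round(per_day))     # 43200 requests per day
print(round(days, 1))     # 71.8 days
```

Halving the per-request time to 1 second (request plus a 0.5 s sleep, roughly) halves the estimate to about 36 days, which matches the "within 35 days" figure below.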
Is there any way our university IP address could be given permission,
for example by sending an official email from our department head to the
Wikipedia server administrators, so they know that the program I run
from this particular IP address is not an attack? Then the
administrators could allow us to make faster requests, say one every
0.5 seconds, and I could finish my experiment within 35 days.
Expecting your positive reply.
Regards,
Ramesh
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Date: Wed, 9 Mar 2011 10:39:43 +0000
Subject: Re: Reg. Research using Wikipedia
From: dgerard(a)gmail.com
To: ramesh_chill(a)hotmail.com
I asked the wikitech-l list, which is where the system administrators
talk, and they said:
"If they use the API and wait for one request to finish before they
start the next one (i.e. don't make parallel requests), that's pretty
much always fine."
http://lists.wikimedia.org/pipermail/wikitech-l/2011-March/052137.html
Hopefully this will put your network administrators' minds at rest :-)
- d.
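The pattern the sysadmins describe (wait for each request to finish before starting the next, never in parallel) can be sketched in Python. This is a minimal sketch: the query parameters are the standard MediaWiki API ones for listing an article's categories, but the helper names (`build_category_query`, `fetch_all`) and the injectable `fetch` callable are illustrative, not part of any existing program.

```python
import time
import urllib.parse

API = "https://en.wikipedia.org/w/api.php"  # MediaWiki API endpoint

def build_category_query(title):
    # Build the API URL asking for the categories of one article title.
    params = {
        "action": "query",
        "titles": title,
        "prop": "categories",
        "cllimit": "max",
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

def fetch_all(titles, fetch, delay=1.0):
    # Strictly sequential: each request completes before the next one
    # starts, with an optional courtesy delay in between. `fetch` is
    # any callable that takes a URL and returns the response body.
    results = {}
    for title in titles:
        results[title] = fetch(build_category_query(title))
        time.sleep(delay)
    return results
```

Passing the fetcher in as a parameter also makes the throttling logic testable without touching the network.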
On 9 March 2011 09:47, ramesh kumar <ramesh_chill(a)hotmail.com> wrote:
> Dear Members,
> I am Ramesh, pursuing my PhD at Monash University, Malaysia. My research is
> on blog classification using Wikipedia categories.
> For my experiment, I use the 12 main categories of Wikipedia.
> I want to identify which articles belong to which of the 12 main
> categories.
> So I wrote a program to collect the subcategories of each article and
> classify them against the 12 categories offline.
> I have already downloaded a wiki dump which contains around 3 million
> article titles.
> My program takes these 3 million article titles and fetches the
> subcategories from the live Wikipedia website.
> Our university network administrators are worried that Wikipedia would
> consider this a DDoS attack and could block our IP address if my
> program runs.
> In order to get permission from Wikipedia, I searched all over and
> found that the wikien-l members might be able to help me.
> Could you please suggest whom to contact and what the procedure is to
> get approval for our IP address, or offer other suggestions?
> Eagerly waiting for a positive reply
> Thanks and Regards
> Ramesh