Fwd: Reg. Research using Wikipedia - Wikitech-l

10 Mar 2011

      ---------- Forwarded message ----------
From: ramesh kumar ramesh_chill@hotmail.com
Date: 9 March 2011 13:27
Subject: RE: Reg. Research using Wikipedia
To: dgerard@gmail.com
Dear Mr.Gerard,
Thanks for your instant response.
But is there a time-gap to check between one request into another request.
for ex: like 1 sec, or 1 milli sec.
If so, I can set a sleep state in my program. At the same, I have 3.1
million (3101144) wiki article titles.
So if I set 1 sec between one request, so for 1 day it takes
60(sec)*60(min)*24(hr)=86400 /2= 43200 requests per day(considering 1
sec sleep between 1 request to the other)
3101144/43200=71 days.
I feel the program takes 71 days to finish all the 3.1 million article titles.
Is there anyway, our university IP address will be given permission or
sending a official email from our department head to Wikipedia Server
administrator to consider that the program, I run from this particular
IP address is not any attack. so, our administrator allows us to do
faster request like 0.5 sec. So, I can finish my experiment within 35
days.
expecting your positive reply
regards
Ramesh
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
...
Date: Wed, 9 Mar 2011 10:39:43 +0000
Subject: Re: Reg. Research using Wikipedia
From: dgerard@gmail.com
To: ramesh_chill@hotmail.com
I asked the wikitech-l list, which is where the system administrators
talk, and they said:
"If they use the API and wait for one request to finish before they
start the next one (i.e. don't make parallel requests), that's pretty
much always fine."
http://lists.wikimedia.org/pipermail/wikitech-l/2011-March/052137.html
Hopefully this will put your network administrators' minds at rest :-)

d.

On 9 March 2011 09:47, ramesh kumar ramesh_chill@hotmail.com wrote:
...
Dear Members,
I am Ramesh, pursuing my PhD in Monash University, Malaysia. My Research is
on blog classification using Wikipedia Categories.
As for my experiment, I use 12 main categories of Wikipedia.
I want to identify " which particular article belongs to which main 12
categories?".
So I wrote a program to collect the subcategories of each article and
classify based on 12 categories offline.
I have downloaded already wiki-dump which consists of around 3 million
article titles.
My program takes this 3 million article titles and goes to
online Wikipedia website and fetch the subcategories.
Our university network administrators are worried that, Wikipedia would
consider as DDOS attack and could block our IP address, if my program
functions.
In order to get permission from Wikipedia, I was searching allover. I could
able to find wikien-l members can help me.
Could you please suggest me, whom to contact, what is the procedure to get
approval for our IP address to do the process or other suggestions
Eagerly waiting for a positive reply
Thanks and Regards
Ramesh