On 3/10/2011 3:46 AM, David Gerard wrote:
I feel the program will take 71 days to finish all 3.1
million article titles.
Is there any way our university IP address could be given permission,
perhaps by sending an official email from our department head to the
Wikipedia server administrators, so they know that the program I run
from this particular IP address is not an attack? Then the
administrators could allow us to make faster requests, say one every
0.5 seconds, and I could finish my experiment within 35 days.
Expecting your positive reply.
Regards,
Ramesh
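
(For context, the throttled crawl being described is roughly the loop
sketched below. This is an illustration, not Ramesh's actual code: the
2-second delay is inferred from his numbers, since 3.1 million titles
at one request every 2 seconds comes to about 72 days, and the titles
and User-Agent string are placeholders.)

    import json
    import time
    import urllib.parse
    import urllib.request

    API = "https://en.wikipedia.org/w/api.php"
    DELAY = 2.0  # seconds between requests; 3.1M titles x 2 s is ~72 days

    def fetch_page_info(title):
        # Ask the MediaWiki API for basic page info via the standard
        # action=query interface.
        params = urllib.parse.urlencode({
            "action": "query",
            "titles": title,
            "prop": "info",
            "format": "json",
        })
        # Wikipedia expects a descriptive User-Agent; this one is a placeholder.
        req = urllib.request.Request(
            API + "?" + params,
            headers={"User-Agent": "ExampleCrawler/0.1 (research; contact@example.edu)"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    for title in ["Alan Turing", "Ada Lovelace"]:  # stand-in for the 3.1M titles
        data = fetch_page_info(title)
        print(title, "->", list(data["query"]["pages"]))
        time.sleep(DELAY)  # the politeness delay that stretches the crawl to months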
I can say, positively, that you'll get the job done faster by
downloading the dump file and cracking into it directly. I've got
scripts that can download and extract stuff from the XML dump in an hour
or so. I still have some processes that use the API, but I'm
increasingly using the dumps because they're faster and easier.
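
A stream parse of the compressed dump needs nothing beyond the Python
standard library; here is a minimal sketch (the file name is the
conventional one from dumps.wikimedia.org, so substitute whichever
dump you actually download):

    import bz2
    import xml.etree.ElementTree as ET

    DUMP = "enwiki-latest-pages-articles.xml.bz2"

    # Stream-parse the compressed dump without loading it into memory.
    with bz2.open(DUMP, "rb") as f:
        for event, elem in ET.iterparse(f, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
            if tag == "title":
                print(elem.text)
            elif tag == "page":
                elem.clear()  # free each page's subtree as we go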
Note that many facts about Wikipedia topics have already been
extracted by DBpedia and Freebase. The two are complementary, and if
you're interested in getting results, you should use both. DBpedia has
some things that aren't in Freebase, such as Wikipedia's link graph and
redirects, but Freebase has a type system with 2x better recall for
many of the prevalent types.
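
For example, the redirect data can be pulled straight off DBpedia's
public SPARQL endpoint. A minimal sketch, assuming the
dbo:wikiPageRedirects property name used in current DBpedia releases
(check it against the release you use):

    import json
    import urllib.parse
    import urllib.request

    # All redirects that point at one example article.
    QUERY = """
    SELECT ?redirect WHERE {
      ?redirect <http://dbpedia.org/ontology/wikiPageRedirects>
                <http://dbpedia.org/resource/Barack_Obama> .
    }
    LIMIT 20
    """

    url = "https://dbpedia.org/sparql?" + urllib.parse.urlencode({
        "query": QUERY,
        "format": "application/sparql-results+json",
    })
    with urllib.request.urlopen(url) as resp:
        results = json.load(resp)

    # Standard SPARQL JSON results layout: results -> bindings -> variable.
    for row in results["results"]["bindings"]:
        print(row["redirect"]["value"])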
You might find that DBpedia + Freebase has the information you
need. And if it doesn't, you'll still find it's a useful 'guidance
control' system for anything you're doing with Wikipedia data.