Hello everyone,
I am very new to the MediaWiki API. I am planning on using it to extract geo-information about places. I have been referring to this tutorial by ScraperWiki: https://blog.scraperwiki.com/2011/12/how-to-scrape-and-parse-wikipedia/. However, I am not sure whether I should use the API or the offline dump. What is the difference between the two data sets? I need to parse a lot of data (the articles as well as the geo-coordinates of places). Please help me out with this. Thank you!
On Fri, Mar 21, 2014 at 1:23 PM, Radhika Gaonkar <radhikag992@gmail.com> wrote:
I am very new to the MediaWiki API. I am planning on using it to
extract geo-information about places. I have been referring to this tutorial by ScraperWiki: https://blog.scraperwiki.com/2011/12/how-to-scrape-and-parse-wikipedia/. However, I am not sure whether I should use the API or the offline dump
Don't scrape the live wiki. You may scrape a dump if you'd like.
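If you go the dump route, here is a rough, untested Python sketch of streaming through a pages-articles XML dump without loading the whole thing into memory. The filename and the export-schema namespace below are assumptions; check the header of the dump you actually download:

    import bz2
    import xml.etree.ElementTree as ET

    # Namespace from the dump's <mediawiki> root element; the schema
    # version number varies between dumps, so check yours.
    NS = "{http://www.mediawiki.org/xml/export-0.10/}"

    with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as dump:
        for _event, elem in ET.iterparse(dump):
            if elem.tag == NS + "page":
                title = elem.findtext(NS + "title")
                wikitext = elem.findtext(NS + "revision/" + NS + "text") or ""
                if "{{coord" in wikitext.lower():
                    print(title)  # page carries a {{Coord}} template
                elem.clear()  # discard the finished page to keep memory flat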
Note that if the only "geo information" you need is the coordinates, you can use the API to query them, e.g. https://en.wikipedia.org/w/api.php?action=query&prop=coordinates&tit... .
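In Python that is a single request. A minimal sketch, assuming the third-party requests library (the function name and the User-Agent contact address are placeholders; the API etiquette guidelines ask for a descriptive User-Agent):

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def fetch_coordinates(title):
        """Return (lat, lon) for a page, or None if it has no coordinates."""
        params = {
            "action": "query",
            "prop": "coordinates",
            "titles": title,
            "format": "json",
        }
        headers = {"User-Agent": "geo-coord-example/0.1 (you@example.com)"}
        reply = requests.get(API, params=params, headers=headers).json()
        for page in reply["query"]["pages"].values():
            for coord in page.get("coordinates", []):
                return coord["lat"], coord["lon"]
        return None

    print(fetch_coordinates("Eiffel Tower"))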
What is the difference between the two data sets?
Mainly, the dumps are generated about once per month, so they're not completely up-to-date.

--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation
Thank you very much! Are there any tutorials for scraping the dump? To be specific, I am most comfortable with Python.