On Tue, Sep 3, 2013 at 1:12 AM, 김종범 <jongbeom.kim@nhn.com> wrote:
We offer search results by collecting data from Wikipedia.

However, obtaining the data from dumps does not satisfy our freshness requirements,

so we want to collect the data via the API (https://www.mediawiki.org/wiki/API) for freshness.

Search engine crawlers should crawl the normal web pages as an anonymous user with no cookies set, to take maximum advantage of caching. Crawlers using the API to fetch page contents are liable to be blocked.

If you follow the guidelines in https://meta.wikimedia.org/wiki/Data_request_limitations, you should probably be able to poll the API's list=recentchanges without issue for the page titles (not the content!) that your crawler needs to re-crawl, but in the end that decision is up to people who are not me. Alternatively, you could use the IRC recent changes feed to get that list.
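
For what it's worth, a rough sketch of that kind of polling might look something like the following (this assumes a Python crawler, the requests library, and the English Wikipedia endpoint; the User-Agent string is a placeholder, and you'd still crawl the actual page URLs as an anonymous client rather than fetching content through the API):

    # Sketch only: poll list=recentchanges for titles changed since a given
    # timestamp, then re-crawl those pages through the normal web interface.
    import requests

    API = "https://en.wikipedia.org/w/api.php"  # assumed endpoint
    HEADERS = {"User-Agent": "ExampleCrawler/0.1 (contact@example.com)"}  # placeholder

    def changed_titles_since(rcstart):
        """Yield titles of pages changed since the given ISO 8601 timestamp."""
        params = {
            "action": "query",
            "list": "recentchanges",
            "rcprop": "title|timestamp",
            "rcdir": "newer",
            "rcstart": rcstart,
            "rclimit": "500",
            "format": "json",
            "continue": "",  # opt in to the newer continuation format
        }
        while True:
            data = requests.get(API, params=params, headers=HEADERS, timeout=30).json()
            for change in data["query"]["recentchanges"]:
                yield change["title"]
            if "continue" not in data:
                break
            params.update(data["continue"])  # follow the continuation tokens

    # Collect the titles to re-crawl; the actual page fetches should then go
    # through the regular article URLs, not the API.
    for title in changed_titles_since("2013-09-03T00:00:00Z"):
        print(title)

How often you poll and how you deduplicate titles between polls is up to you, within the limits described on the Data_request_limitations page.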


--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation