Wenqin Ye wrote:
If we are creating an AI app that needs to get information, would we be
allowed to crawl Wikipedia for it? The app would probably be a search
query of some kind that gives information back to the user, and one of
the sites used is Wikipedia. The app would use parts of Wikipedia's
articles, send that info back to the user, and give them a link to click
if they want to visit the full article. Each user can only query/search
once per second; however, the collective user base might query Wikipedia
more often than that, so across all users this web crawler may crawl more
than once per second. Would this be allowed?
Hi.
Depending on your needs, MediaWiki has a robust Web API:
https://www.mediawiki.org/wiki/API:Main_page
The English Wikipedia Web API:
https://en.wikipedia.org/w/api.php
API etiquette is described here:
https://www.mediawiki.org/wiki/API:Etiquette
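For instance, a minimal search query against the English Wikipedia Web API
might look like the sketch below. Per the etiquette page, the client should
send a descriptive User-Agent; the app name and contact address here are
placeholders, and the parameter choices (result limit, etc.) are just
example values.

```python
# Minimal sketch of querying the English Wikipedia Web API.
# The User-Agent string (app name, contact address) is a placeholder;
# the API etiquette page asks that it identify your client.
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
USER_AGENT = "ExampleSearchApp/0.1 (contact@example.com)"  # placeholder

def build_search_url(term, limit=5):
    """Build a request URL for a full-text search via list=search."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)

def search(term):
    """Run the search and return (title, snippet) pairs."""
    req = urllib.request.Request(
        build_search_url(term),
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return [(hit["title"], hit["snippet"])
            for hit in data["query"]["search"]]
```

The result includes page titles you can turn into links back to the full
articles, which matches your use case.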
For larger data sets, you can try the XML or SQL dumps:
http://dumps.wikimedia.org/
Or perhaps a caching layer would make sense.
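That last suggestion could be as simple as a small in-memory cache with a
time-to-live, so that repeated user queries for the same term hit
Wikipedia only once per interval. A hypothetical sketch (the TTL value
and the injected fetch function are placeholders):

```python
# Minimal sketch of a caching layer: repeated lookups of the same term
# within the TTL are served from memory instead of re-hitting the API.
# The 300-second TTL is an arbitrary example value.
import time

class TTLCache:
    def __init__(self, fetch, ttl=300.0):
        self.fetch = fetch   # function that actually hits the API
        self.ttl = ttl       # seconds a cached result stays fresh
        self._store = {}     # term -> (timestamp, result)

    def get(self, term):
        now = time.monotonic()
        hit = self._store.get(term)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]    # fresh cached result, no API call
        result = self.fetch(term)
        self._store[term] = (now, result)
        return result
```

With something like this in front of the API, the collective request rate
depends on the number of distinct queries per TTL window rather than the
number of users.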
Hope that helps.
MZMcBride