If we are creating an AI app that needs to get information, would we be allowed to crawl Wikipedia for it? The app would probably be a search service of some kind that gives information back to the user, and Wikipedia would be one of the sites used. The app would take parts of Wikipedia's articles, send that info back to the user, and give them a link to click if they want to visit the full article. Each user can only query/search once per second; however, the collective user base might query Wikipedia more often than that, so this web crawler may collectively hit Wikipedia more than once per second across all users. Would this be allowed?
Wenqin Ye wrote:
> If we are creating an AI app that needs to get information, would we be allowed to crawl Wikipedia for it? [...]
Hi.
Depending on your needs, MediaWiki has a robust Web API:
https://www.mediawiki.org/wiki/API:Main_page
The English Wikipedia Web API:
https://en.wikipedia.org/w/api.php
API etiquette is described here:
https://www.mediawiki.org/wiki/API:Etiquette
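For the use case described above, something like the following sketch (Python with the requests library; the app name and contact address in the User-Agent are placeholder assumptions) would run a full-text search through the API while following the etiquette advice about a descriptive User-Agent and the maxlag parameter:

import requests

API_URL = "https://en.wikipedia.org/w/api.php"
HEADERS = {
    # API:Etiquette asks for a descriptive User-Agent with contact info;
    # the app name and address here are placeholders.
    "User-Agent": "ExampleAIApp/0.1 (contact: you@example.com)",
}

def search_wikipedia(query, limit=5):
    """Return (title, snippet, url) tuples from a full-text search."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
        "maxlag": 5,  # politely back off when replication lag is high
    }
    resp = requests.get(API_URL, params=params, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    results = []
    for hit in resp.json()["query"]["search"]:
        title = hit["title"]
        url = "https://en.wikipedia.org/wiki/" + title.replace(" ", "_")
        # hit["snippet"] contains HTML highlighting around matched terms
        results.append((title, hit["snippet"], url))
    return results

Since the rate concern in the original question is collective, throttling like this belongs on the server making the requests, not per end user.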
For larger data sets, you can try the XML or SQL dumps:
https://dumps.wikimedia.org/
Or perhaps a caching layer would make sense.
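As a rough illustration of the caching idea, repeated identical queries from different users can be served from a short-lived local cache so that only the first one reaches Wikipedia. The one-hour TTL is an arbitrary assumption, and cached_search reuses the hypothetical search_wikipedia helper from the sketch above:

import time

class TTLCache:
    """Tiny in-memory cache mapping a query to a timestamped value."""
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}  # query -> (timestamp, value)

    def get(self, key):
        entry = self.store.get(key)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]
        return None

    def put(self, key, value):
        self.store[key] = (time.time(), value)

cache = TTLCache()

def cached_search(query):
    results = cache.get(query)
    if results is None:
        results = search_wikipedia(query)  # from the sketch above
        cache.put(query, results)
    return results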
Hope that helps.
MZMcBride
IMHO, the time when you had to invest effort into crawling Wikipedia has long passed. I'd recommend using DBpedia, which has already extracted a lot of data from Wikipedia. They also have a tool for altering and tuning their parsers: http://mappings.dbpedia.org
-----
Yury Katkov, WikiVote
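For example, a single query against DBpedia's public SPARQL endpoint returns an article abstract without any crawling. This is only a sketch; the dbpedia_abstract helper is invented for illustration (the dbo: prefix is predefined on the endpoint):

import requests

SPARQL_ENDPOINT = "https://dbpedia.org/sparql"

def dbpedia_abstract(resource_name):
    """Fetch the English abstract of a DBpedia resource, e.g. "Albert_Einstein"."""
    query = """
        SELECT ?abstract WHERE {
          <http://dbpedia.org/resource/%s> dbo:abstract ?abstract .
          FILTER (lang(?abstract) = "en")
        }
    """ % resource_name
    resp = requests.get(
        SPARQL_ENDPOINT,
        params={"query": query, "format": "application/sparql-results+json"},
        timeout=10,
    )
    resp.raise_for_status()
    bindings = resp.json()["results"]["bindings"]
    return bindings[0]["abstract"]["value"] if bindings else None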
On Wed, Sep 11, 2013 at 7:12 AM, Wenqin Ye <wenqin908@gmail.com> wrote:
> If we are creating an AI app that needs to get information, would we be allowed to crawl Wikipedia for it? [...]
On Wed, 11 Sep 2013, at 12:42, Wenqin Ye wrote:
> If we are creating an AI app that needs to get information, would we be allowed to crawl Wikipedia for it? The app would probably be a search service of some kind that gives information back to the user, and Wikipedia would be one of the sites used.
What web search engines do is use
- meta tags (keywords, short description) and
- full page content (plain text)
for full-text search. This is relatively painless. Your query looks like you are chasing additional information, which the API provides but which the items above don't include directly. What additional information are you looking for?
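For instance, if it is the plain-text page content you are after, the TextExtracts properties of the same API return it directly. A minimal sketch, assuming the lead section is enough for the app's answers:

import requests

API_URL = "https://en.wikipedia.org/w/api.php"

def plain_intro(title):
    """Return the lead section of an article as plain text, or None."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,      # lead section only
        "explaintext": 1,  # strip HTML from the extract
        "titles": title,
        "format": "json",
    }
    resp = requests.get(API_URL, params=params, timeout=10)
    resp.raise_for_status()
    pages = resp.json()["query"]["pages"]
    return next(iter(pages.values())).get("extract")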