Hi, I am currently working on a project that uses Wikipedia content. Our system is expected to serve around 200 qps at peak. My question is whether using the MediaWiki API directly is a realistic option here (I mean sending GET requests to http://en.wikipedia.org/w/api.php directly, without any local mirror). Would this traffic be supported, or would it be banned? Also, what about the availability of the service and possible latencies? If it would be banned, what is the best approach? Thanks in advance for any answer. Ewa Szwed
You didn't say what kind of queries you are planning to run, but wouldn't the database dumps [1] be enough for you?
Petr Onderka [[en:User:Svick]]
[1]: http://en.wikipedia.org/wiki/Wikipedia:Database_download
On Fri, Dec 28, 2012 at 3:32 PM, Ewa Szwed ewaszynal@gmail.com wrote:
Hi, I am currently working on a project that uses Wikipedia content. Our system is expected to serve around 200 qps at peak. My question is whether using the MediaWiki API directly is a realistic option here (I mean sending GET requests to http://en.wikipedia.org/w/api.php directly, without any local mirror). Would this traffic be supported, or would it be banned? Also, what about the availability of the service and possible latencies? If it would be banned, what is the best approach? Thanks in advance for any answer. Ewa Szwed
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
Hi, Thanks for your answer. Using the database dumps and setting up a mirror is my second option, I guess. Before exploring that, I would like to make sure that using the API directly is really not an option because of the traffic. Or is it? The system being developed is read-only. Its characteristics are below:
LOOKUP
* take a search term, return the "exact" match article
* response will likely need to include the title, snippet, thumbnail and URL
SEARCH
* take a search term, return relevant articles
* likely only return the most relevant article, and preferably only articles that exceed a specific relevancy score
* response will likely need to include the title, snippet, thumbnail and URL
ARTICLE GET
* take an article ID, return the contents of the article

I would be grateful for more information on what external traffic the API accepts. Thanks.
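
For reference, the three read-only operations described above could map onto api.php requests roughly as sketched below. This is only an illustration, not a confirmed design: the helper names (`build_url`, `lookup_url`, `search_url`, `article_url`) and the thumbnail size are made up for the example, and `prop=extracts` and `prop=pageimages` are assumed to be available on the target wiki, since they are provided by extensions rather than by core MediaWiki. The code only builds URLs and sends no traffic.

```python
from urllib.parse import urlencode

# Endpoint named in the thread.
API_URL = "http://en.wikipedia.org/w/api.php"

# Assumed fields for "title, snippet, thumbnail and URL".
# extracts/pageimages come from extensions, so treat them as assumptions.
SUMMARY_PARAMS = {
    "prop": "extracts|pageimages|info",
    "exintro": 1,          # only the lead section as the snippet
    "explaintext": 1,      # plain text instead of HTML
    "piprop": "thumbnail",
    "pithumbsize": 200,    # hypothetical thumbnail width in pixels
    "inprop": "url",       # include the full page URL
}


def build_url(params):
    """Return an api.php query URL for the given parameters (JSON output)."""
    query = {"action": "query", "format": "json", **params}
    return API_URL + "?" + urlencode(query)


def lookup_url(title):
    """LOOKUP: exact-title match, following redirects."""
    return build_url({"titles": title, "redirects": 1, **SUMMARY_PARAMS})


def search_url(term, limit=1):
    """SEARCH: full-text search, returning at most `limit` hits."""
    return build_url({"list": "search", "srsearch": term, "srlimit": limit})


def article_url(page_id):
    """ARTICLE GET: wikitext of the latest revision for a page ID."""
    return build_url({"pageids": page_id, "prop": "revisions", "rvprop": "content"})


if __name__ == "__main__":
    # 12345 is a placeholder page ID, not a real article reference.
    for url in (lookup_url("Albert Einstein"),
                search_url("general relativity"),
                article_url(12345)):
        print(url)
```

The relevancy-score threshold from the SEARCH requirement has no counterpart in this sketch; whether the search backend exposes a usable score at all is an open question here, so any such filtering would have to be worked out separately.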