Hi,
Thanks for your answer. Using the database dumps and setting up a mirror is my second option, I guess.
Before exploring that, I would like to make sure that using the API directly is not an option due to the traffic. Or is it?
The system being developed is read only.
Characteristics below:
LOOKUP
* take a search term, return an "exact" match article
* response will likely need to include the title, snippet, thumbnail and URL
SEARCH
* take a search term, return relevant articles
* likely only return the most relevant article, and preferably only articles that exceed a specific relevancy score
* response will likely need to include the title, snippet, thumbnail and URL
ARTICLE GET
* take an article ID, return the contents of the article

You didn't say what kind of queries you are planning to do, but
wouldn't the database dumps [1] be enough for you?
Petr Onderka
[[en:User:Svick]]
[1]: http://en.wikipedia.org/wiki/Wikipedia:Database_download
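For what it's worth, the three read-only operations described above map fairly directly onto api.php query modules. Below is a minimal sketch in Python; the parameter names are my reading of the Action API documentation (query/search/revisions, plus the extracts and pageimages properties, which depend on extensions installed on the wiki), so verify them against the live endpoint before relying on them:

```python
# Sketch of the three read-only operations against the MediaWiki Action API.
# Parameter names follow the documented api.php modules; the extracts and
# thumbnail fields require the TextExtracts / PageImages extensions.
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def build_url(**params):
    """Build an api.php GET URL with JSON output."""
    params.setdefault("format", "json")
    return API + "?" + urllib.parse.urlencode(params)

def lookup_url(term):
    # LOOKUP: exact-title match; title, intro snippet, thumbnail
    # and canonical URL in a single request.
    return build_url(
        action="query", titles=term,
        prop="extracts|pageimages|info",
        exintro=1, explaintext=1,
        piprop="thumbnail", pithumbsize=200,
        inprop="url")

def search_url(term, limit=1):
    # SEARCH: full-text search; most relevant hit comes first.
    return build_url(
        action="query", list="search",
        srsearch=term, srlimit=limit)

def article_url(page_id):
    # ARTICLE GET: wikitext content of the latest revision by page ID.
    return build_url(
        action="query", pageids=page_id,
        prop="revisions", rvprop="content")

def fetch(url):
    # Perform the GET request and decode the JSON response.
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Whether the servers would accept 200 qps of such requests is a separate question, of course; this only shows that the response shapes you need are available from the API.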
On Fri, Dec 28, 2012 at 3:32 PM, Ewa Szwed <ewaszynal@gmail.com> wrote:
> Hi,
> I am currently working on a project which involves using the wikipedia
> content. The expected traffic that our system needs to serve is around 200
> qps during peak time.
> My question is if using MediaWiki directly is really an option here (I mean
> sending GET requests to http://en.wikipedia.org/w/api.php directly without
> any local mirror). Would this traffic be supported or banned?
> Also what about the availability of the service and possible latencies?
> If it were banned, what would be the best approach?
> Thanks in advance for any answer.
> Ewa Szwed
> Mediawiki-api mailing list
> Mediawiki-api@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
>
_______________________________________________
Mediawiki-api mailing list
Mediawiki-api@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api