Hi folks --
I'm building a proof-of-concept application that does some work on the Wikipedia data set. I was excited to see the announcement of the new API, since it would simplify things a lot for me.
However, as one recent poster pointed out, what is and isn't acceptable usage isn't particularly clear. I'd expect that once I announce the demo, traffic might reach (a complete guesstimate) somewhere around 10k hits per day for a couple of days and then drop off to a few hundred a day. Given that Wikipedia averages 30-50k requests per second, such usage would probably be a rounding error compared to Wikipedia's load. I'd also cache responses on my server for requests that had already come through, for speed and load reasons.
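To make the guesstimate concrete, here is a rough sketch of the arithmetic and of the kind of caching I have in mind. Everything in it is an assumption for illustration: `share_of_load`, `CachedFetcher` and the `fetch` callable are hypothetical names, not part of the API, and the real demo would likely also need cache expiry.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def share_of_load(hits_per_day, site_requests_per_second):
    """Fraction of the site's daily requests the demo would account for."""
    return hits_per_day / (site_requests_per_second * SECONDS_PER_DAY)


class CachedFetcher:
    """Wraps a fetch function so each title is requested upstream only once.

    `fetch` is a hypothetical callable (title -> page data); in the real
    demo it would be whatever function actually calls the API.
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = {}
        self.upstream_calls = 0

    def get(self, title):
        # Only go upstream for titles we have not seen before.
        if title not in self._cache:
            self.upstream_calls += 1
            self._cache[title] = self._fetch(title)
        return self._cache[title]


if __name__ == "__main__":
    # Guesstimated peak of 10k hits/day against 30k and 50k requests/second.
    for rps in (30_000, 50_000):
        print(rps, share_of_load(10_000, rps))
```

With those numbers the demo's peak works out to roughly 2 to 4 requests per million served, and with the cache in place the upstream count would be lower still, since repeat lookups never leave my server.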
But what I'd like to avoid is building this nifty demo, announcing it in a few places and then getting the plug pulled on it. If it somehow accidentally became The Next Big Thing, I'd naturally move over to a DB dump hosted elsewhere. For clarity, my project doesn't have the goal of being a Wikipedia mirror; the demo is just to show how the software works on a big data set.
Even just a heads-up from somebody at WP if we're pissing them off would be fine from my side, so that we could rework things within a couple of days to use a dump.
Is there a policy on acceptable usage anywhere? I get the feeling from a similar question this week that this may be a frequent question.
Cheers,
- [[User:Scott.wheeler|Scott]]