Hello,

I am writing to ask about the usage limits and etiquette I should follow when consuming the full-text search API from the server side.

I am building a site for visualization and knowledge discovery across Wikipedias.

It will be a personally funded project (at least initially!) for public use. Investing more in indexing with Elasticsearch would be beyond my means and also beyond the scope of my project, whose focus is on visualization and discovery. I also think there is no need to reinvent the wheel :)

I want to figure out the best setup, in terms of usability and request rate, for using the full-text search API while complying with your policy.

Would you please take a minute to read the details below?


***

Currently my setup uses my own database: for full-text search I use Elasticsearch at a very basic level.
I then use the Wikipedia API to decorate my data, client-side (AJAX).

Although it is slower than what I have now, the Wikipedia full-text search API is much more useful for a user.
It offers results for complex queries that I cannot provide, since I am indexing only article titles.

I would like to add full-text search against the Wikimedia API from the server side.
I want to make sure I comply with the Wikimedia Foundation's policy if I make concurrent requests on behalf of users.

  • Is there any limit on the number of requests I can make from a web domain?

I would like to use the wikitools Python library.
The query I need to run uses a search generator over the article namespace only:

action=query&generator=search&gsrnamespace=0&gsrsearch='my query'&gsrlimit=20
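
For reference, here is a minimal sketch of how such a request URL could be assembled in Python. It uses only the standard library rather than wikitools; the endpoint (English Wikipedia), the `format=json` parameter, and the helper name `build_search_url` are assumptions for illustration, not part of my current setup.

```python
from urllib.parse import urlencode

# Assumed endpoint: the English Wikipedia API. Other language editions
# would use a different host.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(query, limit=20):
    """Build a full-text search URL restricted to the article namespace (0)."""
    params = {
        "action": "query",
        "generator": "search",
        "gsrnamespace": 0,      # article namespace only
        "gsrsearch": query,     # the user's search string
        "gsrlimit": limit,      # maximum number of results
        "format": "json",       # assumption: JSON output is wanted
    }
    # urlencode takes care of escaping spaces and special characters.
    return API_ENDPOINT + "?" + urlencode(params)


print(build_search_url("biology technology"))
```

The helper only builds the URL; sending the request is left out on purpose.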

I tested it from my laptop and found it quite slow. For example, it took:

~1.2 s for 'DNA'

~1.6 s for 'terrorist attacks'

~1.7 s for 'biology technology'

and I am currently on a very fast Wi-Fi network.

  • How would it be possible to improve performance?
  • Is it possible to apply for a desired rate of requests?

I also read that it is good etiquette to include contact details in the request headers, in case you need to get in touch with the site owner. It is not clear to me how I should do this.

  • Could you please show how to do this with an example in Python (I am using the Flask framework)?
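
To make the question concrete, here is my current guess, as a sketch only. I assume the contact details go in the `User-Agent` header of the outgoing request, which would make this independent of Flask itself. The application name, URL, and e-mail address below are placeholders, and `build_request` is a hypothetical helper.

```python
from urllib.request import Request

# Placeholder identity: application name/version, then a contact URL and
# e-mail address in parentheses. All values here are made up.
USER_AGENT = "ExampleWikiViz/0.1 (https://example.org/wikiviz; contact@example.org)"


def build_request(url):
    """Return a Request object carrying the descriptive User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


req = build_request("https://en.wikipedia.org/w/api.php?action=query&format=json")
# urllib stores header names capitalized, hence "User-agent" on lookup.
print(req.get_header("User-agent"))
```

Please correct me if the contact details should go somewhere else.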


Thank you very much for your help,
Luigi