Hello,
I am writing to ask about the usage limits and etiquette I should follow when consuming the full-text search API from the server side.
I am building a site for visualization and knowledge discovery of Wikipedias.
It will be a personally funded project (at least initially!) for public use. Investing more in indexing with Elasticsearch would be beyond my means and also beyond the scope of the project, whose focus is on visualization and discovery. I also think there is no need to reinvent the wheel :)
I want to figure out the best setup, in terms of usability and request rate, for using the full-text search API while complying with your policy.
Would you please take a minute to read the details below?
***
Currently my setup uses my own database: for full-text search I use Elasticsearch at a very basic level.
I then use the Wikipedia API to decorate my data client-side (AJAX).
Although it is slower than what I have now, the Wikipedia full-text search API is much more useful for a user.
It returns results for complex queries that I cannot provide, because I index only article titles.
I would like to add full-text search against the Wikimedia API from the server side.
I want to make sure that I comply with the Wikimedia Foundation's policy if I make concurrent requests on behalf of users.
- Is there any limit on the number of requests I can make from a web domain?
I would like to use the wikitools Python library.
The query I need to run will use a search generator over the article namespace only:
action=query&generator=search&gsrnamespace=0&gsrsearch='my query'&gsrlimit=20
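For illustration, here is a minimal sketch of how this query could be assembled with the Python standard library. The endpoint (the English Wikipedia api.php), the format=json parameter and the helper name build_search_url are assumptions made for the example, not part of my current code.

```python
from urllib.parse import urlencode

# Assumed endpoint: the English Wikipedia API.
# Other language editions are served from their own host names.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(query, limit=20):
    """Build the URL for a full-text search over the article namespace (0)."""
    params = {
        "action": "query",
        "generator": "search",
        "gsrnamespace": 0,      # article namespace only
        "gsrsearch": query,     # the user's search string
        "gsrlimit": limit,      # number of results per request
        "format": "json",       # assumption: JSON output is wanted
    }
    # urlencode takes care of escaping spaces and special characters.
    return API_ENDPOINT + "?" + urlencode(params)


print(build_search_url("DNA"))
```

The helper only builds the URL and does not send anything over the network.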
I tested it from my laptop and found it quite slow. For example, it took:
~1.2 s for 'DNA'
~1.6 s for 'terroristi attacks'
~1.7 s for 'biology technology'
even though I am currently on a very fast Wi-Fi network.
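Timings of this kind can be reproduced with a small helper along the following lines. This is only a sketch: the names timed and fake_search are hypothetical, and fake_search is a stand-in that sleeps instead of calling the API, so the snippet runs without network access.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical stand-in for the real HTTP call; replace it with the actual request.
def fake_search(query):
    time.sleep(0.05)
    return {"query": query}


result, elapsed = timed(fake_search, "DNA")
print(f"~{elapsed:.2f} s for {result['query']!r}")
```

Measured this way, the figure includes network latency from the client as well as the search time on the server.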
- How would it be possible to improve performance?
- Is it possible to apply for a desired rate of requests?
I also read that it is good etiquette to include contact information in the request headers, in case you need to get in touch with the site operator. It is not clear to me how I should do this.
- Could you please show how to do it with an example in Python (here using the Flask framework)?
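My current understanding is that this means sending a descriptive User-Agent header with every request. Below is a sketch of what I think is meant, using only the standard library; the project name, site URL and e-mail address are placeholders, and make_api_request is a hypothetical helper that could be called from inside a Flask view. Please correct me if the policy expects something different.

```python
from urllib.request import Request

# Placeholder identification string: project name/version, site URL, contact e-mail.
USER_AGENT = "MyWikiViz/0.1 (https://example.org/; contact@example.org)"


def make_api_request(url):
    """Return a Request object carrying a descriptive User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


req = make_api_request("https://en.wikipedia.org/w/api.php?action=query&format=json")
# urllib stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))
```

The request object is only built here; passing it to urllib.request.urlopen would actually send it.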
Thank you very much for your help,
Luigi