Hello,
I am writing to ask about the usage limits and etiquette I should follow when
consuming the full-text search API *server side*.
I am building a site for visualization and knowledge discovery of
Wikipedias.
It will be a personally funded project (at least initially!) for public
use. Investing more in indexing with Elasticsearch would be beyond my
means and also beyond the scope of my project, whose focus is on
visualization and discovery. I also think there is no need to reinvent
the wheel :)
I want to figure out the best setup for usability and request rate when
using the full-text search API, while complying with your policy.
Would you please take a minute to read the details below?
***
Currently my setup uses my own database: for full-text search I use
Elasticsearch at a very basic level.
I then use the Wikipedia API to decorate my data *client-side (AJAX).*
Although it is slower than what I have now, the Wikipedia full-text search
API is much more useful for a user: it returns results for complex queries
that I cannot handle, because I index only article titles.
I would like to add full-text search against the Wikimedia API on the
server side.
I want to make sure I comply with the Wikimedia Foundation's policy if I
make concurrent requests on behalf of users.
- *Is there any limit to the number of requests I can make from a web
domain?*
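Whatever the actual limit turns out to be, a conservative server-side safeguard is to serialize outgoing API calls (at most one in flight at a time) and space them by a minimum interval. The sketch below is a minimal illustration; the class name `SerialThrottle` and the 0.5 s `min_interval` are hypothetical placeholders, not documented Wikimedia values.

```python
import threading
import time


class SerialThrottle:
    """Serialize outgoing API calls and space them by a minimum interval.

    min_interval is a placeholder, not a documented Wikimedia limit.
    """

    def __init__(self, min_interval: float = 0.5) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()  # held for the whole call: one request in flight
        self._last_call = float("-inf")  # monotonic start time of the previous call

    def call(self, fn, *args, **kwargs):
        with self._lock:
            # Sleep until min_interval has passed since the previous call started.
            wait = self._last_call + self.min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            self._last_call = time.monotonic()
            return fn(*args, **kwargs)
```

A single shared instance (e.g. `throttle.call(fetch, url)` with a hypothetical `fetch` function) serializes requests across threads of one process; with several worker processes, each process would have its own throttle, so the effective rate multiplies.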
I would like to use the wikitool Python library.
The query I need to run uses a *search generator* over the article
namespace only:
action=query&*generator=search*&gsrnamespace=0&gsrsearch='my query'&gsrlimit=20
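For reference, the same query can be built with the standard library instead of the wikitool library. This is a sketch under assumptions: the English Wikipedia endpoint, the helper name `build_search_url`, and the added `format=json` parameter are illustrative choices, and the quotes around 'my query' above are treated as delimiters rather than part of the search string.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia; other editions use <lang>.wikipedia.org.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(query: str, limit: int = 20, endpoint: str = API_ENDPOINT) -> str:
    """Return the API URL for a full-text search over the article namespace."""
    params = {
        "action": "query",
        "generator": "search",  # use search results as the set of pages
        "gsrnamespace": 0,      # main (article) namespace only
        "gsrsearch": query,     # free-text query; urlencode escapes it
        "gsrlimit": limit,      # number of results per request
        "format": "json",       # machine-readable JSON output
    }
    return f"{endpoint}?{urlencode(params)}"
```

Letting `urlencode` escape the query avoids problems with spaces and special characters in user input.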
I tested it from my laptop and found it quite slow; for example, it took:
~1.2 s for 'DNA'
~1.6 s for 'terroristi attacks'
~1.7 s for 'biology technology'
even though I am on a very fast Wi-Fi network.
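Single measurements like these can vary a lot between runs. A small harness that repeats a call and reports the median latency makes laptop and server timings easier to compare. In this sketch `fetch` is any zero-argument callable (for example, one that performs the HTTP request) and `median_latency` is a hypothetical helper name; keep `repeats` small when timing the live API.

```python
import statistics
import time
from typing import Callable


def median_latency(fetch: Callable[[], object], repeats: int = 5) -> float:
    """Call fetch() `repeats` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```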
- *How could I improve performance?*
- *Is it possible to apply for a specific request rate?*
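Because the API latency itself is outside the client's control, one server-side option is to cache results so that repeated queries reach the API only once per time window. The sketch below is a minimal in-process cache; the class name `TTLCache`, the 10-minute default TTL, and the normalization (lower-casing and collapsing whitespace, assuming search is case-insensitive) are all assumptions. It is unbounded and not guarded against two threads fetching the same missing key at once; a multi-process deployment would need a shared store instead.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache: entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl: float = 600.0) -> None:
        self.ttl = ttl
        self._store: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def normalize(query: str) -> str:
        # Treat "DNA" and "  dna " as the same query (assumed normalization).
        return " ".join(query.lower().split())

    def get_or_fetch(self, query: str, fetch: Callable[[str], Any]) -> Any:
        key = self.normalize(query)
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cached result: no API call
        result = fetch(key)  # miss or stale entry: call the API
        self._store[key] = (now, result)
        return result
```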
I also read that it is good etiquette to include contact information in
the request *headers*, in case you need to get in touch with the site
operator. It is not clear to me what exactly I should do.
- *Could you please show how to do this with an example in Python
(using the Flask framework)?*
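As far as I know, Wikimedia's User-Agent policy asks API clients to send a descriptive `User-Agent` header that identifies the application and gives a way to contact its operator. A minimal sketch with the standard library follows; the application name `WikiDiscoveryBot/0.1`, the example.org URL and address, and the helper names are hypothetical placeholders to replace with real values.

```python
import json
import urllib.request

# Hypothetical values: replace with your real site name, version and contact.
USER_AGENT = (
    "WikiDiscoveryBot/0.1 "
    "(https://example.org/about; contact@example.org) "
    "python-urllib/3"
)


def make_api_request(url: str) -> urllib.request.Request:
    """Build a request that identifies the application via the User-Agent header."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
    )


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(make_api_request(url), timeout=timeout) as resp:
        return json.load(resp)
```

In a Flask view, `fetch_json` could be called with the URL of the search generator query described above (built from `request.args`), and the decoded result returned with `jsonify`.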
Thank you very much for your help,
Luigi