Hello David!
Your project seems to be very interesting, could you elaborate a bit more?
Thank you so much! I would be happy to elaborate on it over a Skype call: I could share my screen and show you what I'm cooking up :D
Back to your reply now:
Yes, I was mainly testing at times when both Europe and the USA are active. However, I am seeing this kind of delay from my laptop; maybe it will be faster once deployed, since my home network may be the problem?
I am concerned because I first need to fetch results from Wikipedia, then process them with my own data (which is fast enough, <200ms), and then push the result to the client. That is why I will do this server-side rather than client-side.
I need the search generator only as the *first entry point*: imagine you want to search for a topic but don't know exactly what you are looking for. Picture an input form: you type in some keywords, select one of the results, and then start your session.
I cannot estimate exactly how many FST queries I need; let's say each user will need the search generator only once per session.
Maybe 30 concurrent users per second would be a good reference (it's the same number Facebook's Parse provides, and Firebase goes up to 100... so maybe I could rely on a similar order of magnitude...)
If I can provide people with a smooth search experience, that would be interesting because I could free up resources: I might extend a knowledge-discovery test to other languages, too. If the first user experience is too slow (~1.3s + bandwidth transmission, ~1.5+ per query), that could become critical.
I don't need the search generator to operate in batch or to track changes. It just helps the user find a topic as an entry point for discovery. I cannot use 'Opensearch' because it does not provide _IDs; also, it searches against titles only.
Would it be possible to somehow reserve bandwidth or requests for a domain?
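To make the requirement concrete, here is a minimal sketch of the kind of full-text request I have in mind, using the MediaWiki API's `action=query&list=search` module, whose hits carry a `pageid` next to the `title`. The endpoint (English Wikipedia), the User-Agent string and the helper names `build_search_url`, `extract_hits` and `search` are only my assumptions for illustration, not a final implementation:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint: English Wikipedia. Other languages use their own host.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(keywords, limit=10):
    """Build a full-text search URL that asks explicitly for JSON."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": keywords,
        "srlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_hits(payload):
    """Return (pageid, title) pairs from a decoded list=search response."""
    hits = payload.get("query", {}).get("search", [])
    return [(hit["pageid"], hit["title"]) for hit in hits]


def search(keywords, limit=10):
    """Send one search request and return its (pageid, title) pairs."""
    # Hypothetical User-Agent; a real client should identify itself properly.
    request = Request(
        build_search_url(keywords, limit),
        headers={"User-Agent": "search-sketch/0.1 (example)"},
    )
    with urlopen(request, timeout=10) as response:
        return extract_hits(json.load(response))
```

The user would pick one of the returned pairs, and the session would then continue from that page ID.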
On Wed, Dec 23, 2015 at 3:55 PM, David Causse dcausse@wikimedia.org wrote:
On 22/12/2015 18:28, Luigi Assom wrote:
I tested it from my laptop and found it quite slow; for example, it took:
~1.2 seconds for querying 'DNA'
~1.6 s for 'terroristi attacks'
~1.7s for 'biology technology'
For a single-word query on English Wikipedia this is more like 400ms for me, so I'm not sure why you experienced such response times. Response times may vary depending on server load, but I'm surprised you saw more than 1 second for simple queries like these. Did you check that you are receiving the result type/format you expect (i.e. format=json)? Could you re-check at different times of the day? Servers may be busy around 8pm CET (the time when both Europe and America are active).
Your project seems to be very interesting, could you elaborate a bit more? Do you plan to use the API from a backend or automated process that will need to send a lot of queries? Do you have an estimate of your needs (number of queries and refresh rate)? If your process amounts to refreshing a set of queries regularly, I'd suggest you build a daemon that sends a few queries (3 or 4) per minute rather than an aggressive batch with parallel processes run once a day/week/month. You should have a look at RCStream[1], which may be more appropriate to your needs (if you plan to track changes, it's definitely better than refreshing the same set of queries regularly).
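A small sketch of how the re-check could be done so that the numbers are comparable: time the same call several times and look at the spread rather than a single sample. The helper names `time_call` and `summarize` are hypothetical, and `fn` stands for whatever function you use to send the request:

```python
import statistics
import time


def time_call(fn, *args, repeats=5):
    """Call fn(*args) `repeats` times.

    Returns the last result and the list of per-call latencies in milliseconds.
    """
    latencies_ms = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return result, latencies_ms


def summarize(latencies_ms):
    """Reduce a list of latencies to min / median / max."""
    return {
        "min": min(latencies_ms),
        "median": statistics.median(latencies_ms),
        "max": max(latencies_ms),
    }
```

Running this at a few different times of the day would show whether the delay comes from server load or is constant, which would point to the client's network instead.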
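The daemon idea above could look roughly like the following sketch: one pass over a set of queries, spaced evenly so that no more than `per_minute` requests go out per minute. The function name `run_throttled` and its parameters are hypothetical; `fetch` is whatever function sends a single query, and `sleep` is injectable only to make the pacing easy to test:

```python
import time


def run_throttled(queries, fetch, per_minute=4, sleep=time.sleep):
    """Run fetch(query) for each query sequentially, pausing between requests.

    With per_minute=4 the pause is 15 seconds, i.e. about 4 queries per minute.
    """
    interval = 60.0 / per_minute
    results = {}
    for index, query in enumerate(queries):
        if index > 0:
            sleep(interval)  # wait before every request except the first
        results[query] = fetch(query)
    return results
```

A long-running daemon would simply call this in a loop over its query set, instead of firing all the queries in parallel once a day.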
Thank you!
[1] https://wikitech.wikimedia.org/wiki/RCStream
discovery mailing list discovery@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/discovery