Hello David!
Your project seems to be very interesting, could you elaborate a bit more?
Thank you so much!
I will definitely be happy to elaborate on it via a Skype call: I
could share my screen and show what I'm cooking up :D
Back to your reply now:
Yes, I was mainly testing during the hours when both Europe and the USA are online.
However, I am seeing this kind of delay from my laptop; maybe it will
speed up once deployed, if my flaky home network is the cause?
I am concerned because I first need to fetch results from Wikipedia, then
process them with my own data (that step is fast enough, <200 ms), and then
push the result to the client. That is why I will do this server-side and not
client-side.
I need the search generator only as a *first entry point*: imagine you need to
search for a topic, but you don't know exactly what. Imagine an input form:
you type in some keywords, select one of the results, and then you start
your session.
I cannot estimate exactly how many FST queries I need; let's say each
user will need the search generator only once per session.
Maybe 30 concurrent users per second would be a good reference (it's the same
number that Facebook's Parse provides; Firebase goes up to 100... so maybe I
could rely on a similar order of magnitude...)
If I can provide people with a smooth search experience, that will
be interesting because I could free up resources: I may extend a test of
knowledge discovery to other languages, too.
If the first user experience is too slow (~1.3 s + bandwidth transmission,
~1.5+ per query), that could become critical.
I don't need the search generator to operate in batch, or to track changes.
It just helps the user find a topic as an entry point for discovery.
I cannot use 'Opensearch' because it does not provide _IDs; also, it
searches against titles only.
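To make the point about IDs concrete, here is a minimal sketch of pulling page IDs and titles out of a search-generator response. It assumes the JSON shape that the action API returns for generator=search in its default format (a `query.pages` object keyed by page ID, with an `index` field giving the rank); the sample payload and the function name `extract_hits` are made up for illustration.

```python
import json


def extract_hits(response_text):
    """Return a list of (pageid, title) tuples, ordered by search rank.

    Assumes a generator=search style response: query.pages is a dict
    keyed by page ID, and each page may carry an 'index' rank field.
    """
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    # Sort by 'index' when present; pages without it go last.
    ranked = sorted(pages.values(), key=lambda p: p.get("index", float("inf")))
    return [(p["pageid"], p["title"]) for p in ranked]


# Made-up sample payload, for illustration only.
SAMPLE = json.dumps({
    "query": {
        "pages": {
            "200": {"pageid": 200, "ns": 0, "title": "Second hit", "index": 2},
            "100": {"pageid": 100, "ns": 0, "title": "First hit", "index": 1},
        }
    }
})
```

The (pageid, title) pairs are what the input form would list; the selected pageid would then start the session.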
Would it be possible to somehow reserve bandwidth or requests for a domain?
On Wed, Dec 23, 2015 at 3:55 PM, David Causse <dcausse(a)wikimedia.org> wrote:
On 22/12/2015 18:28, Luigi Assom wrote:
I tested it from my laptop, and I found it quite slow; as an example, it
took:
~1.2 seconds for querying 'DNA'
~1.6 s for 'terroristi attacks'
~1.7s for 'biology technology'
For a single-word query on English Wikipedia this is more like 400 ms for
me, so I'm not sure I understand why you experienced such response times.
Response times may vary depending on server load but I'm surprised you
noticed more than 1 sec for simple queries like that.
Did you check that you are receiving the result type/format you expect
(i.e. format=json)?
Could you re-check at different times of the day? Servers may be busy
around 8pm CET (the time when both Europe and America are active).
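A quick way to run this check could look like the sketch below. It assumes the standard action API endpoint on English Wikipedia with action=query and list=search; the helper names (`build_search_url`, `fetch_url`, `timed_search`) and the User-Agent string are hypothetical, and the timing includes network transfer from wherever the script runs.

```python
import json
import time
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(query, limit=10):
    """Build a full-text search URL that explicitly asks for JSON."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_url(url):
    """Fetch a URL and return the body as text (hypothetical User-Agent)."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "latency-check-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")


def timed_search(query, fetch=fetch_url):
    """Return (elapsed_seconds, parsed_json) for one search request."""
    start = time.perf_counter()
    body = fetch(build_search_url(query))
    elapsed = time.perf_counter() - start
    # json.loads raises ValueError if the body is not JSON (e.g. an HTML page).
    return elapsed, json.loads(body)
```

Calling timed_search("DNA") a few times at different hours, and comparing the elapsed values, would show whether the delay comes from server load or from the local network.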
Your project seems to be very interesting, could you elaborate a bit more?
Do you plan to use the API from a backend/automaton that will need to send
a lot of queries? Do you have an estimate of your needs (number of
queries and refresh rate)?
If your process is like refreshing a set of queries regularly, I'd suggest
you build a daemon that sends a few queries (3 or 4) per minute rather than an
aggressive batch with parallel processes run once a day/week/month.
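As a rough sketch of that idea, the loop below sends queries one at a time, evenly spaced, instead of in parallel. The function name `run_throttled` and its parameters are hypothetical; `run_query` stands for whatever function actually calls the API, and a real daemon would wrap this in an outer loop with error handling.

```python
import time


def run_throttled(queries, run_query, per_minute=4, sleep=time.sleep):
    """Run queries sequentially, spaced so that roughly per_minute
    requests are sent each minute. Returns the results in order."""
    interval = 60.0 / per_minute  # seconds to wait between requests
    results = []
    for i, query in enumerate(queries):
        if i > 0:
            sleep(interval)  # no wait before the very first request
        results.append(run_query(query))
    return results
```

With per_minute=4 the loop waits 15 seconds between requests, so a set of 100 queries would take about 25 minutes to refresh.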
You should have a look at RCStream[1] which may be more appropriate to
your needs (if you plan to track changes it's definitely better than
refreshing the same set of queries regularly)
Thank you!
[1] https://wikitech.wikimedia.org/wiki/RCStream
_______________________________________________
discovery mailing list
discovery(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/discovery