On 28/07/2015 16:32, Trey Jones wrote:
Nemo recommended insource: to Lagotto because it would actually work and do what they want, but didn't consider the computational cost on our end. However, if we only allow 20 at a time, they would probably monopolize it entirely. In my sample we got about 50,000 of these queries in about an hour.
David/Chad, can you look at Nemo's issue and comment there on what's plausible and what's not? https://github.com/lagotto/lagotto/issues/405
I added a comment there.
Also, is this the kind of use case that we want to support? I'm not suggesting that it isn't, I really don't know. But they aren't looking for information, they are looking for something akin to impact factor on reputable parts of the web. If that's not something we want to support, how do we let them know? If that doesn't help—e.g., because it's some other installation using their tool that's generating all the queries—do we block it?
I don't know what to do with this. They use our search engine as a workaround because, I guess, they don't want to deal with too much data, and it's pretty convenient to send queries to a system that does not blacklist anyone. If they were using Google, they would have been able to run something like 1 query per minute.
We should block/limit a source if:
- It hurts the system and makes the search experience bad for others
- It pollutes our stats in a way that makes it impossible for us to learn anything from search logs
When we start doing some statistical machine learning, this is something that we will have to address.
Concerning the costly operators, if other tools/sources start to use them in a way that affects system performance, I'm afraid we will have to protect these expert features with permissions granted by wiki admins.
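
To make the "limit a source" option concrete, here is a minimal sketch of a limiter that caps costly queries both globally and per source, so that a single client cannot take all the slots. The global limit of 20 comes from the figure mentioned in this thread; the class name, the per-source limit of 5, and the idea of keying on a "source" string (user agent, IP, or similar) are assumptions for illustration only, not how our search backend actually does it.

```python
import threading
from collections import defaultdict


class PerSourceLimiter:
    """Hypothetical limiter: caps concurrent costly queries globally and per source."""

    def __init__(self, global_limit=20, per_source_limit=5):
        self.global_limit = global_limit          # total slots, as discussed in the thread
        self.per_source_limit = per_source_limit  # assumed value, purely illustrative
        self._active = defaultdict(int)           # source -> queries currently running
        self._lock = threading.Lock()

    def try_acquire(self, source):
        """Reserve a slot for `source`; return False if the query must be rejected."""
        with self._lock:
            if sum(self._active.values()) >= self.global_limit:
                return False  # every slot is busy
            if self._active[source] >= self.per_source_limit:
                return False  # this source already holds its share
            self._active[source] += 1
            return True

    def release(self, source):
        """Free the slot held by a finished query from `source`."""
        with self._lock:
            if self._active[source] > 0:
                self._active[source] -= 1
```

With these assumed numbers, a single heavy client can hold at most 5 of the 20 slots, and the remaining 15 stay available to everyone else. Rejected queries would still show up in the logs, so this addresses only the first criterion above, not the stats pollution.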