We have a new feature for web requests that rewrites zero-result queries into a new search that might have results. I've started porting this same feature over to API clients so it has a larger effect on our zero-results rate, but code review has turned up some indecision about whether this should be enabled or disabled by default in the API. Either way, the feature will be toggleable.
I thought we should open this up to a larger audience. Are there any opinions?
Erik B.
Probably a good idea. Is it opt-in or opt-out for the API consumer?
-Adam
Wikimedia-search mailing list Wikimedia-search@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
On Thu, Jul 30, 2015 at 3:29 PM, Adam Baso abaso@wikimedia.org wrote:
Probably a good idea. Is it opt-in or opt-out for the API consumer?
-Adam
That's the main question :) To copy the dialog from Gerrit:
*Anomie wrote:* Should the ability to set this flag be exposed to the API somehow?
Or, to avoid changing things for clients, perhaps that should be "should the ability to not set this flag be exposed to the API somehow?", since the API hasn't done any query rewriting before?
*Ebernhardson wrote:* As for the default, the current scope of rewriting is very small: we are only applying it to situations where the original query returned no results and didn't contain any special syntax (such as quotes, incategory:, etc.).
We might investigate heavier rewriting in the future (it hasn't been considered yet), but for this quarter we are focusing specifically on returning answers to queries that return no results.
As this only affects queries that didn't have any results anyway, I think it should be safe to apply by default and allow API consumers to opt out.
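To make the scope described above concrete, here is a minimal sketch of the eligibility check, not the code in the Gerrit changes. It assumes "special syntax" can be approximated by a double quote or a keyword prefix such as `incategory:`; the keyword list, the function names, and the `rewrites_enabled` opt-out flag are all hypothetical.

```python
# Hypothetical approximation of "special syntax": a double quote anywhere,
# or one of these keyword prefixes. The real list in CirrusSearch may differ.
SPECIAL_KEYWORDS = ("incategory:", "intitle:", "insource:", "prefix:")


def has_special_syntax(query: str) -> bool:
    """Return True if the query uses quotes or a known keyword prefix."""
    if '"' in query:
        return True
    lowered = query.lower()
    return any(keyword in lowered for keyword in SPECIAL_KEYWORDS)


def should_rewrite(query: str, total_hits: int, rewrites_enabled: bool = True) -> bool:
    """Rewrite only zero-result queries that contain no special syntax.

    rewrites_enabled defaults to True here to mirror the opt-out proposal;
    an opt-in design would simply default it to False.
    """
    return rewrites_enabled and total_hits == 0 and not has_special_syntax(query)
```

With these assumptions, `should_rewrite("helo wrld", 0)` is true, while a quoted query, an `incategory:` query, a query with hits, or a caller that passes `rewrites_enabled=False` is left alone.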
*Anomie wrote:* OTOH, someone might be running a bot that searches for a common typo and fixes it. Once all the instances of the typo are fixed, your rewriting might cause it to return search results for the un-typoed version, causing the bot to at best waste bandwidth by fetching many pages that don't need any edit (and in worse cases it might make annoying minor edits or otherwise misbehave).
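Anomie's bot scenario can be illustrated with a minimal sketch. The `search` and `fetch_text` callables are hypothetical stand-ins for real API calls, not part of any bot framework. The bot re-checks each fetched page for the typo before editing: that guard prevents the spurious edits, but it cannot prevent the wasted fetches when the search was silently rewritten.

```python
from typing import Callable, Dict, Iterable


def fix_typo(
    typo: str,
    correction: str,
    search: Callable[[str], Iterable[str]],
    fetch_text: Callable[[str], str],
) -> Dict[str, str]:
    """Return {title: new_text} for the pages the bot would actually edit.

    search(term) yields page titles; fetch_text(title) returns wikitext.
    Both are hypothetical stand-ins for real API requests.
    """
    edits: Dict[str, str] = {}
    for title in search(typo):
        text = fetch_text(title)  # every fetch costs bandwidth, needed or not
        if typo not in text:
            # The hit came from a rewritten query: nothing to fix here.
            continue
        # A real bot would save this text; the sketch only records it.
        edits[title] = text.replace(typo, correction)
    return edits


# Simulate a search rewritten from "recieve" to "receive", matching both pages.
pages = {"Alpha": "We recieve mail.", "Beta": "We receive mail."}
print(fix_typo("recieve", "receive", lambda term: list(pages), pages.__getitem__))
# prints {'Alpha': 'We receive mail.'}
```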
I thought it might be useful to get more opinions than our two.
Erik B.
There are a multitude of API consumers out there that would expect this kind of behaviour by default. For example, reader apps like our native apps, and other third party apps, would likely prefer the forwarding to happen automatically without having to write additional code.
That said, there are also a multitude of API consumers that would not prefer this kind of behaviour. Examples include consumers that search for things expecting to find zero results, such as a lot of scripts run by advanced users and external services like Lagotto http://sample.lagotto.io/sources/wikipedia. If the default were changed, they would have to write additional code to handle it.
I'm prepared to make a decision as product owner, but input would definitely be appreciated.
Thanks, Dan
To summarize Dan's comments: damned if you do, damned if you don't.
I lean more toward helping the less sophisticated user by default, and trusting the more sophisticated user to do the right thing. I'd say bot creators are more sophisticated than most human users, who are more sophisticated than most bots.
So, if the community of bot creators is generally responsive to changes to the API and default behavior, and this change is well communicated beforehand, they should be able to adapt as needed. On the other hand, if there are lots of orphaned bots out there doing good work, we might break some of them. On the other other hand, someone should find those bots new homes.
I also feel like human users expect and deserve the best results we can give them, so I lean more towards making the change the default.
A more complex and more general approach might be to version the API... though that's a whole 'nother can of worms. But it would allow API users to expect a consistent set of features for the life of the version, and would allow them to upgrade and adapt to the new version over some time period rather than having to scramble (or just suffer breakage) the day it goes live. This would require serious thinking about what it means to be a version (is it just default flags, or do old versions get or not get feature upgrades), how to indicate versions (including "always the most recent API" and "I'm from the time before API versions"), how long versions live, how many we support concurrently, how to fall back from unsupported versions, etc., etc. But Dan seems like he needs more hobbies, so I'm throwing it out there.
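For the "just default flags" flavour of versioning, a minimal sketch might look like this. It assumes versions are plain integers, that a missing version means "I'm from the time before API versions", and that `"latest"` means the newest version; the version table and the flag name are hypothetical.

```python
from typing import Dict, Optional, Union

# Hypothetical table: each API version pins the default value of each flag.
VERSION_DEFAULTS: Dict[int, Dict[str, bool]] = {
    0: {"rewrite_zero_results": False},  # clients from before versioning
    1: {"rewrite_zero_results": True},
}
OLDEST_SUPPORTED = min(VERSION_DEFAULTS)


def resolve_defaults(requested: Optional[Union[int, str]]) -> Dict[str, bool]:
    """Map a client's declared version to the flag defaults it should get."""
    if requested is None:
        version = 0  # "I'm from the time before API versions"
    elif requested == "latest":
        version = max(VERSION_DEFAULTS)  # "always the most recent API"
    else:
        version = int(requested)
    if version not in VERSION_DEFAULTS:
        # Fall back to the newest known version not newer than the request,
        # or to the oldest supported one if the request predates them all.
        candidates = [v for v in VERSION_DEFAULTS if v <= version]
        version = max(candidates) if candidates else OLDEST_SUPPORTED
    return dict(VERSION_DEFAULTS[version])
```

Under this scheme an unversioned bot keeps the old "don't rewrite" default, while a client that declares version 1 or `"latest"` gets rewriting without passing an extra parameter.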
—Trey
Trey Jones Software Engineer, Discovery Wikimedia Foundation
On Fri, Jul 31, 2015 at 9:13 AM, Trey Jones tjones@wikimedia.org wrote:
So, if the community of bot creators is generally responsive to changes to the API and default behavior,
Some are, many aren't. Some bots are run by people who barely understand the code, because they took it over from a now-retired user—this is the result of "find[ing] those bots new homes". Others are run by people who are still active in a sense but lack the time or motivation to make many changes.
User scripts and gadgets can be in an even worse situation, as the original maintainer may have retired leaving *no* replacement while others continue to use the script.
I also feel like human users expect and deserve the best results we can give them, so I lean more towards making the change the default.
In a sense there are no "human users" of the API. It's always programs of some sort using it to fetch data to then reformat for display to humans.
In a different sense, of course, there are human users, but not the ones you're probably thinking of. These users are the programmers who create these programs. But I don't think this change would be as useful as the recent continuation change, and particularly I don't think it's useful enough to justify the amount of work that would be required to properly communicate it.
A more complex and more general approach might be to version the API... though that's a whole 'nother can of worms.
See T41592, particularly the reasons given for declining it.
On Thu, Jul 30, 2015 at 6:38 PM, Dan Garry dgarry@wikimedia.org wrote:
There are a multitude of API consumers out there that would expect this kind of behaviour by default. For example, reader apps like our native apps, and other third party apps, would likely prefer the forwarding to happen automatically without having to write additional code.
That said, there are also a multitude of API consumers that would not prefer this kind of behaviour. Examples of this include people that are looking for things and expecting to find zero results, which includes a lot of scripts run by advanced users and external services like Lagotto http://sample.lagotto.io/sources/wikipedia. If the default were changed, they would have to write additional code to handle it.
Note the "additional code" in either case is simply adding an extra parameter to the query to enable or disable automatic rewriting.
The difference is that the current behavior is "don't rewrite": if you make the default be "rewrite" then you're breaking all the existing clients in the second paragraph, while if you make the default be "don't rewrite" then all clients keep working as they did previously (which isn't optimal for the clients in the first paragraph, but they're not *broken* and they might already be using the returned 'suggestion' field to do rewriting manually).
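Both kinds of "additional code" can be sketched briefly. `list=search`, `srsearch`, `srinfo`, and the `query.searchinfo.totalhits` / `query.searchinfo.suggestion` response fields are part of the existing MediaWiki search API. The rewrite flag name is hypothetical, since the thread does not settle on one, and `fetch` is a stand-in for whatever HTTP client the consumer already uses. Because MediaWiki treats a boolean parameter as true when present and false when omitted, an opt-out default would need a separate "disable" flag rather than `flag=0`.

```python
from typing import Any, Callable, Dict, List, Optional
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_search_url(term: str, extra_flag: Optional[str] = None) -> str:
    """Build a list=search request URL, optionally adding one boolean flag.

    Any flag name passed here is hypothetical. MediaWiki booleans are true
    when present, so the flag is added only when the caller asks for it.
    """
    params: Dict[str, str] = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srinfo": "totalhits|suggestion",
        "format": "json",
    }
    if extra_flag is not None:
        params[extra_flag] = "1"
    return API_URL + "?" + urlencode(params)


def search_with_fallback(term: str, fetch: Callable[[str], Dict[str, Any]]) -> List[str]:
    """Search once; if nothing matched, retry with the returned suggestion.

    fetch(term) is a hypothetical stand-in that performs the request
    built above and returns the decoded JSON response.
    """
    data = fetch(term)
    info = data.get("query", {}).get("searchinfo", {})
    if info.get("totalhits", 0) == 0 and info.get("suggestion"):
        data = fetch(info["suggestion"])
    return [hit["title"] for hit in data.get("query", {}).get("search", [])]
```

Under an opt-out default, an existing typo-fixing bot would have to switch to something like `build_search_url(typo, "srnorewrite")` (hypothetical flag name); under an opt-in default it needs no change, and reader apps can keep doing the retry in `search_with_fallback` themselves.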
On 07/30/2015 02:06 PM, Erik Bernhardson wrote:
results rate, but code review has turned up some indecision on if this should be enabled or disabled by default in the API. Either way the feature will be toggleable.
What gerrit change is this about? Can you provide a link? :)
-- Legoktm
On Thu, Jul 30, 2015 at 4:51 PM, Legoktm legoktm.wikipedia@gmail.com wrote:
On 07/30/2015 02:06 PM, Erik Bernhardson wrote:
results rate, but code review has turned up some indecision on if this should be enabled or disabled by default in the API. Either way the feature will be toggleable.
What gerrit change is this about? Can you provide a link? :)
The core change is https://gerrit.wikimedia.org/r/#/c/227501/ and the related CirrusSearch change is https://gerrit.wikimedia.org/r/#/c/227578/