Do you use our search API? If so, I'd like to hear from you!
The Discovery Department https://wikimediafoundation.org/wiki/Staff_and_contractors#Discovery at the Wikimedia Foundation is tasked with building a path of discovery to relevant and trusted knowledge. In line with that, one of our primary responsibilities is to ensure that our search APIs are stable, fast, and easy to use. We'd love to hear from the people that are using our APIs, so we can learn what you love about them, what frustrates you, and what we can do to improve them for you.
I'd prefer that you keep your comments focused on the API itself rather than on the relevance of the results it returns; I plan to start a separate thread about result relevance, since they're separate topics.
If you have some feedback, please reply in this thread or reach out to me privately.
Thanks!
Dan
On 6/8/15, Dan Garry dgarry@wikimedia.org wrote:
--
Dan Garry
Product Manager, Discovery
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
The search api (by which I mean query=search in api.php) is somewhat poorly documented. You have to dig to find https://www.mediawiki.org/wiki/Help:CirrusSearch . I would much prefer that the relevant documentation were included in the normal api.php auto-generated help. Even better would be if that api allowed users to specify the options using normal url parameters (as an alternative to using operators in the search string). It's also not entirely clear from the api that the search options differ depending on which extensions you have installed.
Additionally, the help page isn't entirely clear about some of the limitations. For example, you can't do incategory:Foo OR intitle:bar. Regexes on intitle don't seem to work over the whole title, only on word-level tokens (I think, maybe? I'm a bit unclear on how the regex operator works).
Cheers, Brian
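For readers following along, here is a minimal sketch of what such a request looks like, assuming Python's standard library and the Commons endpoint that appears later in this thread. The point it illustrates is that the CirrusSearch operators are not separate parameters; they travel inside srsearch. The helper name search_url is illustrative only.

```python
from urllib.parse import urlencode

# Public endpoint used elsewhere in this thread.
API = "https://commons.wikimedia.org/w/api.php"

def search_url(search: str, limit: int = 10) -> str:
    """Build a list=search request; keyword operators go inside srsearch."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": search,  # free text plus keywords such as intitle: / incategory:
        "srlimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"

# Two keywords in one srsearch string; per this thread, an OR between
# incategory: and intitle: is not supported, so this is an implicit AND.
print(search_url("incategory:Foo intitle:bar"))
# → https://commons.wikimedia.org/w/api.php?action=query&list=search&srsearch=incategory%3AFoo+intitle%3Abar&srlimit=10&format=json
```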
On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff bawolff@gmail.com wrote:
The search api (by which I mean query=search in api.php) is somewhat poorly documented. You have to dig to find https://www.mediawiki.org/wiki/Help:CirrusSearch .
I recently added https://www.mediawiki.org/wiki/API:Search_and_discovery which clarifies the connection with Help:CirrusSearch, and mentions other kinds of searching like geosearch.
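As a point of comparison, geosearch is a different list module with its own parameters rather than an operator inside srsearch. A minimal sketch, assuming the GeoData extension's list=geosearch with gscoord (latitude|longitude) and gsradius (in metres); check API:Geosearch for current limits. The function name is illustrative.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def geosearch_url(lat: float, lon: float, radius_m: int = 10000, limit: int = 10) -> str:
    """Build a list=geosearch request for pages near a coordinate."""
    params = {
        "action": "query",
        "list": "geosearch",
        "gscoord": f"{lat}|{lon}",  # latitude|longitude
        "gsradius": radius_m,       # search radius in metres
        "gslimit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"
```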
I would much prefer that the relevant documentation were included in the normal api.php auto-generated help.
https://gerrit.wikimedia.org/r/216899 changes the 'apihelp-query+search-param-search' message in https://www.mediawiki.org/wiki/Special:ApiHelp/query+search to:
*srsearch*: Search for page titles and page content that match this value. You can use the search string to invoke special wiki search features, depending on what its search backend implements.
But API query search can only use CirrusSearch features if it's installed. I think Extension:CirrusSearch could handle the 'APIGetAllowedParams' hook to modify this help text. If I understand correctly, it might be easier to interpose WMF-specific help text that links to mw:Help:CirrusSearch in a 'wikimedia-apihelp-query+search-param-search' key in extensions/WikimediaMessages/i18n/wikimediaoverrides/en.json ; I tried it locally and it didn't work.
Even better would be if that api allowed users to specify the options using normal url parameters (as an alternative to using operators in the search string). It's also not entirely clear from the api that the search options differ depending on which extensions you have installed.
What do you mean? Beyond special terms in srsearch, I'm not aware of any changes to query+search's sr parameters depending on extensions.
Additionally, the help page isn't entirely clear about some of the limitations. For example, you can't do incategory:Foo OR intitle:bar. Regexes on intitle don't seem to work over the whole title, only on word-level tokens (I think, maybe? I'm a bit unclear on how the regex operator works).
Yes, it's not a full reference.
On 6/8/15, S Page spage@wikimedia.org wrote:
On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff bawolff@gmail.com wrote:
The search api (by which I mean query=search in api.php) is somewhat poorly documented. You have to dig to find https://www.mediawiki.org/wiki/Help:CirrusSearch .
I recently added https://www.mediawiki.org/wiki/API:Search_and_discovery which clarifies the connection with Help:CirrusSearch, and mentions other kinds of searching like geosearch.
Last I looked at the docs was about 6 months ago. Glad to hear they're improving.
I would much prefer that the relevant documentation were included in the normal api.php auto-generated help.
https://gerrit.wikimedia.org/r/216899 changes the 'apihelp-query+search-param-search' message in https://www.mediawiki.org/wiki/Special:ApiHelp/query+search to:
*srsearch*: Search for page titles and page content that match this value. You can use the search string to invoke special wiki search features, depending on what its search backend implements.
But API query search can only use CirrusSearch features if it's installed. I think Extension:CirrusSearch could handle the 'APIGetAllowedParams' hook to modify this help text. If I understand correctly, it might be easier to interpose WMF-specific help text that links to mw:Help:CirrusSearch in a 'wikimedia-apihelp-query+search-param-search' key in extensions/WikimediaMessages/i18n/wikimediaoverrides/en.json ; I tried it locally and it didn't work.
It shouldn't be WMF-specific (since it's not WMF-specific in the way TOS links are); it should be specific to CirrusSearch.
One possible implementation would be an override message (I would note that the wikimediaoverrides messages aren't direct overrides; they are replacement messages used by other code that does the overriding). In my original email I was thinking more from a user's perspective of what I'd like to see, without thought to how it would be implemented. Without looking at the code, I would probably favour an extra hook just for the search module instead of using the generic hook.
Even better would be if that api allowed users to specify the options using normal url parameters (as an alternative to using operators in the search string). It's also not entirely clear from the api that the search options differ depending on which extensions you have installed.
What do you mean? Beyond special terms in srsearch, I'm not aware of any changes to query+search's sr parameters depending on extensions.
Yeah, that doesn't happen currently. I think it should, though; it would mesh much better with the MediaWiki API if instead of doing https://commons.wikimedia.org/w/api.php?action=query&list=search&srs... you could do something like https://commons.wikimedia.org/w/api.php?action=query&list=search&sri... . Especially if all the parameters were documented in the normal API way, I think it would be a big boon to discovering the hidden features of search. (I appreciate it might be a lot of work to express all the possible search options, but the original email sounded like it wanted a wishlist.)
-- bawolff
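To make the proposal concrete, here is a hedged client-side sketch that translates structured filters into the operator syntax srsearch expects today. Everything here is hypothetical: api.php has no such structured parameters, and build_srsearch and the filter names are illustrative only (intitle:, incategory: and insource: are existing CirrusSearch keywords).

```python
# Hypothetical: api.php has no such structured parameters; this helper only
# illustrates the proposal by translating them into search-string operators.
SUPPORTED_KEYWORDS = ("intitle", "incategory", "insource")

def build_srsearch(text: str = "", **filters: str) -> str:
    """Turn free text plus keyword filters into a single srsearch string."""
    unknown = set(filters) - set(SUPPORTED_KEYWORDS)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    parts = [text] if text else []
    for key in SUPPORTED_KEYWORDS:  # fixed order keeps the output deterministic
        value = filters.get(key)
        if value is None:
            continue
        if " " in value:
            value = f'"{value}"'  # quote multi-word values so they stay one argument
        parts.append(f"{key}:{value}")
    return " ".join(parts)

print(build_srsearch("sunset", intitle="beach", incategory="Sunsets in Hawaii"))
# → sunset intitle:beach incategory:"Sunsets in Hawaii"
```

A server-side version could validate each parameter and list it in the normal api.php help, which is where the discoverability benefit would come from; rejecting unknown filters, as here, is one way to surface which options a given backend supports.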
On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff bawolff@gmail.com wrote:
Additionally, the help page isn't entirely clear about some of the limitations. For example, you can't do incategory:Foo OR intitle:bar. Regexes on intitle don't seem to work over the whole title, only on word-level tokens (I think, maybe? I'm a bit unclear on how the regex operator works).
Being able to see a parse tree of the search expression would be nice, like with the parse/expandtemplates APIs. That would make it easier to find out whether the search fails because the query is parsed differently from what you imagined, or because there really is nothing to return.
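To illustrate what such output might look like, here is a toy parser that returns a tree for a search string. It is purely hypothetical and is not how CirrusSearch parses queries (Cirrus does not have a full grammar for its query language); it handles only keyword:value terms, quoted phrases, bare words, and OR between adjacent terms.

```python
import re

# keyword:value (value may be quoted) | "quoted phrase" | bare word
TOKEN = re.compile(r'(\w+):("[^"]*"|\S+)|("[^"]*")|(\S+)')

def tokenize(query):
    """Yield ("keyword", field, value), ("phrase", None, text), ("word", None, text) or "OR"."""
    for m in TOKEN.finditer(query):
        field, value, phrase, word = m.groups()
        if field:
            yield ("keyword", field, value.strip('"'))
        elif phrase:
            yield ("phrase", None, phrase.strip('"'))
        elif word == "OR":
            yield "OR"
        else:
            yield ("word", None, word)

def parse(query):
    """Return ("AND", [...]); an OR joins its neighbours into ("OR", [...])."""
    groups, pending_or = [], False
    for term in tokenize(query):
        if term == "OR":
            pending_or = True
            continue
        if pending_or and groups:
            groups[-1].append(term)  # extend the previous group: a OR b OR c
        else:
            groups.append([term])
        pending_or = False
    return ("AND", [g[0] if len(g) == 1 else ("OR", g) for g in groups])

print(parse("incategory:Foo OR intitle:bar"))
# → ('AND', [('OR', [('keyword', 'incategory', 'Foo'), ('keyword', 'intitle', 'bar')])])
```

Seeing the OR group in the tree, a user could then compare it with what the backend actually supports, which is the debugging benefit described above.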
On Tue, Jun 9, 2015 at 2:19 AM, Gergo Tisza gtisza@wikimedia.org wrote:
On Mon, Jun 8, 2015 at 4:16 PM, Brian Wolff bawolff@gmail.com wrote:
Additionally, the help page isn't entirely clear about some of the limitations. For example, you can't do incategory:Foo OR intitle:bar. Regexes on intitle don't seem to work over the whole title, only on word-level tokens (I think, maybe? I'm a bit unclear on how the regex operator works).
Being able to see a parse tree of the search expression would be nice, like with the parse/expandtemplates APIs. That would make it easier to find out whether the search fails because the query is parsed differently from what you imagined, or because there really is nothing to return.
You can _kind of_ get that now by adding the cirrusDumpQuery URL parameter. But it only dumps the query as sent by Cirrus to Elasticsearch, and that contains a query_string query that Elasticsearch (Lucene, really) parses on its own.
One interesting option would be to make a way for Cirrus to return Elasticsearch's explain results. It's not perfect, because it only explains why things are found and scored the way they are; it doesn't explain why things aren't found. Exporting the actual parsed query is more ambitious.
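A minimal sketch of adding that flag to an ordinary search request, assuming that cirrusDumpQuery only needs to be present on the URL. As I understand it, the response is then the query Cirrus would send to Elasticsearch rather than search results. The helper name is illustrative.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"

def dump_query_url(search: str) -> str:
    """A list=search request plus the cirrusDumpQuery debugging flag."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": search,
        "format": "json",
        "cirrusDumpQuery": "",  # assumed: presence matters, so the value is left empty
    }
    return f"{API}?{urlencode(params)}"
```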
Nik
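For reference, Elasticsearch itself exposes this through the explain flag on a search request, which attaches a scoring breakdown to each returned hit. As noted, this says nothing about documents that were not returned. A minimal sketch of such a request body follows; the field name text is an assumption for illustration, and how Cirrus would pass this through was not decided in this thread.

```python
import json

def explain_body(search: str, size: int = 5) -> str:
    """Elasticsearch search body asking for per-hit score explanations."""
    body = {
        "query": {
            "query_string": {
                "query": search,          # parsed by Lucene's query_string syntax
                "default_field": "text",  # assumed field name, for illustration
            }
        },
        "explain": True,  # each returned hit gains an "_explanation" tree
        "size": size,
    }
    return json.dumps(body)
```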
On Mon, Jun 8, 2015 at 7:16 PM, Brian Wolff bawolff@gmail.com wrote:
You can't do incategory:Foo OR intitle:bar. Regexes on intitle don't seem to work over the whole title, only on word-level tokens (I think, maybe? I'm a bit unclear on how the regex operator works).
intitle is word-level, though you can do phrase searching. It's pretty much the same as a regular search but limited to the title field. incategory:Foo OR intitle:Bar is a limitation I'm working on now; no idea when it'll be available. The limitation comes from us trying to be cute with the command parsing in Cirrus rather than writing a whole grammar for the query language. Regexes only work for wikitext. This is a somewhat arbitrary decision on my part: we need to make special ngram fields to accelerate the regex searching, and we only do that for wikitext. We _can_ do it for other fields at the cost of update time and disk space.
Nik
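The ngram point can be illustrated with the general technique of trigram-accelerated regex search (the approach popularized by Google Code Search): index every three-character substring, use the trigrams of a literal the regex must contain to narrow the candidates, then run the real regex only on those. This is a toy sketch of the idea, not CirrusSearch's implementation; a real system extracts the required literals from the regex automatically, whereas here they are passed in by hand. It also shows where the costs come from: every accelerated field needs its own postings, built at update time and stored on disk.

```python
import re
from collections import defaultdict

def trigrams(text: str) -> set:
    """All lowercase 3-character substrings of text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

class TrigramIndex:
    """Toy regex accelerator: trigram prefilter, then a real regex check."""

    def __init__(self, docs: dict):
        self.docs = docs
        self.postings = defaultdict(set)  # trigram -> ids of docs containing it
        for doc_id, text in docs.items():
            for gram in trigrams(text):
                self.postings[gram].add(doc_id)

    def regex_search(self, pattern: str, required_literal: str) -> list:
        """required_literal must occur in every match; it is supplied by hand here."""
        grams = trigrams(required_literal)
        if grams:
            # Only docs containing every trigram of the literal can possibly match.
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        else:
            candidates = set(self.docs)  # literal too short to filter: scan everything
        rx = re.compile(pattern)
        return sorted(d for d in candidates if rx.search(self.docs[d]))

idx = TrigramIndex({1: "{{Infobox person|name=Ada}}", 2: "{{Infobox city|name=Paris}}", 3: "Plain text"})
print(idx.regex_search(r"name=P\w+", "name=P"))
# → [2]
```

The prefilter is lowercased while the final regex is case-sensitive, so the candidate set is a superset of the true matches; the regex pass removes any false positives.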
Dan Garry wrote:
In line with that, one of our primary responsibilities is to ensure that our search APIs are stable, fast, and easy to use. We'd love to hear from the people that are using our APIs, so we can learn what you love about them, what frustrates you, and what we can do to improve them for you.
I have two recurring thoughts about search lately, since you asked.
First, multimedia search is absolutely horrible, basically non-existent. If you go to Wikimedia Commons and try its search functionality and then compare to any other media service on the Internet, you can quickly come up with a list of a dozen features that are missing (search by file size, by color, by image file format, etc.).
Second, Wikimedia still hasn't aggregated and released anonymized search data. People use Special:Search daily and they encounter a page of search results instead of having a redirect take them to the appropriate destination. Or, sometimes worse, there's no coverage at all of what our users are searching for. It's a long tail, yes, but we could start filling in gaps if we had data about what users are looking for. We could save users a lot of time and build better sites by analyzing what users are looking for and not finding, or what they're looking for and not immediately being redirected toward. And yes, of course, there are privacy considerations (the infamous AOL case, &c.), but nothing insurmountable.
Beyond these two points, it's vitally important that we be able to arbitrarily query Wikidata soon. I'm hoping this functionality is live on Wikimedia wikis by the end of 2015. And speaking to APIs specifically, we really need to focus on projects such as Wiktionary and Wikisource that are desperately in need of API support to serialize and add structure to what are currently very fragile blobs of wikitext markup.
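One simple safeguard when releasing aggregate search data is to publish only queries issued by many distinct users and to withhold rarer ones entirely. The sketch below is a hedged illustration with an arbitrary threshold and hypothetical names; it is not a complete privacy guarantee and not a description of any Wikimedia process.

```python
def normalize(query: str) -> str:
    """Case-fold and collapse whitespace so trivially different queries merge."""
    return " ".join(query.lower().split())

def releasable_queries(log, min_users=50):
    """log: iterable of (user_token, query) pairs.

    Returns {normalized_query: distinct_user_count} for queries issued by at
    least min_users distinct users; rarer queries are withheld entirely.
    """
    users = {}
    for user_token, query in log:
        users.setdefault(normalize(query), set()).add(user_token)
    return {q: len(u) for q, u in sorted(users.items()) if len(u) >= min_users}
```

Raising min_users protects more but hides more of the long tail where the coverage gaps are, so the threshold is a trade-off rather than a fixed answer.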
MZMcBride
P.S. RIP, SAD.
On Wed, 2015-06-10 at 02:01 -0400, MZMcBride wrote:
a list of a dozen features that are missing (search by file size, by color, by image file format, etc.).
Also see https://phabricator.wikimedia.org/T101089 and https://phabricator.wikimedia.org/T101087
On Wed, Jun 10, 2015 at 8:01 AM, MZMcBride z@mzmcbride.com wrote:
I have two recurring thoughts about search lately, since you asked.
First, multimedia search is absolutely horrible, basically non-existent. If you go to Wikimedia Commons and try its search functionality and then compare to any other media service on the Internet, you can quickly come up with a list of a dozen features that are missing (search by file size, by color, by image file format, etc.).
To really make this awesome we need structured data support for Commons with Wikidata. We'll be making more progress on it in the second half of this year but there is a lot to do.
<snip>
Beyond these two points, it's vitally important that we be able to arbitrarily query Wikidata soon. I'm hoping this functionality is live on Wikimedia wikis by the end of 2015. And speaking to APIs specifically, we really need to focus on projects such as Wiktionary and Wikisource that are desperately in need of API support to serialize and add structure to what are currently very fragile blobs of wikitext markup.
Please give feedback on the latest proposal for Wikidata support for Wiktionary: https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/2015...
Cheers Lydia
To really make this awesome we need structured data support for Commons with Wikidata. We'll be making more progress on it in the second half of this year but there is a lot to do.
Sure, to really make that awesome, yeah, you need Wikidata. But we are far away from hitting the point where we need Wikidata. In fact, the three examples McBride gave don't need Wikidata: MIME type and file size are already easily available programmatically. And unless I'm mistaken, functionally dependent metadata, like an algorithmically determined main image colour, is out of scope for Wikidata.
--bawolff
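A minimal sketch supporting the point that this data is already exposed, assuming the standard prop=imageinfo module with iiprop=size|mime. It filters a known set of files on the client side, which is not the same as searching the whole collection by these attributes. The helper names are illustrative.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"

def imageinfo_url(titles):
    """Request size (bytes, width, height) and MIME type for the given File: pages."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "size|mime",
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": 2,  # pages come back as a list
    }
    return f"{API}?{urlencode(params)}"

def filter_files(response, mime=None, max_bytes=None):
    """Return titles from an imageinfo response matching the MIME type and size limit."""
    keep = []
    for page in response["query"]["pages"]:
        infos = page.get("imageinfo")
        if not infos:  # missing or non-file pages have no imageinfo
            continue
        info = infos[0]
        if mime is not None and info.get("mime") != mime:
            continue
        if max_bytes is not None and info.get("size", 0) > max_bytes:
            continue
        keep.append(page["title"])
    return keep
```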
On Wed, Jun 10, 2015 at 7:36 PM, Brian Wolff bawolff@gmail.com wrote:
To really make this awesome we need structured data support for Commons with Wikidata. We'll be making more progress on it in the second half of this year but there is a lot to do.
Sure, to really make that awesome, yeah, you need Wikidata. But we are far away from hitting the point where we need Wikidata. In fact, the three examples McBride gave don't need Wikidata: MIME type and file size are already easily available programmatically.
Yeah of course.
And unless I'm mistaken, functionally dependent metadata, like an algorithmically determined main image colour, is out of scope for Wikidata.
We've been thinking about this a bit but no decision has been made. It'd be nice to make these accessible in the same way as other properties without needing to store and maintain them the same way. We've been thinking about some kind of fake properties for example. But we'll worry about that when we get there. We're getting a bit off-topic. Sorry, Dan.
Cheers Lydia