Today we threw the big lever and turned on our new search backend at mediawiki.org. It isn't the default yet but it is just about ready for you to try. Here is what we think we've improved:
1. Templates are now expanded during search so:
   1a. You can search for text included in templates
   1b. You can search for categories included in templates
2. The search engine is updated very quickly after articles change.
3. A few funky things around intitle and incategory:
   3a. You can combine them with a regular query (incategory:kings peaceful)
   3b. You can use prefix searches with them (incategory:norma*)
   3c. You can use them everywhere in the query (roger incategory:normans)
What we think we've made worse and we're working on fixing:
1. Because we're expanding templates some things that probably shouldn't be searched are being searched. We've fixed a few of these issues but I wouldn't be surprised if more come up. We opened Bug 53426 regarding audio tags.
2. The relative weighting of matches is going to be different. We're still fine-tuning this and we'd appreciate any anecdotes describing search results that seem out of order.
3. We don't currently index headings beyond the article title in any special way. We'll be fixing that soon. (Bug 53481)
4. Searching for file names or clusters of punctuation characters doesn't work as well as it used to. It still works reasonably well if you surround your query in quotes but it isn't as good as it was. (Bugs 53013 and 52948)
5. "Did you mean" suggestions currently aren't highlighted at all and sometimes we'll suggest things that aren't actually better. (Bugs 52286 and 52860)
6. incategory:"category with spaces" isn't working. (Bug 53415)
What we've changed that you probably don't care about:
1. Updating search in bulk is much slower than before. This is the cost of expanding templates.
2. Search is now backed by a horizontally scalable search backend that is being actively developed (Elasticsearch) so we're in a much better place to expand on the new solution as time goes on.
Neat stuff if you run your own MediaWiki: CirrusSearch is much easier to install than our current search infrastructure.
So what will you notice? Nothing! That is because, while the new search backend (CirrusSearch) is indexing, we've left the current search infrastructure as the default while we work on our list of bugs. You can see the results from CirrusSearch by performing your search as normal and adding "&srbackend=CirrusSearch" to the URL parameters.
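As a rough illustration of the step above, here is a minimal Python sketch that appends the srbackend parameter to an existing search URL. The function name with_cirrus_backend and the sample URL (the index.php path and the search parameter) are assumptions made for the example; only the "srbackend=CirrusSearch" parameter comes from this message.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cirrus_backend(search_url: str) -> str:
    """Return search_url with srbackend=CirrusSearch in its query string.

    Hypothetical helper: any existing srbackend value is replaced so the
    parameter appears exactly once.
    """
    parts = urlsplit(search_url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "srbackend"
    ]
    params.append(("srbackend", "CirrusSearch"))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Assumed example of a normal search URL; substitute the one from your browser.
url = "https://www.mediawiki.org/w/index.php?search=template+expansion"
print(with_cirrus_backend(url))
```

Comparing the page returned by the original URL with the one returned by the rewritten URL is one way to spot results that seem out of order.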
If you notice any problems with CirrusSearch please file bugs directly for it: https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions&...
Nik Everett
On 08/28/2013 02:21 PM, Nikolas Everett wrote:
3. A few funky things around intitle and incategory:
   3a. You can combine them with a regular query (incategory:kings peaceful)
   3b. You can use prefix searches with them (incategory:norma*)
   3c. You can use them everywhere in the query (roger incategory:normans)
I especially want to point out here that, with the new search, we can do "category intersection" searches, which people have wanted FOR YEARS. Just as an example picked at random, you could search for "incategory:women" and "incategory:novelists" and "incategory:American". Just to pick an example at random. ;-) https://en.wikipedia.org/wiki/Wikipedia:Categories_for_discussion/Log/2013_April_24#Category:American_women_novelists
This is really exciting but we want to make sure it works right before we roll it out to our bigger sites! So please do try it out. On test2.wikipedia.org you can copy templates and articles from your home wiki and then run the search to see if things work okay.
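For anyone who wants to try a category intersection search, here is a small Python sketch that assembles such a query string. The helper name category_intersection_query is hypothetical, and joining the incategory: filters with spaces is an assumption based on the combined-query examples earlier in the thread; it is not a description of how CirrusSearch parses queries internally.

```python
def category_intersection_query(categories, extra_terms=""):
    """Build a query string that filters on every category in `categories`.

    Hypothetical helper for illustration; it only formats text.
    """
    filters = []
    for name in categories:
        if " " in name:
            # Quoted names such as incategory:"category with spaces" are
            # reported as not working yet (Bug 53415), so reject them here.
            raise ValueError(f"category names with spaces are not supported: {name!r}")
        filters.append(f"incategory:{name}")
    if extra_terms:
        # Regular query words can be combined with the category filters.
        filters.append(extra_terms)
    return " ".join(filters)


print(category_intersection_query(["women", "novelists", "American"]))
```

The resulting string can be pasted into the search box as a single query.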
On 08/28/2013 07:24 PM, Sumana Harihareswara wrote:
I especially want to point out here that, with the new search, we can do "category intersection" searches, which people have wanted FOR YEARS. Just as an example picked at random, you could search for "incategory:women" and "incategory:novelists" and "incategory:American". Just to pick an example at random. ;-) https://en.wikipedia.org/wiki/Wikipedia:Categories_for_discussion/Log/2013_April_24#Category:American_women_novelists
While not the same as the "ideal" category intersection (a page that basically looks exactly like a regular category but is dynamic), it's still really cool. :)
Matt Flaschen
Do you have plans to integrate this category intersection into the main search, so that it will be possible to form search facets?
-----
Yury Katkov, WikiVote
Wikitech-ambassadors mailing list Wikitech-ambassadors@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-ambassadors
On Thu, Aug 29, 2013 at 7:56 AM, Yury Katkov katkov.juriy@gmail.com wrote:
Do you have plans of integrating this category intersection into the main search, so that it will be possible to form the search facets?
I don't have plans so much as a notion that we should do it. It'd certainly help me if someone were to propose what kind of faceting would be useful/mock up a UI.
Nik
On 2013-08-28 8:24 PM, "Sumana Harihareswara" sumanah@wikimedia.org wrote:
On 08/28/2013 02:21 PM, Nikolas Everett wrote:
3. A few funky things around intitle and incategory:
   3a. You can combine them with a regular query (incategory:kings peaceful)
   3b. You can use prefix searches with them (incategory:norma*)
   3c. You can use them everywhere in the query (roger incategory:normans)
I especially want to point out here that, with the new search, we can do "category intersection" searches, which people have wanted FOR YEARS. Just as an example picked at random, you could search for "incategory:women" and "incategory:novelists" and "incategory:American". Just to pick an example at random. ;-)
https://en.wikipedia.org/wiki/Wikipedia:Categories_for_discussion/Log/2013_A...
This is really exciting but we want to make sure it works right before we roll it out to our bigger sites! So please do try it out. On test2.wikipedia.org you can copy templates and articles from your home wiki and then run the search to see if things work okay.
--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation
It's not that that's new, it's that it actually works. (We previously had category intersection of this form; it just didn't include categories from templates since template expansion wasn't done.)
I think the community members who want this really want it in a more discoverable form. Maybe once new search is rolled out we should look into making an advanced search interface.
-bawolff
Brian Wolff, 13/09/2013 13:52:
It's not that that's new, it's that it actually works. (We previously had category intersection of this form; it just didn't include categories from templates since template expansion wasn't done.)
I think the community members who want this really want it in a more discoverable form. Maybe once new search is rolled out we should look into making an advanced search interface.
That's https://bugzilla.wikimedia.org/show_bug.cgi?id=21988 I suppose. You'd probably want an advanced search interface that can be expanded or modified by extensions, so there is a set of blocking bugs in core and CirrusSearch. Currently extensions can't even change which profile is the default (https://bugzilla.wikimedia.org/show_bug.cgi?id=38395).
Nemo
Awesome update.
Nikolas Everett, 28/08/2013 20:21:
Today we threw the big lever and turned on our new search backend at mediawiki.org. It isn't the default yet but it is just about ready for you to try. [...]
2. The relative weighting of matches is going to be different. We're still fine tuning this and we'd appreciate any anecdotes describing search results that seem out of order.
I'm not very imaginative and I have no idea what queries to test, especially when it comes to language-specific searches (which we will probably be able to test some time soon on beta.wmflabs.org language subdomains?). I know from http://laxstrom.name/blag/2012/02/13/exploring-the-states-of-open-source-search-stack-supporting-finnish/ that it's possible to get corpora of actual search queries. It would be nice to have some extract, or whatever is easy to produce, to get some ideas of what to test and to improve our anecdotal assessment of search result quality. Other ideas may come from previous bugs, I guess. :S https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&component=lucene-search-2&component=MWSearch&component=Search
[...] 6. incategory:"category with spaces" isn't working. (Bug 53415)
Does it work on categories not directly mentioned on the page? It may be considered an obvious yes as you are indexing expanded templates; just checking. (You probably want to make sure that exceptions in template indexing don't break this feature.)
What we've changed that you probably don't care about:
1. Updating search in bulk is much slower than before. This is the cost of expanding templates.
2. Search is now backed by a horizontally scalable search backend that is being actively developed (Elasticsearch) so we're in a much better place to expand on the new solution as time goes on.
Neat stuff if you run your own MediaWiki: CirrusSearch is much easier to install than our current search infrastructure.
Yay! :) Maybe people on mediawiki-l would like to test it too.
Nemo
Is there any update on this for the wider world? If successful, is there a timetable for broader implementation?
thanks. Regards billinghurst
Sorry for not sending an update earlier. This week has been crazy. Anyway, we switched CirrusSearch to the primary search backend on mediawiki.org on Wednesday morning San Francisco time. Nothing is on fire yet, so the release was successful in that sense, but we've filed three new bugs so it certainly wasn't an unmitigated success.
We're probably getting to the point where we can start converting wikis volunteered by ambassadors. We'd add CirrusSearch as a secondary, build the index, and then we and the ambassador will do some testing with the special URL parameter mentioned at the beginning of this thread. When we're all confident that CirrusSearch is an improvement over what is in production now for that wiki we'll switch it over to primary. I'd like to start this process for a few wikis soon. Italian Wiktionary has already been volunteered so we'll add CirrusSearch as a secondary for it soon.
I'll be back to working full steam on bugs next week and many of the currently open bugs are waiting on the next release of Elasticsearch which is supposed to be "real soon" so they should fall into place pretty quickly after we upgrade. You can always check the open bugs here: https://bugzilla.wikimedia.org/buglist.cgi?columnlist=priority%2Cbug_status%...
Nik
Thanks for the update, and zero need to be sorry. Appreciate that report and the grand work in the new indexing and searching functionality.
Without asking the community that I somewhat represent here, I believe that English Wikisource would be a good place to test the Wikisources, as one of the wikis that is upper-medium in size. The Wikisources are looking forward to testing the ability to have transcluded cross-namespace pages indexed (Page: -> main), which is a particular advantage of CirrusSearch (and which I successfully tested at test2wiki).
Regards, Billinghurst
On Fri, 13 Sep 2013 02:56:15 -0400, Nikolas Everett neverett@wikimedia.org wrote:
Sorry for not sending an update earlier. This week has been crazy. Anyway, we switched CirrusSearch to the primary search backend on mediawiki.org on Wednesday morning San Francisco time. Nothing is on fire yet, so the release was successful in that sense, but we've filed three new bugs so it certainly wasn't an unmitigated success.
We're probably getting to the point where we can start converting wikis volunteered by ambassadors. We'd add CirrusSearch as a secondary, build the index, and then we and the ambassador will do some testing with the special URL parameter mentioned at the beginning of this thread. When we're all confident that CirrusSearch is an improvement over what is in production now for that wiki we'll switch it over to primary. I'd like to start this process for a few wikis soon. Italian Wiktionary has already been volunteered so we'll add CirrusSearch as a secondary for it soon.
I'll be back to working full steam on bugs next week and many of the currently open bugs are waiting on the next release of Elasticsearch which is supposed to be "real soon" so they should fall into place pretty quickly after we upgrade. You can always check the open bugs here:
https://bugzilla.wikimedia.org/buglist.cgi?columnlist=priority%2Cbug_status%...
Nik
On Thu, Sep 12, 2013 at 10:57 PM, billinghurst billinghurst@gmail.com wrote:
Is there any update on this for the wider world? If successful, is there a timetable for broader implementation?
thanks. Regards billinghurst