Scott,
I was going to respond to this a while ago but couldn't really do it justice. I'm still pretty sure my explanation won't be great, which is an indication of just how good Google is.
For straight search there is nothing we can do that Google can't. It might cost them more time and money to make searching MediaWiki awesome, but they have lots of both, so we're just not going to beat them there. There are a few things that we can do more easily/cheaply than Google:
1. We can update our search index right when changes are made, including when changes are made to transcluded pages.
2. We can search based on redirects to a page.
3. We can filter (and maybe one day facet) based on categories.
4. We could search based on citations.
We will, on the other hand, be better at listening to what the community needs with regard to search. Part of the problem here is that historically we've let search languish, and my first foray into making search nicer isn't going to provide much new stuff for the community. Instead, it's a solid platform on which to build the things the community needs, and it should make search less exciting for operations engineers. That really isn't exciting for the community to hear, and for that I am sorry. I can only promise that we'll do more later.
There are some deeper integrations with MediaWiki that I don't see Google doing but that we could work on in the future:
1. We could create a section that lets users easily find "similar" pages. I'm a little fuzzy on exactly how we'd calculate similarity.
2. We could automatically dig around in Commons for media useful to an article. We could use this either to automatically provide extra media that might be relevant or as a curation aid. On second thought, the curation aid sounds much better.
Actually, some kind of game around tagging media as relevant to an article might be quite a decent way to encourage engagement. By game I mean something like Galaxy Zoo or LinkedIn's endorsements. You could do this without good search, but good search would help produce much more relevant results.
And then there is the cynic in me who says it is worth doing just so we aren't reliant on external (corporate) entities. I'm really not sure how I would feel if the only way to find stuff on WMF's wikis was through Google/Bing/Yahoo...
Finally, there are the private wikis you mentioned; they mostly can't use Google. We are trying to make sure CirrusSearch works for them. The idea there is to provide something that is better at finding results than the database-based search, because it uses the same analysis that we've optimized for WMF. Elasticsearch isn't some kind of precision-tuned machine: you can get quite decent behaviour just by downloading the deb or rpm and installing it. You only really need one instance.
So now that I've created this wall of text, I don't feel that I've answered your question well, but I have answered it. That's the thing about hard questions: they are harder to answer than to ask.
I'd really love more brainstorming. Cross-wiki search was another good idea someone added to the page a while ago.
Nik
On Fri, Jul 19, 2013 at 2:24 PM, C. Scott Ananian <cananian@wikimedia.org> wrote:
I wonder if there are queries or use cases we can support that *aren't* already better handled by Google. Granted, users of private wikis can't simply use the 'site:' trick to reuse Google search results, but users of private wikis also probably don't need super-duper scalability.
Trying to brainstorm here, not start a flame war. What sorts of useful searches could we excel at? (Maybe these are searches/use cases that will facilitate editor engagement?) --scott
-- 
(http://cscott.net)

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l