I was going to respond to this a while ago but couldn't really do it
justice. I'm still pretty sure my explanation won't be great, which is an
indication of just how good Google is.
For straight search there is nothing we can do that Google can't. It might
cost them more time and money to make searching MediaWiki awesome but they
have lots of both so we're just not going to beat them there. There are a few
things that we can do more easily/cheaply than Google:
1. We can update our search index right when changes are made including
when changes are made to transcluded pages.
2. We can search based on redirects to a page.
3. We can filter (and maybe one day facet) based on categories.
4. We could search based on citations.
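To make points 2 and 3 a bit more concrete, here is a rough sketch of what a
query that also matches redirect titles and filters by category might look
like as an Elasticsearch-style query body. The field names (`title`,
`redirect_titles`, `text`, `category`) and the helper `build_query` are made
up for illustration and are not CirrusSearch's actual mapping. The
`bool`/`multi_match`/`term` shape follows the Elasticsearch query DSL, but the
exact syntax differs between versions, so treat it as a sketch only.

```python
import json


def build_query(text, category=None):
    """Build an Elasticsearch-style query body as a plain dict.

    All field names are hypothetical, not CirrusSearch's real mapping.
    """
    query = {
        "bool": {
            "must": [{
                "multi_match": {
                    "query": text,
                    # Titles of redirects pointing at a page are searched too,
                    # so a query for an alternate name can still find the page.
                    "fields": ["title^3", "redirect_titles^2", "text"],
                }
            }]
        }
    }
    if category is not None:
        # Filter clause: narrows the results without changing the scores.
        # Assumes `category` is indexed as an exact (non-analyzed) value.
        query["bool"]["filter"] = [{"term": {"category": category}}]
    return {"query": query}


if __name__ == "__main__":
    body = build_query("big apple", category="Cities in New York")
    print(json.dumps(body, indent=2))
```

The category restriction is a filter rather than part of the scored query, so
it only narrows the result set and leaves the ranking to the text match.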
We will, on the other hand, be better about listening to what the community
needs with regard to search. Part of the problem here is that
historically we've let search languish and my first foray into making
search nicer isn't going to provide much new stuff for the community.
Instead it's a solid platform on which to build things that the community
needs and which should make search less exciting for operations engineers.
That really isn't exciting for the community to hear and for that I am
sorry. I can only promise that we'll do more later.
There are some deeper integrations into MediaWiki that I don't see
Google doing but that we could work on in the future:
1. We could create a section that allowed users to easily find "similar"
pages. I'm a little fuzzy on exactly how we'd calculate similarity.
2. We could automatically dig around in Commons for useful media for an
article. We could use this either to automatically provide extra media which
might be relevant or as a curation aid. On second thought the curation aid
sounds much better.
Actually, some kind of game around tagging media as relevant to an article
might be quite a decent way to encourage engagement. By game I mean
something like Galaxy Zoo or LinkedIn's endorsements. You could do this
without a nice search but it'd help produce much more relevant results.
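On the "similar" pages idea in point 1: one simple way to calculate
similarity, offered only as a sketch and not as what we would necessarily
build, is cosine similarity between the word counts of two pages. The
function names and the toy `pages` dict in the usage note are hypothetical. A
real version would want proper analysis (stemming, stop words, term
weighting), and Elasticsearch's `more_like_this` query might turn out to be a
better starting point.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(title, pages):
    """Rank every other page by similarity to pages[title], best first."""
    target = term_counts(pages[title])
    scored = [
        (cosine_similarity(target, term_counts(text)), other)
        for other, text in pages.items()
        if other != title
    ]
    return [other for _, other in sorted(scored, reverse=True)]
```

Given a small dict mapping titles to page text, `most_similar("A", pages)`
returns the other titles ordered from most to least similar to page "A".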
And then there is the cynic in me that says that it is worth doing just so
we aren't reliant on external (corporate) entities. I'm really not sure
how I would feel if the only way to find stuff on WMF's wikis was with
Finally we have the private wikis like you mentioned - they mostly can't
use Google. We are trying to make sure CirrusSearch works for them. The
idea there is to provide something that is better at finding results than
the database-based search because it uses the same analysis that we've
optimized for WMF. Elasticsearch isn't some kind of precision-tuned
machine - you can actually get quite decent behaviour out of downloading
the deb or rpm and installing it. You only really need one instance.
So now that I've created this wall of text I don't feel that I've really
answered your question well, but I've answered it. That is the thing about
hard questions: they are harder to answer than to ask.
I'd really love more brainstorming. Cross wiki search was another good
idea someone added to the page a while ago.
On Fri, Jul 19, 2013 at 2:24 PM, C. Scott Ananian <cananian(a)wikimedia.org> wrote:
I wonder if there are queries or use cases we can support that *aren't*
already better handled by google. Granted, users of private wikis can't
simply use the 'site:' trick to reuse Google search results -- but users of
private wikis also probably don't need superduper scalability.
Trying to brainstorm here, not start a flame war. What sorts of useful
searches could we excel at? (Maybe these are searches/use cases that will
facilitate editor engagement?)
Wikitech-l mailing list