[Foundation-l] LA Times article / Advertising in Wikipedia

Brian Brian.Mingus at colorado.edu
Thu Mar 13 08:59:11 UTC 2008


When I say that the WMF can't afford to show snippets, I mean that there
are not enough servers to do the job: the ones we've got are busy, and we
can't spare any time for "the additional strain from loading extra page
text." So I'm not sure that I am off base. Snippets seem to have been
switched off for almost every visitor to the website (what % is logged in?)
for the last two years.
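
To make that cost concrete, here is a minimal sketch of snippet extraction
in Python. It is not MediaWiki's or the Lucene backend's actual code; the
function name extract_snippet and the radius parameter are made up for
illustration. What it shows is that the full page text has to be in hand for
every result before an excerpt can be cut, which is the extra load in
question.

```python
import re


def extract_snippet(page_text: str, query: str, radius: int = 40) -> str:
    """Return a short excerpt of page_text around the first query-term hit.

    Hypothetical helper for illustration only; real search backends use
    more elaborate highlighting.
    """
    terms = [re.escape(term) for term in query.split() if term]
    if not terms:
        # Empty query: just show the opening of the page.
        return page_text[: 2 * radius]

    # Find the first occurrence of any query term, ignoring case.
    match = re.search("|".join(terms), page_text, flags=re.IGNORECASE)
    if match is None:
        # No hit in the body: fall back to the opening of the page.
        return page_text[: 2 * radius]

    start = max(0, match.start() - radius)
    end = min(len(page_text), match.end() + radius)

    # Mark the excerpt as truncated where text was cut off.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(page_text) else ""
    return prefix + page_text[start:end] + suffix
```

Note that the function takes the whole page as input. Run once per result on
a results page, that is one extra page-text load per hit.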

There is the potential to build a great search engine for Wikipedia, not
just one that barely works. [[Powerset (company)]] has demonstrated its
natural language search technology by doing a full parse of the
encyclopedia. You can ask their search engine questions in English, and it
will answer them in English because it has read Wikipedia and, to a small
degree, understands it. If the possibility were open, it would be quite
feasible for the WMF to deploy such a research system live. These systems
are already being created by academics, and then lost and forgotten in the
bowels of aging Subversion repositories.

Instead of being a great search engine like that, Wikipedia's search engine
is essentially a hobby project worked on by 1-2 people. That's by design:
the WMF can't afford to deploy something more substantial. Am I still off
base, or is the door actually open to improving Wikipedia's search engine
into something state of the art? Some back-of-the-envelope math suggests
that the last several billion visitors to Wikipedia weren't even shown
snippets. Since we're not even doing that, we may as well hand search over
to Google for their improved usability and the profitability of their
adverts (the original point).

On Wed, Mar 12, 2008 at 3:41 PM, Brion Vibber <brion at wikimedia.org> wrote:

> Brian wrote:
> [snip]
> | I may be off base here, but I am under the impression that we don't
> | implement basic usability improvements in our search engine, such as
> | showing snippets and researched back link analysis (as simple as
> | PageRank), because we can't afford it.
>
> You're off base. :)
>
> The Lucene-based backend has been undergoing a lot of improvements over
> the last year, including making progressive updating actually function
> again, and lots of ongoing work on improving ranking with backlinks,
> similarity searches, etc. Robert Stojnic is doing this work largely
> "invisibly" -- it doesn't change the look and feel of the search engine;
> he's just made it work progressively better as the months go by.
>
>
> The search UI front-end hasn't received a lot of attention in a couple
> of years simply because there are lots of other things that have
> required developer time, and nobody's had the interest to hop in and
> take it over the way Robert took over the backend.
>
> Since it is overdue, I've recently started making some improvements to
> the front-end, folding things back into MediaWiki's core search UI
> (which will benefit both us and third-party MediaWiki users) and adding
> niceties such as thumbnails to image results; please hop over to
> wikitech-l if interested in helping out.
>
>
> As for snippets -- we have included snippets with search results for
> many, many years. At the moment though, on most of our sites, snippets
> are disabled if you're not logged in.
>
> This was a temporary hack put in because of a performance issue a couple
> years ago when we were much more strapped; in the course of other UI
> work currently ongoing, that restriction is very likely to be lifted,
> and any remaining performance problems worked out. (These weren't
> problems with the search engine, but simply the additional strain from
> loading extra page text in order to do the snippet extraction. It's very
> likely that this problem is simply obsolete due to intermediate growth
> and internal caching improvements.)
>
> -- brion vibber (brion @ wikimedia.org)

