[Wikitech-l] Commons search. Was: Commons tech wishlist

11 Aug 2007

On 8/11/07, Brion Vibber <brion(a)wikimedia.org> wrote:
...
 Worth taking a look at, though I wonder why the people working on
 Mayflower aren't:

 1) Active in the development list or IRC channel

 2) Committing their code to SVN 

So how do you suggest a search for Commons be integrated when it can't
work off the current production database alone?

Mayflower uses a periodic text extract from Commons, which is passed
through a stemmer; the most frequent terms are then stop-worded to
prevent index bloat.  Incremental updates of the full-text part of
Mayflower aren't possible as currently designed.
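
As a rough illustration of that kind of pipeline (stem, then drop the most
frequent terms), here is a minimal sketch. It is not Mayflower's code: the
toy stemmer, the build_index function, the max_doc_fraction threshold and
the sample data are all assumptions made up for the example.

```python
def crude_stem(word):
    """Toy suffix stripper; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs, max_doc_fraction=0.75):
    """Map each stemmed term to the set of doc ids containing it.

    Terms found in more than max_doc_fraction of the documents are
    treated as stop words and left out of the index.
    """
    postings = {}
    for doc_id, text in docs.items():
        # Use a set so each term counts once per document.
        for term in {crude_stem(w) for w in text.lower().split()}:
            postings.setdefault(term, set()).add(doc_id)
    limit = max_doc_fraction * len(docs)
    return {t: ids for t, ids in postings.items() if len(ids) <= limit}


# Hypothetical sample extract: doc id -> description text.
docs = {
    1: "photo of sailing ships",
    2: "photo of a ship at sea",
    3: "photo of mountains",
}
index = build_index(docs)
```

In this sketch the stop list depends on term frequencies across the whole
extract, so the index is built in one pass over everything. Whether that is
the reason Mayflower can't update incrementally isn't stated above.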

Since the start of Mayflower, Tangotango and I have discussed moving
its backend to the search stuff I've been working on. The backend
stuff I'm using is far faster, doesn't fall over with overly frequent
keys, and handles incremental updates just fine. I've finally gotten
around to putting up a web front end for my search stuff; check it
out. The Commons version is at
http://tools.wikimedia.de/~gmaxwell/cgi-bin/cattersect.py and the enwiki
version is at http://tools.wikimedia.de/~gmaxwell/cgi-bin/enwiki_cattersect.py

The problem there is that my backend stuff depends on PostgreSQL,
because PostgreSQL provides the inverted indexing which is utterly
required for the qualities of my implementation.  So in that case,
again, we're in a situation where it's not as simple as "commit it to
SVN", since using it would require a non-trivial addition of software
infrastructure which would have to be carefully considered.
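
For anyone unfamiliar with the term, an inverted index maps each key to the
set of items that carry it, which is what makes both multi-key lookups and
per-item updates cheap. The sketch below is a pure-Python illustration of
the general idea only. It is not the PostgreSQL implementation and not my
backend's schema; the InvertedIndex class, its method names and the sample
file names are hypothetical.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy in-memory inverted index with per-item incremental updates."""

    def __init__(self):
        self._postings = defaultdict(set)  # key -> ids of items carrying it
        self._keys_of = {}                 # item id -> its current keys

    def remove(self, item_id):
        """Drop one item from every posting set it appears in."""
        for key in self._keys_of.pop(item_id, set()):
            self._postings[key].discard(item_id)
            if not self._postings[key]:
                del self._postings[key]

    def upsert(self, item_id, keys):
        """Add or replace one item without rebuilding the rest."""
        self.remove(item_id)
        self._keys_of[item_id] = set(keys)
        for key in self._keys_of[item_id]:
            self._postings[key].add(item_id)

    def query_all(self, keys):
        """Return ids of items that carry every one of the given keys."""
        sets = []
        for key in keys:
            if key not in self._postings:
                return set()
            sets.append(self._postings[key])
        if not sets:
            return set()
        sets.sort(key=len)  # start from the smallest posting set
        result = set(sets[0])
        for other in sets[1:]:
            result &= other
        return result


# Hypothetical sample data.
idx = InvertedIndex()
idx.upsert("A.jpg", ["ships", "1900s"])
idx.upsert("B.jpg", ["ships", "paintings"])
before = idx.query_all(["ships", "1900s"])
idx.upsert("B.jpg", ["ships", "1900s"])  # incremental change to one item
after = idx.query_all(["ships", "1900s"])
```

Changing one item touches only that item's posting sets, and the
intersection starts from the smallest set so that a very common key does
not dominate the work.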
