Re: [Wikitech-l] Full text search

29 Mar 2005

On Tue, 29 Mar 2005 20:16:03 +0100, Minty <mintywalker(a)gmail.com> wrote:
...
  anyone playing with http://nutch.org/ ?
Actually, I think the more generic Lucene library which Nutch is built
upon will be more useful. We should be indexing the wikitext, not the
HTML (which is a lower quality version ;))

Seriously, we also don't want a crawler. What is left in Nutch's favour?

However, I don't imagine either will be used by Wikimedia, as they are
written in Java, which is slow and takes up too much memory compared
to native code (i.e. C or C++). It's already bad enough that we're
using PHP! (In one extreme case reported by a developer, a diff took
45.5 seconds in PHP while the same algorithm took 0.5 seconds in C,
or maybe C++.)
