Re: [Wikitech-l] lucene search 2.0 test webinterface

23 May 2007

Hi all,
Tian-Jian "Barabbas" Jiang said:
...
I suggest you test this by MRR (Mean Reciprocal Rate):
s/Rate/Rank/g
Sorry about the typo. You may also want to check
MAP (Mean Average Precision).
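To make the two measures concrete, here is a minimal sketch in plain Java. It assumes each query has a ranked list of document IDs and a hand-annotated set of relevant IDs, and that a ranked list contains no duplicates. The class and method names (RankingMetrics, reciprocalRank, averagePrecision) are made up for illustration and are not part of Lucene.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper, not a Lucene class: evaluates ranked result lists
// against hand-annotated relevance judgments.
final class RankingMetrics {

    /** 1 / rank of the first relevant result, or 0 if no result is relevant. */
    static double reciprocalRank(List<String> ranked, Set<String> relevant) {
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                return 1.0 / (i + 1);
            }
        }
        return 0.0;
    }

    /** Sum of precision@k at each rank k holding a relevant result, divided by the number of relevant documents. */
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size();
    }

    /** MRR: reciprocal rank averaged over all queries. */
    static double meanReciprocalRank(List<List<String>> runs, List<Set<String>> judgments) {
        double total = 0.0;
        for (int q = 0; q < runs.size(); q++) {
            total += reciprocalRank(runs.get(q), judgments.get(q));
        }
        return runs.isEmpty() ? 0.0 : total / runs.size();
    }

    /** MAP: average precision averaged over all queries. */
    static double meanAveragePrecision(List<List<String>> runs, List<Set<String>> judgments) {
        double total = 0.0;
        for (int q = 0; q < runs.size(); q++) {
            total += averagePrecision(runs.get(q), judgments.get(q));
        }
        return runs.isEmpty() ? 0.0 : total / runs.size();
    }
}
```

MRR only rewards the first relevant hit, so it suits queries with one expected answer (such as a title lookup); MAP also accounts for where the remaining relevant documents land.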
Although I bet you have already done this, here are my
2 cents:
I usually apply one principle to my IR systems:
precision first, recall next.
For example, my system may run an exact match first, fetch
the results with
searcher.doc(topDocs.scoreDocs[i].doc)
and save them externally.
That lets me merge in additional partially matched
results later.
This could apparently be done with something like parallel
queries, but I prefer to merge them sequentially myself.
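The sequential merge can be sketched roughly as follows. This is only an illustration under a few assumptions: results are identified by integer document IDs (as in topDocs.scoreDocs[i].doc), the two input lists have already been collected from separate searches, and the names SequentialMerge and merge are hypothetical. It does not call Lucene itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper: "precision first, recall next" as a two-stage merge.
final class SequentialMerge {

    /**
     * Keeps all exact-match hits in their original order, then appends
     * partial-match hits that were not already present, up to `limit` results.
     */
    static List<Integer> merge(List<Integer> exactHits, List<Integer> partialHits, int limit) {
        // LinkedHashSet preserves insertion order and drops duplicate doc IDs.
        LinkedHashSet<Integer> merged = new LinkedHashSet<>(exactHits);
        merged.addAll(partialHits);

        List<Integer> ordered = new ArrayList<>(merged);
        int end = Math.max(0, Math.min(limit, ordered.size()));
        // Copy the sublist so the caller gets an independent list, not a view.
        return new ArrayList<>(ordered.subList(0, end));
    }
}
```

For instance, merging exact hits [3, 7] with partial hits [7, 1, 3, 9] and a limit of 3 keeps 3 and 7 at the top and fills the remaining slot with 1.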
...
For queries that contain common phrases, you may want to
use a more elaborate RR based on similarity
to the title, or just annotate an answer set by hand.
> Otherwise, the search engine is fast and the results are overall
> promising. Are you considering adding snippets of the search
> results?

Highlighting is very CPU and memory intensive. You need to fetch
all articles in the search results (i.e. 20 per page), retokenize
them, split them into snippets, and score each snippet so you can
show the best one. I'm currently working on a distributed
implementation for this, but it might still put too heavy a load
on the cluster.
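The steps described above (retokenize, fragment, score, pick the best) can be illustrated with a toy sketch in plain Java. This is not Lucene's highlighter: the fixed-size word windows, the scoring by counting query-term occurrences, and the names SnippetSketch and bestSnippet are all simplifying assumptions. It also shows where the cost comes from, since every word of every fetched article is visited.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy highlighter: picks the fixed-size fragment with the most query-term hits.
final class SnippetSketch {

    /**
     * Splits `text` into consecutive windows of `windowSize` words, scores each
     * window by the number of words matching a (lowercased) query term, and
     * returns the highest-scoring window. Ties go to the earliest window.
     */
    static String bestSnippet(String text, Set<String> queryTerms, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        String[] words = text.trim().split("\\s+");
        String best = "";
        int bestScore = -1;
        for (int start = 0; start < words.length; start += windowSize) {
            int end = Math.min(start + windowSize, words.length);
            int score = 0;
            for (int i = start; i < end; i++) {
                // Crude retokenization: lowercase and strip punctuation.
                String token = words[i].toLowerCase(Locale.ROOT)
                        .replaceAll("[^\\p{L}\\p{N}]", "");
                if (queryTerms.contains(token)) {
                    score++;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = String.join(" ", Arrays.copyOfRange(words, start, end));
            }
        }
        return best;
    }
}
```

Query terms are expected in lowercase here. A real implementation would reuse the index analyzer for tokenization so that snippets match what the query actually hit.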

Apache Solr may be an alternative solution for this.
BTW, I'm quite interested in Lucene-related tasks. If it's
OK with you, I would like to help. :)
Sincerely,

/Mike "b6s" Jiang/
