Tim and I upgraded the Lucene search engine over the weekend. It's currently up for en.wiki; the rest is to come in the next few days.
I already highlighted some of the features, but here's an updated list:
* Improved scoring - the score of a document is now a function of how many other documents link to it. Same-namespace redirects no longer show up in the search results; their names are indexed alongside the article they point to. E.g. searching for USA will give you United States as the first hit. Further, links from the beginning of the article are weighted more (as they are assumed to give a short keyword-like description of the article).
* Prefix searches - the searcher now understands namespaces, i.e. if you enter help:images, it will search the Help namespace. You can also use the 'all' prefix, which will search everything. Prefixes are customizable, i.e. you can define your own prefixes (see below). I hope this will make it easier to search for help and to search the project/wikipedia namespace.
* Accentless search - accents are stripped (this includes Hebrew pointing and similar), and common transliterations are also added (e.g. ü -> ue).
* Numbers/stemming - numbers are now included in the index, and the stemming issues are resolved...
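To make the accentless-search and prefix-search items above more concrete, here is a minimal Python sketch. It is not the extension's code (the Lucene backend is Java and its source is not shown in this thread). The names `strip_accents`, `index_forms` and `parse_query`, the `TRANSLITERATIONS` and `PREFIXES` tables, and the "Main" default namespace are all assumptions made for illustration; only the ü -> ue pair and the help/all prefixes come from the list above.

```python
import unicodedata

# Hypothetical transliteration table (only the ü -> ue pair comes from the post).
TRANSLITERATIONS = {"ü": "ue", "ö": "oe", "ä": "ae"}

# Hypothetical prefix table: prefix -> namespaces to search.
ALL_NAMESPACES = ("*",)
PREFIXES = {"help": ("Help",), "all": ALL_NAMESPACES}


def strip_accents(text):
    """Drop nonspacing marks (Latin accents, Hebrew points) after decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose so that scripts such as Hangul are not left decomposed.
    return unicodedata.normalize("NFC", kept)


def index_forms(word):
    """Forms one word could be indexed under: as typed, accentless, transliterated."""
    lowered = unicodedata.normalize("NFC", word.lower())
    transliterated = "".join(TRANSLITERATIONS.get(ch, ch) for ch in lowered)
    return {lowered, strip_accents(lowered), strip_accents(transliterated)}


def parse_query(query, default=("Main",)):
    """Split 'prefix:terms' into (namespaces, terms); unknown prefixes are left alone."""
    prefix, sep, rest = query.partition(":")
    key = prefix.strip().lower()
    if sep and key in PREFIXES:
        return PREFIXES[key], rest.strip()
    return tuple(default), query.strip()


print(sorted(index_forms("Müller")))
print(parse_query("help:images"))
```

Indexing several forms of the same word is one simple way to let both an accentless query and a transliterated one match; whether the real indexer does it this way is not stated here. A query without a known prefix falls back to the assumed default namespace.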
If you want to install/customize the new search extension, take a look at: http://www.mediawiki.org/Extensions:LuceneSearch
Robert
On 7/1/07, Robert Stojnic rainmansr@gmail.com wrote:
Tim and I upgraded the Lucene search engine over the weekend. It's currently up for en.wiki; the rest is to come in the next few days.
I already highlighted some of the features, but here's an updated list:
<snip>
Awesome work guys! I can't wait to try it out! Surely one of the most oft-requested improvements to MediaWiki is to its search, and this looks like a huge leap in the right direction.
Thanks guys, looks like this would be much more accurate. However, the link to mediawiki.org is wrong; there is no such page as Extensions:LuceneSearch. :(
On 7/2/07, Daniel Cannon cannon.danielc@gmail.com wrote:
Awesome work guys! I can't wait to try it out! Surely one of the most oft-requested improvements to MediaWiki is to its search, and this looks like a huge leap in the right direction.
-- Daniel Cannon (AmiDaniel)
http://amidaniel.com cannon.danielc@gmail.com _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 01/07/07, Robert Stojnic rainmansr@gmail.com wrote:
Tim and I upgraded the Lucene search engine over the weekend. It's currently up for en.wiki; the rest is to come in the next few days.
w00t! Any plans to include this in the MediaWiki distribution tarball by default?
- d.
I was going to ask that, but I suppose the fact that it is like an interface to something else - Lucene - would mean that would have to be included too, or is that just wrong? I don't know anything about it technically, to be honest, but it sounds interesting and certainly better than MediaWiki's default search ;-).
On 03/07/07, David Gerard dgerard@gmail.com wrote:
w00t! Any plans to include this in the MediaWiki distribution tarball by default?
- d.
On 7/3/07, Gary Kirk gary.kirk@gmail.com wrote:
I was going to ask that, but I suppose the fact that it is like an interface to something else - Lucene - would mean that would have to be included too, or is that just wrong? I don't know anything about it technically, to be honest, but it sounds interesting and certainly better than MediaWiki's default search ;-).
-- Gary Kirk
Lucene cannot be included as it is a program, and requires either precompiled binaries (every operating system needs different ones) or the source code and a compiler. There is no guarantee that the computer MediaWiki is being installed on has a compiler for the required language, and there are many variations of compilers. It could also only be installed via the web-based installer if safe_mode is disabled. I doubt this will ever be included in the MediaWiki base - an extension is best. Thanks, MinuteElectron.
Minute Electron wrote:
Lucene cannot be included as it is a program, and requires either precompiled binaries (every operating system needs different ones) or the source code and a compiler.
Well, it's Java at least, so only one binary needed on most systems. :)
Still, that's an external dependency, and running a Java daemon is not an out-of-the-box task on your standard LAMP server.
I would like to see some improvements to the built-in search, though; interface improvements and some better category tagging would help a lot.
-- brion vibber (brion @ wikimedia.org)
Brion wrote:
Still, that's an external dependency, and running a Java daemon is not an out-of-the-box task on your standard LAMP server.
Agreed. Has anyone played with using ferret instead of lucene? Many third-party hosts support Ruby now, whereas Java support is minute by comparison.
-- Jim R. Wilson (jimbojw)
Jim Wilson <wilson.jim.r@...> writes:
Brion wrote:
Still, that's an external dependency, and running a Java daemon is not an out-of-the-box task on your standard LAMP server.
Agreed. Has anyone played with using ferret instead of lucene? Many third-party hosts support Ruby now, whereas Java support is minute by comparison.
Ferret currently does not support the compact index file format, which is the default format of a Lucene index. This makes it a bit of a problem to reuse existing index files. Obviously we can re-index for Ferret, or just wait for Ferret to be ready -- it's at the top of Ferret's to-do list.
Besides, Solr may also deserve a mention, since it focuses on reliability, which has in some sense been proven by Technorati.
Regards, /Mike/
On 03/07/07, Brion Vibber brion@wikimedia.org wrote:
Minute Electron wrote:
Lucene cannot be included as it is a program, and requires either precompiled binaries (every operating system needs different ones) or the source code and a compiler.
Well, it's Java at least, so only one binary needed on most systems. :)
O rly? I was remembering the ancient Mono port, I think. Never mind me.
- d.
wikitech-l@lists.wikimedia.org