Is that the same as MediaWiki, i.e. MySQL search?
Any plan to use Lucene (Zend Framework) in the future?
Howa
I think Wikimedia projects do use Lucene. We certainly don't use MediaWiki's default search.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 9/10/07, howard chen howachen@gmail.com wrote:
> Is that the same as mediawiki, i.e. MySQL search?
> Any plan to use lucene (Zend Framework) in the future?
We do use Lucene currently.
I have a new search system running externally right now which is much more powerful. For example, it can apply category intersections, fuzzy title matches, text regular expressions, page link constraints and geographic filtering. Most of the functionality is fast enough to make available to everyone.
If you'd like to be a beta-tester please let me know.
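Gregory doesn't say how his system evaluates category intersections. One common way a search index can answer "pages in category A and in category B" is to keep a sorted list of page IDs per category and merge the lists with two pointers. The sketch below assumes hypothetical in-memory lists (`category_pages` and its IDs are made up for illustration). It is not a description of his implementation.

```python
from typing import Dict, List


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Two-pointer merge of two ascending page-ID lists; returns IDs present in both."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def category_intersection(index: Dict[str, List[int]], categories: List[str]) -> List[int]:
    """Pages in every listed category; an unknown category yields an empty result."""
    # Start from the shortest list so intermediate results stay small.
    lists = sorted((index.get(c, []) for c in categories), key=len)
    if not lists:
        return []
    result = lists[0]
    for other in lists[1:]:
        result = intersect_sorted(result, other)
        if not result:
            break
    return result


# Hypothetical category -> sorted page-ID lists, for illustration only.
category_pages = {
    "Living people": [3, 8, 15, 42, 77],
    "Physicists": [8, 20, 42, 99],
    "Nobel laureates": [8, 42, 56],
}
print(category_intersection(category_pages, ["Physicists", "Living people", "Nobel laureates"]))
# prints [8, 42]
```

Merging the shortest list first is a standard trick: the work is bounded by the smaller lists, which matters when one category (such as "Living people") is very large.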
On 9/10/07, Gregory Maxwell gmaxwell@gmail.com wrote:
> I have a new search system running externally right now which is much more powerful, for example it is able to apply category intersections, fuzzy title matches, text regular expressions, page link constraints and geographic filtering. Most of the functionality is fast enough to make available to everyone.
Well, I think the latest versions of Lucene support fuzzy title matching too, but I don't know about the rest. (How fast are typical regexes on this much data?)
> If you'd like to be a beta-tester please let me know.
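Fuzzy title matching is usually built on edit distance: a title matches if it can be reached from the query in at most a few single-character insertions, deletions or substitutions. The sketch below shows plain Levenshtein distance with a brute-force scan over a small, made-up title list (`fuzzy_titles` and `max_edits` are hypothetical names). It is not how Lucene or Gregory's system indexes titles, which avoid comparing against every title.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute), two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def fuzzy_titles(query: str, titles: List[str], max_edits: int = 2) -> List[str]:
    """Titles within max_edits of the query (case-insensitive), closest first."""
    q = query.lower()
    scored = sorted((levenshtein(q, t.lower()), t) for t in titles)
    return [t for dist, t in scored if dist <= max_edits]


print(fuzzy_titles("Lucine", ["Lucene", "Lucerne", "MediaWiki"]))
# prints ['Lucene', 'Lucerne']
```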
Sure!
Is that based on Wikipedia's own Lucene (Java-based) or on the ZFW (PHP) one?
howard chen wrote:
> Is that the same as mediawiki, i.e. MySQL search?
> Any plan to use lucene (Zend Framework) in the future?
We use Lucene Java. No, there are no plans to use the Lucene implementation in Zend Framework; I didn't know it existed until now. Zend doesn't sound to me like the best framework for building multithreaded apps.
-- Tim Starling
Hello,
One of my interests is how Wikipedia's Lucene implementation can be used to handle languages such as Chinese or Japanese, where tokenization is more complex.
Also, will the search engine be open-sourced later?
:)
howard chen wrote:
> Hello,
> One of my interest is how Wikipedia's Lucene implmentation can be used to tackle foreign languages such as Chinese or Japanese, where tokenization is more complex.
> Besides, will the search engine be open sourced later?
It's always been open source.
http://svn.wikimedia.org/svnroot/mediawiki/trunk/lucene-search-2/
and further refactoring work is going on in a branch:
http://svn.wikimedia.org/svnroot/mediawiki/branches/lucene-search-2.1/
-- brion vibber (brion @ wikimedia.org)
For CJK we are currently using a very simple tokenizer; the code is here [1]. Apparently this is the standard way of handling CJK, and a similar tokenizer is included in the Lucene sandbox. If you could provide some insight into a better CJK tokenizer, we would be glad to listen :)
r.
[1] http://svn.wikimedia.org/viewvc/mediawiki/trunk/lucene-search-2/src/org/wiki...
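The message doesn't spell out what the tokenizer at [1] does. A common dictionary-free scheme for CJK, and as we understand it the one behind the Lucene sandbox CJKTokenizer, is to index overlapping two-character bigrams, so any two-character word in the text becomes a searchable token. The sketch below is an illustration under that assumption, not the Wikimedia code. The `is_cjk` helper checks only a few Unicode blocks, which is a deliberate simplification.

```python
from typing import List


def is_cjk(ch: str) -> bool:
    """Rough CJK test covering a few common Unicode blocks (a simplification)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF       # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF    # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF    # Hangul Syllables
    )


def cjk_bigram_tokens(text: str) -> List[str]:
    """Overlapping bigrams inside CJK runs; lowercased alphanumeric words elsewhere."""
    tokens: List[str] = []
    run: List[str] = []   # current run of CJK characters
    word: List[str] = []  # current run of other alphanumeric characters

    def flush_run() -> None:
        if len(run) == 1:
            tokens.append(run[0])  # a lone CJK character is kept as a unigram
        else:
            tokens.extend(run[i] + run[i + 1] for i in range(len(run) - 1))
        run.clear()

    def flush_word() -> None:
        if word:
            tokens.append("".join(word).lower())
            word.clear()

    for ch in text:
        if is_cjk(ch):
            flush_word()
            run.append(ch)
        elif ch.isalnum():
            flush_run()
            word.append(ch)
        else:  # whitespace and punctuation end both kinds of run
            flush_run()
            flush_word()
    flush_run()
    flush_word()
    return tokens


print(cjk_bigram_tokens("Lucene 搜索引擎"))
# prints ['lucene', '搜索', '索引', '引擎']
```

The trade-off is that no dictionary is needed and real words are never missed, but spurious tokens appear too (for example "民共" from "中华人民共和国"), which costs index size and some precision.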
I believe you at least need to maintain a word dictionary for each language before coming up with a better tokenization/segmentation scheme. This is the starting point.
Btw, does Wikipedia maintain such a word list/dictionary? (I mean a complete one, rather than Wiktionary.)
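To show why a dictionary is the starting point: the simplest dictionary-based segmenter is forward maximum matching. It scans left to right and, at each position, takes the longest dictionary word that fits. The sketch below uses a tiny made-up dictionary, not any Wikimedia word list. Its example also shows the method's known weakness: greedy matching splits 研究生命起源 ("study the origin of life") as 研究生 / 命 / 起源 instead of 研究 / 生命 / 起源. Practical segmenters add statistics to resolve such ambiguities.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right segmentation taking the longest dictionary word at each step.

    Characters not covered by any dictionary word come out as single-character tokens.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a hit (or one character).
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical mini-dictionary, for illustration only.
words = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", words))
# prints ['研究生', '命', '起源']
```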
On 9/11/07, Robert Stojnic rainmansr@gmail.com wrote:
> For CJK we are currently using a very simple tokenizer, the code is here [1]. Apparently, this is the standard way of handling CJK and a similar tokenizer is included in the lucene sandbox. If you could provide some insight into a better CJK tokenizer, we would be glad to listen :)
> r.
> [1] http://svn.wikimedia.org/viewvc/mediawiki/trunk/lucene-search-2/src/org/wiki...