Hi all,
I started this discussion, and I don't know where it is heading for now. :) As a layman, I wonder why software as robust as MediaWiki doesn't provide some nice search features, given that it boasts so many other features.
Even a simple PHP/MySQL application provides some kind of advanced search (without Lucene or similar tools), such as searching in titles or in full text, and matching all of the words (AND), any of the words (OR), or an exact phrase.
I don't know what prevents the developers from providing these features. Could someone please explain?
Regards,
Jack
-----Original Message----- From: mediawiki-l-bounces@lists.wikimedia.org [mailto:mediawiki-l-bounces@lists.wikimedia.org] On Behalf Of Michael Daly Sent: Tuesday, October 16, 2007 12:30 AM To: MediaWiki announcements and site admin list Subject: Re: [Mediawiki-l] Making search "and" by default
Domas Mituzas wrote:
They were experimenting with that lately. That's how searching for 'domas' ended up with DOMA on top ;)
I recently searched on "Johann..." and Google kept hitting "John..." and other-language equivalents. Google's searching can be annoyingly non-specific at times. In any "advanced search" I'd like to tell it to find/not find plurals or singulars, other languages, etc. Google even modifies the search parameters for quoted (i.e. exact text) terms.
Mike
_______________________________________________ MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
This electronic mail (including any attachment thereto) may be confidential and privileged and is intended only for the individual or entity named above. Any unauthorized use, printing, copying, disclosure or dissemination of this communication may be subject to legal restriction or sanction. Accordingly, if you are not the intended recipient, please notify the sender by replying to this email immediately and delete this email (and any attachment thereto) from your computer system...Thank You
Hi, Jack,
Unfortunately, search is not an easy thing to get right. MySQL has some support for full-text search, but it is not perfect yet. I guess it just needs more time for MySQL's search features to mature.
Cheers,
Jian
Hi, All,
As promised, I have put together my MediaWiki search package for you to download and try out.
http://www.hongandjian.com/hongandjian/index.php/Mediawiki_Search_Download
This is the start of a crawler-based search engine for MediaWiki content. Please do try it out if you like, and let me know of any problems.
Thanks,
Jian
So MySQL has to mature in order to get an OR selection?
DSig David Tod Sigafoos | SANMAR Corporation PICK Guy
MySQL is just a relational database; it is not really designed to be a search engine. It will take time for MySQL, or any other database vendor, to catch up on the search front.
In the meantime, I can see that people feel installing a search engine is a lot of work, so a hosted service could be another option. I am looking into providing hosted site search these days.
Cheers,
Jian
Dave Sigafoos wrote:
So MySQL has to mature in order to get an OR selection?
Nope... MySQL's boolean fulltext search uses a logical OR by default; however, this behavior is contrary to typical web search practice and gives uselessly overbroad results in most cases.
People normally expect that adding terms to their search query will *narrow* the result set (logical AND), whereas a logical OR *broadens* the result set. When common words are involved, you quickly reach the point where you get an insane number of results, very few of which are relevant.
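To make the narrowing-versus-broadening point concrete, here is a toy illustration. The pages and words are made up, and the set operations stand in for what a full-text index does; this is not how MediaWiki or MySQL store anything. Note how the common word "the" alone makes the OR search match every page, while each added AND term cuts the result down.

```python
# Hypothetical toy corpus: page title -> set of words on that page.
PAGES = {
    "Main Page": {"the", "wiki", "search", "help"},
    "Installation": {"the", "install", "mysql", "php"},
    "Search tuning": {"the", "search", "mysql", "fulltext"},
    "FAQ": {"the", "help", "install"},
}


def matching(term: str) -> set[str]:
    """Titles of the pages that contain the given term."""
    return {title for title, words in PAGES.items() if term in words}


def search_and(terms: list[str]) -> set[str]:
    """Pages containing every term; each added term can only shrink the set."""
    result = set(PAGES)
    for term in terms:
        result &= matching(term)
    return result


def search_or(terms: list[str]) -> set[str]:
    """Pages containing at least one term; each added term can only grow the set."""
    result: set[str] = set()
    for term in terms:
        result |= matching(term)
    return result


query = ["the", "mysql", "search"]
for n in range(1, len(query) + 1):
    # Print the query prefix, the AND hit count, and the OR hit count.
    print(query[:n], len(search_and(query[:n])), len(search_or(query[:n])))
# ['the'] 4 4
# ['the', 'mysql'] 2 4
# ['the', 'mysql', 'search'] 1 4
```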
MediaWiki's MySQL search module thus produces its queries using logical AND in order to produce useful results in accordance with expectations. (In MySQL's boolean fulltext syntax, this prepends each term with a '+'.)
If you wanted, you could switch the default to OR on your wiki by hacking a line in SearchMySQL4.php:
change: var $strictMatching = true;
to: var $strictMatching = false;
IMHO this would produce overbroad results by default; to narrow down your searches you would have to manually prepend a + to the search terms you wish to require, which would make it a lot harder to do typical searches.
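A minimal sketch of what the strict-matching switch amounts to at the query-string level. The helper `to_boolean_query` is hypothetical and is not MediaWiki's actual parser; it only assumes MySQL's boolean-mode rules that '+' marks a required word, '-' an excluded one, a bare word is optional, and a double-quoted string is a phrase.

```python
import re


def to_boolean_query(user_query: str, strict: bool = True) -> str:
    """Hypothetical helper: turn a search-box string into a MySQL
    boolean-mode full-text expression.

    strict=True  mimics AND-by-default: every term gets a '+' (required).
    strict=False leaves terms bare, so MySQL treats them as optional (OR).
    Terms the user already prefixed with '+' or '-' are passed through.
    A double-quoted phrase is kept together as a single term.
    """
    # Tokens are either an optionally signed "quoted phrase" or a run of non-spaces.
    tokens = re.findall(r'[+-]?"[^"]*"|\S+', user_query)
    out = []
    for tok in tokens:
        if tok[0] in "+-":
            out.append(tok)          # the user chose the operator explicitly
        elif strict:
            out.append("+" + tok)    # required term (AND behaviour)
        else:
            out.append(tok)          # optional term (OR behaviour)
    return " ".join(out)


print(to_boolean_query("mysql fulltext search"))                # +mysql +fulltext +search
print(to_boolean_query("mysql fulltext search", strict=False))  # mysql fulltext search
print(to_boolean_query('"exact phrase" -draft'))                # +"exact phrase" -draft
```

The resulting string would be passed as a bound parameter to something like `MATCH(column) AGAINST(? IN BOOLEAN MODE)`. With `strict=False`, a user who wants a required word has to type the '+' themselves, which is the trade-off described above.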
Alternatively, you could write some kind of query parser that takes a desired alternate syntax (say, with the word 'OR' as an infix operator) and produces the proper MySQL boolean fulltext query.
For more info on the boolean fulltext search, see: http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html
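One possible shape for such a parser, as a sketch under assumptions: the syntax (words ANDed by default, an uppercase `OR` joining neighbouring words into alternatives) and the function name `parse_or_syntax` are hypothetical. It relies on MySQL's boolean-mode grouping, where `+(a b)` requires at least one of `a` or `b`. It does not escape MySQL operator characters inside words, which a real version would need to do.

```python
def parse_or_syntax(user_query: str) -> str:
    """Hypothetical parser: words are ANDed by default, and the infix
    keyword OR joins neighbouring words into one alternative group."""
    groups: list[list[str]] = []  # each group is a list of alternative words
    expect_alt = False            # True right after an OR keyword
    for word in user_query.split():
        if word == "OR":
            if groups:            # a leading OR has nothing to attach to
                expect_alt = True
            continue
        if expect_alt:
            groups[-1].append(word)  # extend the current alternative group
        else:
            groups.append([word])    # start a new required group
        expect_alt = False

    parts = []
    for alts in groups:
        if len(alts) == 1:
            parts.append("+" + alts[0])
        else:
            # Inside the parentheses the words are optional, so the required
            # group matches when at least one of the alternatives is present.
            parts.append("+(" + " ".join(alts) + ")")
    return " ".join(parts)


print(parse_or_syntax("mysql OR postgres search"))  # +(mysql postgres) +search
print(parse_or_syntax("wiki search"))               # +wiki +search
```

This keeps AND as the default while still giving users an explicit way to force an OR between particular terms.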
-- brion vibber (brion @ wikimedia.org)
Oh, you got me wrong. I get why AND is the default; I would simply like a way to *force* an OR.
I was simply replying to someone who said ".. MySQL has some support for full text search, but it is not perfect yet. I guess it just needs more time for MySQL to be mature on search feature .."
As we know, MySQL is capable of AND and OR, even on full text.
DSig David Tod Sigafoos | SANMAR Corporation PICK Guy 206-770-5585 davesigafoos@sanmar.com
Jack Eapen C wrote:
I started this discussion, and I don't know where it is heading for now. :) As a layman, I wonder why software as robust as MediaWiki doesn't provide some nice search features, given that it boasts so many other features.
Another alternative is to use Postgres as a backend (which uses tsearch2), providing word stemming, AND and OR, ranking, etc. There are also plenty of other options out there in addition to the extensions mentioned in this thread. See, for example:
http://meta.wikimedia.org/wiki/FulltextSearchEngines
-- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200710161001 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8