Hello,
I just uploaded the first version of a new extension called SphinxSearch ( http://www.mediawiki.org/wiki/Extension:SphinxSearch ). This extension uses the Sphinx Search engine as the backend for full text searches of wiki content. See the MW page for full description of the extension.
Since this is the first release of the extension, and since searching is such a fundamental requirement of a wiki-type webpage, I wanted to submit this extension for your scrutiny. I would like to get feedback on not only the extension's functionality, but also the clarity and completeness of the installation instructions.
At the same time, I have a question for the community. Is there a good way to change the default behavior of the search box on the left of every page (as rendered with Monobook, that is)? My extension adds a new special page, and I would like the search box to use it instead of the default Special:Search. The reason is that I do not want to completely get rid of the built-in search engine ... at least not yet.
Many thanks in advance, paul.
You are aware of the LuceneSearch extension used by Wikipedia, right? Not that there's anything wrong with making a Sphinx plugin too, but your introductory preamble on the linked page and your apparent uncertainty as to how to proceed in replacing the default search suggest to me that you might not be aware of it to begin with.
Incidentally, there's no mileage in having access to the default search. It's slow to even maintain the indexes. Anyone using a better search solution would be best off dropping the relevant indexes/tables or otherwise disabling it for performance reasons if nothing else.
On 9/24/07, Simetrical Simetrical+wikilist@gmail.com wrote:
Incidentally, there's no mileage in having access to the default search. It's slow to even maintain the indexes. Anyone using a better search solution would be best off dropping the relevant indexes/tables or otherwise disabling it for performance reasons if nothing else.
Out of curiosity, what is the reason we just don't use Google (or such)? The search capabilities in en.wiki are almost completely useless, IMHO: it can't even find articles with identical names if the capitalization is wrong, and it completely lacks anything like spell checking or reasonable relevance ranking. In order to find articles on en.wiki, I invariably open a second browser to search in, and this strikes me as rather sub-optimal. Is there anything I can do about this at "my end"?
Maury
On 9/25/07, Maury Markowitz maury.markowitz@gmail.com wrote:
Out of curiosity, what is the reason we just don't use Google (or such)? The search capabilities in en.wiki are almost completely useless, IMHO: it can't even find articles with identical names if the capitalization is wrong, and it completely lacks anything like spell checking or reasonable relevance ranking. In order to find articles on en.wiki, I invariably open a second browser to search in, and this strikes me as rather sub-optimal. Is there anything I can do about this at "my end"?
Freshly hacked: Add importScript('User:Magnus Manske/googlesearch.js'); to your monobook.js page. This will add a "google search" button in your sidebar.
Cheers, Magnus
On 9/25/07, Magnus Manske magnusmanske@googlemail.com wrote:
Freshly hacked: Add importScript('User:Magnus Manske/googlesearch.js'); to your monobook.js page. This will add a "google search" button in your sidebar.
Cheers, Magnus
D'oh! There's already one at [[Wikipedia:WikiProject User scripts/Scripts/Google link]]
Oh well...
Magnus Manske <magnusmanske@...> writes:
Freshly hacked: Add importScript('User:Magnus Manske/googlesearch.js'); to your monobook.js page. This will add a "google search" button in your sidebar.
Cheers, Magnus
D'oh! There's already one at [[Wikipedia:WikiProject User scripts/Scripts/Google link]]
Oh well...
I'm not able to view your googlesearch.js or the Wikipedia one. However, if I understand the premise of this code, it basically adds a search form that submits the requested query to google.com, which will then search only your domain.
This approach only works for wikis that can be seen from the outside world. My wiki is a corporate repository of information and sits behind a firewall, so such an approach simply would not work. Hence the need for a local search engine like Sphinx.
Paul.
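For readers who cannot view the script either, the premise described above can be sketched roughly as follows. This is not the contents of googlesearch.js; the function name is made up for illustration, and the only thing it relies on is Google's "site:" query operator. As noted above, it can only return results for a wiki that Google is able to crawl.

```javascript
// Hypothetical helper (not taken from googlesearch.js): build a Google
// query URL that is restricted to a single domain via the "site:" operator.
function buildGoogleSiteSearchUrl(query, domain) {
  const restricted = query + ' site:' + domain;
  // encodeURIComponent escapes spaces, colons and other reserved characters.
  return 'https://www.google.com/search?q=' + encodeURIComponent(restricted);
}

// Example: the URL a "google search" button might navigate to.
const exampleUrl = buildGoogleSiteSearchUrl('sphinx search', 'en.wikipedia.org');
console.log(exampleUrl);
```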
On 9/25/07, Paul Grinberg gri6507@yahoo.com wrote:
I'm not able to view your googlesearch.js or the Wikipedia one. However, if I understand the premise of this code, it basically adds a search form that submits the requested query to google.com, which will then search only your domain.
This approach only works for wikis that can be seen from the outside world. My wiki is a corporate repository of information and sits behind a firewall, so such an approach simply would not work. Hence the need for a local search engine like Sphinx.
The google search has nothing to do with the lucene search. lucene works locally, like sphinx.
Magnus
2007/9/25, Magnus Manske magnusmanske@googlemail.com:
Freshly hacked: Add importScript('User:Magnus Manske/googlesearch.js'); to your monobook.js page. This will add a "google search" button in your sidebar.
That's neat. Is it possible to do something similar when you're using the Classic skin, too?
On 9/25/07, Schneelocke schneelocke@gmail.com wrote:
That's neat. Is it possible to do something similar when you're using the Classic skin, too?
I just tried, but there's lots'o'stuff missing there (e.g., "id=xxx"), so it will be hard to do. Possible, though.
Magnus
On 9/25/07, Magnus Manske magnusmanske@googlemail.com wrote:
I just tried, but there's lots'o'stuff missing there (e.g., "id=xxx"), so it will be hard to do. Possible, though.
OK, got it running :-)
Put importScript('User:Magnus Manske/googlesearch.js'); on your User:XXX/standard.js page
Cheers, Magnus
Magnus Manske wrote:
OK, got it running :-)
Put importScript('User:Magnus Manske/googlesearch.js'); on your User:XXX/standard.js page
Cheers, Magnus
You're fast :) Instead of checking if (skin == "standard"), why not test for the missing element directly, i.e. if (!document.getElementById(<missingid>))?
2007/9/25, Magnus Manske magnusmanske@googlemail.com:
OK, got it running :-)
Put importScript('User:Magnus Manske/googlesearch.js'); on your User:XXX/standard.js page
Wonderful, thank you. :)
Schneelocke wrote:
That's neat. Is it possible to do something similar when you're using the Classic skin, too?
You'd usually put it on User:yourname/classic.js instead. The problem here is that classic skin misses several ids.
Monobook search:
<form action="/wiki/Special:Search" id="searchform"><div>
  <input id="searchInput" name="search" type="text" title="Search Wikipedia [f]" accesskey="f" value="" />
  <input type='submit' name="go" class="searchButton" id="searchGoButton" value="Go" />
  <input type='submit' name="fulltext" class="searchButton" id="mw-searchButton" value="Search" />
</div></form>

Classic search:
<form name="search" class="inline" method="post" action="/wiki/Special:Search">
  <input type="text" name="search" size="19" value="" />
  <input type="submit" name="go" value="Go" />
  <input type="submit" name="fulltext" value="Search" />
</form>
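Given the two forms above, a script could locate the search form by feature detection rather than by checking the skin name: try the Monobook id first, then fall back to the name attribute that the Classic form carries. The sketch below is only an illustration of that idea; findSearchForm is a made-up name, and the document object is passed in as a parameter so the function can also be exercised outside a browser.

```javascript
// Hypothetical helper: find the search form in either skin.
// Monobook marks the form with id="searchform"; Classic has no id,
// but its form carries name="search".
function findSearchForm(doc) {
  const byId = doc.getElementById('searchform');
  if (byId) {
    return byId;
  }
  // Fallback for skins without the id: scan all forms for name="search".
  const forms = doc.getElementsByTagName('form');
  for (let i = 0; i < forms.length; i++) {
    if (forms[i].getAttribute('name') === 'search') {
      return forms[i];
    }
  }
  return null; // no search form found
}
```

In a user script this would be called as findSearchForm(document), and the button would only be added when the result is not null.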
Maury Markowitz wrote:
Out of curiosity, what is the reason we just don't use Google (or such)?
Because being entirely dependent on the proprietary technology of a third-party company is unwise (lock-in dependency, privacy issues) and contrary to our organizational goals (maintain the site based on open technology).
-- brion vibber (brion @ wikimedia.org)
On 25/09/2007, Maury Markowitz maury.markowitz@gmail.com wrote:
Out of curiosity, what is the reason we just don't use Google (or such)? The search capabilities in en.wiki are almost completely useless, IMHO: it can't even find articles with identical names if the capitalization is wrong, and it completely lacks anything like spell checking or reasonable relevance ranking. In order to find articles on en.wiki, I invariably open a second browser to search in, and this strikes me as rather sub-optimal. Is there anything I can do about this at "my end"?
I'm delighted to hear that all the excellent work that's been put into developing Lucene Search for Wikipedia as of late, as well as the ongoing work, is so easily characterised as pointless, or perhaps you haven't tried searching for anything lately - it's getting better.
We don't "just use Google" because we'd like our users to remain within the same site when searching it, to avoid confusion due to an inconsistent user experience, and the only means Google would provide us with to avoid that are either too expensive to justify, or proprietary, something we're desperately committed to avoiding.
Rob Church
Regarding Sphinx, I haven't used it, but I'm excited that there's a feasible non-java alternative to Lucene. Don't get me wrong, I like Lucene in principle - however many hosts don't permit Java apps (barring significant cost increase).
Ferret[1] may also work, though I don't know if anyone's attempted using it for MW search.
Is there any work being done on a native PHP alternative? Something which could conceivably plug into the ArticleSaveComplete hook and keep a text index up-to-date?
[1] http://ferret.davebalmain.com/trac/
-- Jim R. Wilson (jimbojw)
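To make the idea of keeping a text index up to date on every save a bit more concrete, here is a minimal in-memory sketch of an inverted index that is refreshed whenever a page is saved. It is written in JavaScript purely for illustration and is not MediaWiki code: createTextIndex and onPageSaved are made-up names, the tokenizer only handles ASCII letters and digits, and a real implementation would be called from the save hook and would persist its data.

```javascript
// Hypothetical sketch of an incrementally updated inverted index.
function createTextIndex() {
  const postings = new Map();    // word -> Set of page titles containing it
  const wordsByPage = new Map(); // title -> Set of words from the last save

  // Simplistic tokenizer (assumption): lowercase ASCII letters and digits only.
  const tokenize = (text) => new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);

  return {
    // Call this after a page is saved, with the new page text.
    onPageSaved(title, text) {
      // Drop the postings left over from the previous revision.
      for (const word of wordsByPage.get(title) || []) {
        postings.get(word).delete(title);
      }
      const words = tokenize(text);
      for (const word of words) {
        if (!postings.has(word)) {
          postings.set(word, new Set());
        }
        postings.get(word).add(title);
      }
      wordsByPage.set(title, words);
    },
    // Return the sorted titles of all pages containing the given word.
    search(word) {
      const titles = postings.get(word.toLowerCase());
      return titles ? Array.from(titles).sort() : [];
    }
  };
}
```

The point of the design is that a save only touches the postings of the one page that changed, so the index never needs a full rebuild.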
Jim Wilson wrote:
Is there any work being done on a native PHP alternative? Something which could conceivably plug into the ArticleSaveComplete hook and keep a text index up-to-date?
There's a PHP version of Lucene in the Zend Framework stuff, though I've never tried it. No idea how performance compares to Lucene/Java or to the MySQL fulltext search.
-- brion vibber (brion @ wikimedia.org)
Paul Grinberg wrote:
At the same time, I have a question for the community. Is there a good way to change the default behavior of the search box on the left of every page (as rendered with Monobook, that is)? My extension adds a new special page, and I would like the search box to use it instead of the default Special:Search. The reason is that I do not want to completely get rid of the built-in search engine ... at least not yet.
I'd recommend simply plugging in to the SearchEngine plugin. If you have to replace the entire Special:Search, then just do so as the old LuceneSearch plugin does.
There's no reason to keep the mysql search around as well.
-- brion vibber (brion @ wikimedia.org)
Actually I have been experimenting with SphinxSearch and decided in the last week or so to use it on my sites, so thanks Paul!!
Joe Hagerty
On 25/09/2007, Joseph Hagerty revjoe@revjoe.com wrote:
Actually I have been experimenting with SphinxSearch and decided in the last week or so to use it on my sites, so thanks Paul!!
It'd be interesting to see a comparison of search results between MW Lucene and Sphinx (and any other search backends) on a Wikipedia data set.
Do we/can we collect search queries on en:wp? Say, 1 in 1000 searches, no details just the search text. That'll show what people are actually looking for.
- d.
On 9/26/07, David Gerard dgerard@gmail.com wrote:
Do we/can we collect search queries on en:wp? Say, 1 in 1000 searches, no details just the search text. That'll show what people are actually looking for.
It would be ridiculously easy to do. Proof of concept is at http://vps.epstone.net/~andrew/wiki/index.php/MediaWiki:Common.js
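The sampling itself might look something like the sketch below. This is not the code from the linked proof of concept; createSearchSampler and its parameters are made up for illustration. It records roughly one search in every 1/rate, keeps only the search text, and takes the random source and the storage as parameters so the behaviour can be checked deterministically.

```javascript
// Hypothetical sampler: record about `rate` of all searches, text only.
// `random` is a function returning a number in [0, 1), e.g. Math.random.
// `sink` is any object with a push() method, e.g. an array.
function createSearchSampler(rate, random, sink) {
  return function maybeRecord(searchText) {
    if (random() < rate) {
      sink.push(searchText); // no user details, just the query string
      return true;
    }
    return false;
  };
}

// Example: sample 1 in 1000 searches into an in-memory array.
const sampledQueries = [];
const recordSearch = createSearchSampler(1 / 1000, Math.random, sampledQueries);
recordSearch('sphinx search');
```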
On 27/09/2007, Andrew Garrett andrew@epstone.net wrote:
It would be ridiculously easy to do. Proof of concept is at http://vps.epstone.net/~andrew/wiki/index.php/MediaWiki:Common.js
Well yes :-) I can see no privacy implication, but I might have missed something.
How many searches do we get a day?
Is this useful data to those working on MediaWiki/Wikipedia search?
(not to me, I must add - if I had concrete plans to apply it, I'd be pushing this hard)
- d.
David Gerard wrote:
Well yes :-) I can see no privacy implication, but I might have missed something.
How many searches do we get a day?
Is this useful data to those working on MediaWiki/Wikipedia search?
(not to me, I must add - if I had concrete plans to apply it, I'd be pushing this hard)
- d.
It would be useful data to those writing the encyclopedia; common search terms that are red links are likely to be good places to set up redirects or disambiguation pages.
-Gurch
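As a rough illustration of that use, sampled search terms could be tallied and filtered down to the ones that match no existing page title. The sketch below makes several assumptions: redirectCandidates is a made-up name, titles are compared case-insensitively (a simplification of how MediaWiki treats titles), and both the sampled terms and the title list are plain arrays.

```javascript
// Hypothetical helper: find frequently searched terms with no matching page.
function redirectCandidates(searchTerms, existingTitles, minCount) {
  // Simplification (assumption): compare titles case-insensitively.
  const titles = new Set(existingTitles.map((t) => t.toLowerCase()));
  const counts = new Map();
  for (const term of searchTerms) {
    const key = term.trim().toLowerCase();
    if (key === '' || titles.has(key)) {
      continue; // skip empty queries and terms that already have a page
    }
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Array.from(counts.entries())
    .filter(([, count]) => count >= minCount)
    // Most frequent first; ties broken alphabetically.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([term, count]) => ({ term, count }));
}
```

The resulting list is only a set of suggestions; whether a given term deserves a redirect or a disambiguation page remains an editorial call.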