Search engine.

Daniel Mayer

5 Dec 2003

4:22 a.m.

Brion wrote:

...

So why are we talking about this? Google isn't going to vanish from the web because we get the internal search back online.

Yeah, but the link to the Google search form may disappear from Wikipedia once the in-house search is re-enabled. I find the Wikipedia Google search rather powerful for serious data mining where up-to-the-minute article data is not needed. So I guess I'm advocating for a "Search {ProjectName} from Google" radio button right under the MediaWiki search form. The default of the toggle would be to use MediaWiki's own search functionality so long as the database server isn't bogged down.

Giving people the option of live database search and Google search should reduce database server load as well (I always felt guilty using the Wikipedia search before, but was still too lazy to visit my user page to click on my own link to the Wikipedia search form on Google).
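A rough sketch of how such a toggle might choose a backend. Everything here is an assumption for illustration: the function names, the "internal"/"google" labels, and the rule that an explicit user choice always wins are hypothetical and not taken from MediaWiki.

```python
from urllib.parse import urlencode


def choose_backend(user_choice=None, db_overloaded=False):
    """Pick a search backend for one request.

    An explicit radio-button choice wins (assumption). Otherwise the
    default is the in-house search, unless the database is bogged down.
    """
    if user_choice in ("internal", "google"):
        return user_choice
    return "google" if db_overloaded else "internal"


def google_site_search_url(terms, host):
    """Build a Google query restricted to one project host via site:."""
    return "http://www.google.com/search?" + urlencode({"q": f"site:{host} {terms}"})


if __name__ == "__main__":
    print(choose_backend())                    # default, database healthy
    print(choose_backend(db_overloaded=True))  # default, database struggling
    print(google_site_search_url("france", "en.wikipedia.org"))
```

How "bogged down" would be measured is left open here, as it is in the proposal.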

-- Daniel Mayer (aka mav)

Andrew Lih

5 Dec

5:06 a.m.

FYI, regarding search...

In case people are using Mozilla Firebird (which you really should!) you can make a really nice keyword to do WP search. Make a bookmark to:

http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=site%3A.wikipedia.org+%s&btnG=Google+Search

And then go to "Manage bookmarks," do a "Properties" on that bookmark and enter something like "wps" in the Keyword box.

...

From now on, you can type "wps food" or "wps France" in the address bar and it will do the search. The "food" or "France" will be substituted for the %s in the URL. So searching WP from a Firebird window is as simple as:

CTRL-L

wps france

-Andrew
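
The %s substitution described above can be sketched roughly as follows. This only illustrates the idea; the function name is hypothetical, and the exact way the browser encodes the typed terms is an assumption.

```python
from urllib.parse import quote_plus

# Bookmark URL from the message above; %s marks where the search terms go.
BOOKMARK_TEMPLATE = (
    "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8"
    "&q=site%3A.wikipedia.org+%s&btnG=Google+Search"
)


def expand_keyword(address_bar_text, keyword="wps", template=BOOKMARK_TEMPLATE):
    """Return the URL for '<keyword> <terms>', or None if the keyword does not match."""
    head, _, terms = address_bar_text.strip().partition(" ")
    if head != keyword or not terms:
        return None
    # Plain string replace rather than the % operator: the template also
    # contains %3A, which is not a format placeholder.
    return template.replace("%s", quote_plus(terms))


if __name__ == "__main__":
    print(expand_keyword("wps france"))
    print(expand_keyword("wps New York"))
```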

erik_moeller＠gmx.de

8:25 a.m.

Daniel-

...

Giving people the option of live database search and Google search should reduce database server load as well (I always felt guilty using the Wikipedia search before

A properly implemented search should not make anyone feel guilty. If the search doesn't scale, we can't re-enable it, big ass server or not. In the current implementation, the search causes massive locking problems with high concurrent access, probably due to some unresolved deadlock. But the search index itself is fast - in the range of milliseconds. So once we resolve the locking issue, the additional load caused by the search should be negligible.

As for Google, I'm all for it if it doesn't overload the user interface. It would be nice to have an "Advanced Search" link anyway so that you don't have to run a normal search to get to the advanced search form. That advanced form could include the Google search box. But Google is not and cannot be a replacement for realtime searching.

Regards,

Erik

Toby Bartels

6:35 p.m.

Daniel Mayer wrote:

...

So I guess I'm advocating for a "Search {ProjectName} from Google" radio button right under the MediaWiki search form. The default of the toggle would be to use MediaWiki's own search functionality so long as the database server isn't bogged down.

This seems obviously correct to me. /Given/ that some people want to use Google, that is (I would prefer the in-house search in almost all cases myself).

-- Toby

Ray Saintonge

9:53 p.m.

Toby Bartels wrote:

...

Daniel Mayer wrote:

...
So I guess I'm advocating for a "Search {ProjectName} from Google" radio button right under the MediaWiki search form. The default of the toggle would be to use MediaWiki's own search functionality so long as the database server isn't bogged down.

This seems obviously correct to me. /Given/ that some people want to use Google, that is (I would prefer the in-house search in almost all cases myself).

That would be my preference too. Would it be a good idea to make the in-house search available to registered users? The Google search is likely adequate for people who have only a read-only interest in Wikipedia.

Stuardo Rodríguez (StR)

8 Dec

5:19 p.m.

New subject: Search engine. - HTDIG!!!

On Thursday 04 December 2003 22:22, Daniel Mayer wrote:

...

Giving people the option of live database search and Google search should reduce database server load as well (I always felt guilty using the Wikipedia search before, but was still too lazy to visit my user page to click on my own link to the Wikipedia search form on Google).

Hi! I'm new to the list and haven't had time to read all the mails, but regarding this search stuff:

Has anyone tried htdig? If you have a site with links to all the articles, this search could work very well. It is the one I use for a laws site we develop. We have about 1500 laws of 5 to 15 pages each, and we use htdig for searching. It does not store the indexed data in a database like MySQL; it has its own db-like way of saving the keywords.

We have it reindex every day, simply because we do not need it to run every hour. I will try to see whether it consumes too much CPU.

I think htdig could be the one that helps with this issue. Just try it; you lose nothing by trying.

-- StR
str@strgt.cjb.net
-[http://strgt.cjb.net]-[Just me 'n the world I created]-
-[http://www.gamersgt.com]- str@gamersgt.com

Brion Vibber

9 Dec

3:33 a.m.

On Dec 8, 2003, at 09:19, Stuardo Rodríguez (StR) wrote:

...

Hi! I'm new to the list and haven't had time to read all the mails, but regarding this search stuff:

Has anyone tried htdig?

ht://Dig uses a web spider to do its indexing, which is less than ideal. It doesn't understand the structure of the wiki (it should skip "edit this page" links, keep articles and talk pages in distinct categories, and know which pages are redirects), and it has to spider the site to perform updates. Consider that we've got over 300,000 pages on the English Wikipedia alone (including talk pages, user pages, redirects, etc). It could probably be tweaked to grab updates off of Recentchanges and improved in other ways, but I'm not sure it's the best way to go.

JeLuF has experimented with a search engine based on Lucene (http://jakarta.apache.org/lucene/), a standalone indexing/search engine which lets you feed it data and updates however you like. This could include keeping better track of wiki-specific data, and submitting the text in the form we like it when we want it. Pages could be indexed immediately on modification, or just the updated pages reindexed periodically. JeLuF, how did that look? Promising or not?
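
To make the "feed it data and updates however you like" idea concrete, here is a minimal sketch of a push-style index that is updated when a page changes and that skips redirects. It is a toy inverted index written for illustration, not Lucene's API; the class and method names are hypothetical.

```python
import re
from collections import defaultdict


class WikiIndex:
    """Toy inverted index that the wiki pushes page text into."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> titles containing it
        self._words_by_page = {}           # title -> words currently indexed

    def remove(self, title):
        """Drop every posting for a page (no-op if it was never indexed)."""
        for word in self._words_by_page.pop(title, ()):
            self._postings[word].discard(title)

    def update(self, title, text, is_redirect=False):
        """Reindex one page on save; redirects are left out of the index."""
        self.remove(title)
        if is_redirect:
            return
        words = set(re.findall(r"\w+", text.lower()))
        for word in words:
            self._postings[word].add(title)
        self._words_by_page[title] = words

    def search(self, query):
        """Return titles containing every word of the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


if __name__ == "__main__":
    idx = WikiIndex()
    idx.update("France", "France is a country in Europe")
    idx.update("Francia", "#REDIRECT [[France]]", is_redirect=True)
    print(idx.search("country europe"))
```

Because the wiki itself calls update() with the text, nothing has to be spidered, and only the changed page is touched on each edit. Reindexing periodically instead would just mean calling update() in a batch over the recently changed pages.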

Also of course we can switch the mysql search back on. The database server will actually have all the CPU it wants now, so perhaps the mysterious hanging threads won't plague us anymore. If it sucks again, we can turn it back off until we figure it out or replace it.

--brion vibber (brion @ pobox.com)

Guillaume Blanchard

4:54 a.m.

From: "Brion Vibber"

...

Also of course we can switch the mysql search back on. The database server will actually have all the CPU it wants now, so perhaps the mysterious hanging threads won't plague us anymore. If it sucks again, we can turn it back off until we figure it out or replace it.

Yes, please switch the mysql search back on until you find a better solution.

Aoineko

Magnus Manske

8:15 a.m.

Guillaume Blanchard wrote:

...

From: "Brion Vibber"

...
Also of course we can switch the mysql search back on. The database server will actually have all the CPU it wants now, so perhaps the mysterious hanging threads won't plague us anymore. If it sucks again, we can turn it back off until we figure it out or replace it.

Yes, please switch the mysql search back on until you find a better solution.

If there's doubt about the resources it would take, why not enable it for the larger non-EN languages first, see what happens (for a few days). If the server doesn't choke, turn it on everywhere.

Magnus

Brion Vibber

8:24 a.m.

On Dec 9, 2003, at 00:15, Magnus Manske wrote:

...

Guillaume Blanchard wrote:

...
Yes, please switch the mysql search back on until you find a better solution.

If there's doubt about the resources it would take, why not enable it for the larger non-EN languages first, see what happens (for a few days). If the server doesn't choke, turn it on everywhere.

I've turned it on for everything but en, where the index didn't quite finish setting up because /tmp is on the tiny root partition and mysql thought this would be a great place for a few hundred megs of intermediate data. Adjusting...

-- brion vibber (brion @ pobox.com)

Andre Engels

9:40 a.m.

On Mon, 8 Dec 2003, Brion Vibber wrote:

...

Also of course we can switch the mysql search back on. The database server will actually have all the CPU it wants now, so perhaps the mysterious hanging threads won't plague us anymore. If it sucks again, we can turn it back off until we figure it out or replace it.

I see it has been switched on now... Will the special pages be normally available soon too?

Andre Engels

wikitech-l@lists.wikimedia.org